Austin Group Defect Tracker

Aardvark Mark IV


Viewing Issue Simple Details Jump to Notes ] Issue History ] Print ]
ID Category Severity Type Date Submitted Last Update
0001591 [Issue 8 drafts] System Interfaces Objection Error 2022-06-29 18:39 2022-07-25 14:00
Reporter eggert View Status public  
Assigned To
Priority normal Resolution Open  
Status New   Product Version Draft 2.1
Name Paul Eggert
Organization UCLA
User Reference
Section strlcat, wcslcat
Page Number 2005, 2216
Line Number 65136, 71365
Final Accepted Text
Summary 0001591: Proposed strlcpy spec has problems
Description In <https://sourceware.org/pipermail/libc-alpha/2022-June/140093.html> [^] Florian Weimer brought to my attention Austin Group Issue 986 <https://www.austingroupbugs.net/view.php?id=986>. [^] Although this issue is labeled as "Editorial" and "Clarification Requested", its resolution took a proposed approach that went well beyond editorial or clarification. The proposed approach overlooks some important technical issues that need to be considered before attempting to impose a standard in the controversial area of standardizing strlcpy and related functions.

Here are some problems with the proposed approach:

* Although strlcpy may have been a reasonable hack when introduced in the 1990s, technology has passed it by and we now have less-intrusive and more-reliable approaches such as FORTIFY_SOURCE and AddressSanitizer.

* With commonly-available current technologies, a program that uses strlcpy is typically *less* secure than one that uses strcpy, because when a fixed-size buffer would be overrun strlcpy silently corrupts the output data via truncation, whereas strcpy catches and reports the program error.

* Even though the call strlcpy(DST, SRC, 2 * sizeof DST) is clearly wrong and dangerous, the proposed spec says the call happens to have well-defined behavior if strlen(SRC) < sizeof DST, and requires a conforming implementation to implement it as if via strlen(strcpy(DST, SRC)). Programmers do not need this requirement, and the unnecessary requirement stands in the way of checking strlcpy calls for common programming errors involving miscalculated sizes.

* strlcpy introduces the possibility of denial-of-service attacks. The time cost of strlcpy(DST, SRC, sizeof DST) is O(strlen(SRC)), which makes it vulnerable to a denial-of-service attack when the attacker controls SRC and can specify a long SRC string. In this respect, strlcpy is worse than then already-standard (strncpy(DST, SRC, sizeof DST - 1), DST[sizeof DST - 1] = '\0') which despite its other flaws at least has the virtue of costing only O(sizeof DST), a cost that is under the defender's control and is typically small.

* There is no demonstrated need for the wcslcpy and wcslcat functions. For example, there is not a single call to either function in the FreeBSD source code. Requiring these functions of every implementation would be makework and a net minus for the community.

* There is no good evidence that modifying code to use strlcpy improves security more than using already-standardized approaches, even if technologies like FORTIFY_SOURCE and AddressSanitizer are not available.
Desired Action The simplest way to address these issues is to revert the proposal and not add strlcpy etc. to the standard.

Failing that, at least the following actions should be taken:

* To address the denial-of-service problem, drop the requirement that strlcpy must compute the string length of an overlong source. If SRC is too long, allow strlcpy(DST, SRC, sizeof DST) to return any value greater than or equal to sizeof DST. This will support the vast majority of uses of strlcpy, which do not need or use the string length.

* Allow strlcpy(DST, SRC, sizeof DST) to trap if SRC is too long. This will allow strlcpy implementations to catch and report programming errors that would otherwise result in silent data corruption via truncation. I just now surveyed FreeBSD's bin directory sources, and found that most of its uses of strlcpy fall into this category. (I am not proposing that FreeBSD change its strlcpy implementation; I'm merely noting that in practice, most strlcpy uses would benefit from allowing strlcpy implementations to trap.)

* Allow the implementation of strlcpy(DST, SRC, sizeof DST) to modify all the bytes of DST, rather than only those bytes up through strlen(SRC)+1. This would help debugging implementations catch programming errors like strlcpy(DST, SRC, 2 * sizeof DST). For example, assuming DST has a fixed nonzero size, it should be OK to implement strlcpy(DST, SRC, sizeof DST) via (DST[sizeof DST - 1] = '\0', strlen(strncpy(DST, SRC, sizeof DST - 1)).

* Remove the wcslcpy and wcslcat functions from the proposal. The cost of standardizing and supporting these unused functions outweighs any benefit.

* Whatever other action is taken, modify the rationale to state clearly the engineering problems that strlcpy is designed to address, and the pros and cons of using strlcpy versus using other popular approaches. This is a problematic area, and readers of the standard cannot be expected to be familiar with the ins and outs.

None of these suggested actions would invalidate existing strlcpy implementations, which would not need to be changed.
Tags No tags attached.
Attached Files

- Relationships
related to 0000986Appliedajosey 1003.1(2008)/Issue 7 Would it be worth investigating adding strlcpy(), strlcat(), wcslcpy() and wcslcat()? 

-  Notes
(0005872)
wahern (reporter)
2022-06-30 01:12

> * To address the denial-of-service problem, drop the requirement that strlcpy must compute the string length of an overlong source. If SRC is too long, allow strlcpy(DST, SRC, sizeof DST) to return any value greater than or equal to sizeof DST. This will support the vast majority of uses of strlcpy, which do not need or use the string length.

The behavior of returning the logical length is the same or similar to several other POSIX functions, including snprintf and regerror. In fact, OpenBSD's regerror implementation uses strlcpy to provide the POSIX-required regerror return value semantics.

It's not uncommon to use the returned logical length to resize a buffer appropriately--e.g. to call regerror once, and upon truncation realloc a buffer and call it again.[1] So simply surveying the immediate sites where strlcpy is used is not going to capture real-world dependencies. Changing this semantic sabotages the purpose of standardizing strlcpy.

Most of your objections seem like general gripes with C strings in general. They may be legitimate enough, but strike me as irrelevant. Ditto wrt AddressSanitizer and FORTIFY_SOURCE, which aren't even standardized and are largely platform dependent.

[1] If subsequent calls are not idempotent, e.g. because the locale has been changed, then hopefully the caller is using a loop, but in any event idempotency is often expected, rightly or wrongly. There are potential pitfalls like this all over the place. (POSIX says changing locale between regcomp and regexec is undefined, but is silent wrt regerror.) Objections to strlcpy have always struck me as making perfect the enemy of the good. All of this hand-wringing would be better spent on the C committee advocating for proposals that improve and extend variably modified array types. Some of these proposals come close to effectively specifying dependent typing semantics between pointer and size parameters and even struct members. This would be much better--easier, more consistent, more reliable--than the __builtin_object_size hacks FORTIFY_SOURCE rely upon.
(0005873)
eggert (reporter)
2022-06-30 04:49

> It's not uncommon to use the returned logical length to resize a buffer appropriately

That's not the case for FreeBSD. It has thousands of calls to strlcpy. Most don't inspect the return value, and would suffer from silent truncation if the source string were too long. The relatively few places that inspect the return value almost invariably error out if the source string is too long. It is exceedingly rare for calling code to behave as you describe.

Again, I'm not proposing that FreeBSD change its strlcpy implementation, only that the standard should not prohibit more-useful implementations, assuming it standardizes strlcpy at all.

> Most of your objections seem like general gripes with C strings in general.

No, they're specifically aimed at strlcpy and its three relatives.
(0005874)
geoffclare (manager)
2022-06-30 08:38

Updated to give the page and line numbers from draft 2.1
(0005875)
geoffclare (manager)
2022-06-30 08:53

There seems to be some confusion about the status of these interfaces. Not only are they in draft 2.1 but the source for that addition is a published (non-draft) standard of The Open Group https://publications.opengroup.org/c211 [^]
(0005876)
carlos (reporter)
2022-06-30 13:15

The current draft text captures the essence of the existing implementations.

Changing the text as suggested could produce observably different implementations that are not compatible.

This would cause portability traps for software ported from BSD to an implementation that makes use of the additional permissions that are being suggested.
(0005907)
nrk (reporter)
2022-07-25 14:00

> That's not the case for FreeBSD. It has thousands of calls to strlcpy. Most
> don't inspect the return value

My experience has been similar. *MAJORITY* of the strlcpy calls I see don't
inspect the return value.

This is because most people don't really want strlcpy, what they *actually*
want is a string copy function which will truncate if needed (assuming
truncation is desired).

strlcpy is simply used as a (horrible) means to this end. What people *really*
want is something more closer to this:

    strcpy_trunc(buf, src, sizeof buf);

This function would act similar to the following, no need to suffer from O(n)
strlen(src) call:

    if (memccpy(buf, src, '\0', sizeof buf) == NULL)
        buf[sizeof buf - 1] = '\0';

And perhaps return the pointer to '\0' inside buf (similar to stpcpy) as a
bonus which can be used for quickly calculating the strlen or for efficient
string concatenation.

    char *strcpy_trunc(char *restrict dest, const char *restrict src, size_t n);

Such a function would give people what they *actually* want; which is to copy a
string into a fixed-buffer, truncating if necessary; while being efficient as
well.

I should also note that strlcpy isn't even that widely adopted. Neither windows
nor glibc provides it. IMO it's not a good idea standardizing an inefficient,
not widely implemented and abused function.

If this gets standardized, more people are going to abusing it as it and when a
better function (such as the one shown above) does later get standardized,
there's already going to be too many code-base which will rely on this
non-optimal one making it hard to remove.

- Issue History
Date Modified Username Field Change
2022-06-29 18:39 eggert New Issue
2022-06-29 18:39 eggert Name => Paul Eggert
2022-06-29 18:39 eggert Organization => UCLA
2022-06-29 18:39 eggert Section => no section number yet (no draft yet)
2022-06-29 18:39 eggert Page Number => no page number yet (no draft yet)
2022-06-29 18:39 eggert Line Number => no line number yet (no draft yet)
2022-06-30 01:12 wahern Note Added: 0005872
2022-06-30 04:49 eggert Note Added: 0005873
2022-06-30 08:38 geoffclare Section no section number yet (no draft yet) => strlcat, wcslcat
2022-06-30 08:38 geoffclare Page Number no page number yet (no draft yet) => 2005, 2216
2022-06-30 08:38 geoffclare Line Number no line number yet (no draft yet) => 65136, 71365
2022-06-30 08:38 geoffclare Note Added: 0005874
2022-06-30 08:38 geoffclare version => Draft 2.1
2022-06-30 08:39 geoffclare Relationship added related to 0000986
2022-06-30 08:53 geoffclare Note Added: 0005875
2022-06-30 13:15 carlos Note Added: 0005876
2022-07-11 15:49 nrk Issue Monitored: nrk
2022-07-25 14:00 nrk Note Added: 0005907
2022-08-15 13:35 Florian Weimer Issue Monitored: Florian Weimer


Mantis 1.1.6[^]
Copyright © 2000 - 2008 Mantis Group
Powered by Mantis Bugtracker