|Anonymous | Login||2022-09-25 08:16 UTC|
|Main | My View | View Issues | Change Log | Docs|
|Viewing Issue Simple Details|
|ID||Category||Severity||Type||Date Submitted||Last Update|
|0001591||[Issue 8 drafts] System Interfaces||Objection||Error||2022-06-29 18:39||2022-07-25 14:00|
|Status||New||Product Version||Draft 2.1|
|Page Number||2005, 2216|
|Line Number||65136, 71365|
|Final Accepted Text|
|Summary||0001591: Proposed strlcpy spec has problems|
In <https://sourceware.org/pipermail/libc-alpha/2022-June/140093.html> [^] Florian Weimer brought to my attention Austin Group Issue 986 <https://www.austingroupbugs.net/view.php?id=986>. [^] Although this issue is labeled as "Editorial" and "Clarification Requested", its resolution took a proposed approach that went well beyond editorial or clarification. The proposed approach overlooks some important technical issues that need to be considered before attempting to impose a standard in the controversial area of standardizing strlcpy and related functions.
Here are some problems with the proposed approach:
* Although strlcpy may have been a reasonable hack when introduced in the 1990s, technology has passed it by and we now have less-intrusive and more-reliable approaches such as FORTIFY_SOURCE and AddressSanitizer.
* With commonly-available current technologies, a program that uses strlcpy is typically *less* secure than one that uses strcpy, because when a fixed-size buffer would be overrun strlcpy silently corrupts the output data via truncation, whereas strcpy catches and reports the program error.
* Even though the call strlcpy(DST, SRC, 2 * sizeof DST) is clearly wrong and dangerous, the proposed spec says the call happens to have well-defined behavior if strlen(SRC) < sizeof DST, and requires a conforming implementation to implement it as if via strlen(strcpy(DST, SRC)). Programmers do not need this requirement, and the unnecessary requirement stands in the way of checking strlcpy calls for common programming errors involving miscalculated sizes.
* strlcpy introduces the possibility of denial-of-service attacks. The time cost of strlcpy(DST, SRC, sizeof DST) is O(strlen(SRC)), which makes it vulnerable to a denial-of-service attack when the attacker controls SRC and can specify a long SRC string. In this respect, strlcpy is worse than then already-standard (strncpy(DST, SRC, sizeof DST - 1), DST[sizeof DST - 1] = '\0') which despite its other flaws at least has the virtue of costing only O(sizeof DST), a cost that is under the defender's control and is typically small.
* There is no demonstrated need for the wcslcpy and wcslcat functions. For example, there is not a single call to either function in the FreeBSD source code. Requiring these functions of every implementation would be makework and a net minus for the community.
* There is no good evidence that modifying code to use strlcpy improves security more than using already-standardized approaches, even if technologies like FORTIFY_SOURCE and AddressSanitizer are not available.
The simplest way to address these issues is to revert the proposal and not add strlcpy etc. to the standard.
Failing that, at least the following actions should be taken:
* To address the denial-of-service problem, drop the requirement that strlcpy must compute the string length of an overlong source. If SRC is too long, allow strlcpy(DST, SRC, sizeof DST) to return any value greater than or equal to sizeof DST. This will support the vast majority of uses of strlcpy, which do not need or use the string length.
* Allow strlcpy(DST, SRC, sizeof DST) to trap if SRC is too long. This will allow strlcpy implementations to catch and report programming errors that would otherwise result in silent data corruption via truncation. I just now surveyed FreeBSD's bin directory sources, and found that most of its uses of strlcpy fall into this category. (I am not proposing that FreeBSD change its strlcpy implementation; I'm merely noting that in practice, most strlcpy uses would benefit from allowing strlcpy implementations to trap.)
* Allow the implementation of strlcpy(DST, SRC, sizeof DST) to modify all the bytes of DST, rather than only those bytes up through strlen(SRC)+1. This would help debugging implementations catch programming errors like strlcpy(DST, SRC, 2 * sizeof DST). For example, assuming DST has a fixed nonzero size, it should be OK to implement strlcpy(DST, SRC, sizeof DST) via (DST[sizeof DST - 1] = '\0', strlen(strncpy(DST, SRC, sizeof DST - 1)).
* Remove the wcslcpy and wcslcat functions from the proposal. The cost of standardizing and supporting these unused functions outweighs any benefit.
* Whatever other action is taken, modify the rationale to state clearly the engineering problems that strlcpy is designed to address, and the pros and cons of using strlcpy versus using other popular approaches. This is a problematic area, and readers of the standard cannot be expected to be familiar with the ins and outs.
None of these suggested actions would invalidate existing strlcpy implementations, which would not need to be changed.
|Tags||No tags attached.|
> * To address the denial-of-service problem, drop the requirement that strlcpy must compute the string length of an overlong source. If SRC is too long, allow strlcpy(DST, SRC, sizeof DST) to return any value greater than or equal to sizeof DST. This will support the vast majority of uses of strlcpy, which do not need or use the string length.
The behavior of returning the logical length is the same or similar to several other POSIX functions, including snprintf and regerror. In fact, OpenBSD's regerror implementation uses strlcpy to provide the POSIX-required regerror return value semantics.
It's not uncommon to use the returned logical length to resize a buffer appropriately--e.g. to call regerror once, and upon truncation realloc a buffer and call it again. So simply surveying the immediate sites where strlcpy is used is not going to capture real-world dependencies. Changing this semantic sabotages the purpose of standardizing strlcpy.
Most of your objections seem like general gripes with C strings in general. They may be legitimate enough, but strike me as irrelevant. Ditto wrt AddressSanitizer and FORTIFY_SOURCE, which aren't even standardized and are largely platform dependent.
 If subsequent calls are not idempotent, e.g. because the locale has been changed, then hopefully the caller is using a loop, but in any event idempotency is often expected, rightly or wrongly. There are potential pitfalls like this all over the place. (POSIX says changing locale between regcomp and regexec is undefined, but is silent wrt regerror.) Objections to strlcpy have always struck me as making perfect the enemy of the good. All of this hand-wringing would be better spent on the C committee advocating for proposals that improve and extend variably modified array types. Some of these proposals come close to effectively specifying dependent typing semantics between pointer and size parameters and even struct members. This would be much better--easier, more consistent, more reliable--than the __builtin_object_size hacks FORTIFY_SOURCE rely upon.
> It's not uncommon to use the returned logical length to resize a buffer appropriately
That's not the case for FreeBSD. It has thousands of calls to strlcpy. Most don't inspect the return value, and would suffer from silent truncation if the source string were too long. The relatively few places that inspect the return value almost invariably error out if the source string is too long. It is exceedingly rare for calling code to behave as you describe.
Again, I'm not proposing that FreeBSD change its strlcpy implementation, only that the standard should not prohibit more-useful implementations, assuming it standardizes strlcpy at all.
> Most of your objections seem like general gripes with C strings in general.
No, they're specifically aimed at strlcpy and its three relatives.
|Updated to give the page and line numbers from draft 2.1|
|There seems to be some confusion about the status of these interfaces. Not only are they in draft 2.1 but the source for that addition is a published (non-draft) standard of The Open Group https://publications.opengroup.org/c211 [^]|
The current draft text captures the essence of the existing implementations.
Changing the text as suggested could produce observably different implementations that are not compatible.
This would cause portability traps for software ported from BSD to an implementation that makes use of the additional permissions that are being suggested.
> That's not the case for FreeBSD. It has thousands of calls to strlcpy. Most
> don't inspect the return value
My experience has been similar. *MAJORITY* of the strlcpy calls I see don't
inspect the return value.
This is because most people don't really want strlcpy, what they *actually*
want is a string copy function which will truncate if needed (assuming
truncation is desired).
strlcpy is simply used as a (horrible) means to this end. What people *really*
want is something more closer to this:
strcpy_trunc(buf, src, sizeof buf);
This function would act similar to the following, no need to suffer from O(n)
if (memccpy(buf, src, '\0', sizeof buf) == NULL)
buf[sizeof buf - 1] = '\0';
And perhaps return the pointer to '\0' inside buf (similar to stpcpy) as a
bonus which can be used for quickly calculating the strlen or for efficient
char *strcpy_trunc(char *restrict dest, const char *restrict src, size_t n);
Such a function would give people what they *actually* want; which is to copy a
string into a fixed-buffer, truncating if necessary; while being efficient as
I should also note that strlcpy isn't even that widely adopted. Neither windows
nor glibc provides it. IMO it's not a good idea standardizing an inefficient,
not widely implemented and abused function.
If this gets standardized, more people are going to abusing it as it and when a
better function (such as the one shown above) does later get standardized,
there's already going to be too many code-base which will rely on this
non-optimal one making it hard to remove.
|2022-06-29 18:39||eggert||New Issue|
|2022-06-29 18:39||eggert||Name||=> Paul Eggert|
|2022-06-29 18:39||eggert||Organization||=> UCLA|
|2022-06-29 18:39||eggert||Section||=> no section number yet (no draft yet)|
|2022-06-29 18:39||eggert||Page Number||=> no page number yet (no draft yet)|
|2022-06-29 18:39||eggert||Line Number||=> no line number yet (no draft yet)|
|2022-06-30 01:12||wahern||Note Added: 0005872|
|2022-06-30 04:49||eggert||Note Added: 0005873|
|2022-06-30 08:38||geoffclare||Section||no section number yet (no draft yet) => strlcat, wcslcat|
|2022-06-30 08:38||geoffclare||Page Number||no page number yet (no draft yet) => 2005, 2216|
|2022-06-30 08:38||geoffclare||Line Number||no line number yet (no draft yet) => 65136, 71365|
|2022-06-30 08:38||geoffclare||Note Added: 0005874|
|2022-06-30 08:38||geoffclare||version||=> Draft 2.1|
|2022-06-30 08:39||geoffclare||Relationship added||related to 0000986|
|2022-06-30 08:53||geoffclare||Note Added: 0005875|
|2022-06-30 13:15||carlos||Note Added: 0005876|
|2022-07-11 15:49||nrk||Issue Monitored: nrk|
|2022-07-25 14:00||nrk||Note Added: 0005907|
|2022-08-15 13:35||Florian Weimer||Issue Monitored: Florian Weimer|
|Mantis 1.1.6[^] Copyright © 2000 - 2008 Mantis Group|