0001591: Proposed strlcpy spec has problems - Austin Group Issue Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0001591	Issue 8 drafts	System Interfaces	public	2022-06-29 18:39	2022-10-31 15:33

Reporter	eggert	Assigned To
Priority	normal	Severity	Objection	Type	Error
Status	Closed	Resolution	Rejected
Product Version	Draft 2.1

Name	Paul Eggert
Organization	UCLA
User Reference
Section	strlcat, wcslcat
Page Number	2005, 2216
Line Number	65136, 71365
Final Accepted Text


Summary	0001591: Proposed strlcpy spec has problems
Description	In <https://sourceware.org/pipermail/libc-alpha/2022-June/140093.html> Florian Weimer brought to my attention Austin Group Issue 986 <https://www.austingroupbugs.net/view.php?id=986>. Although this issue is labeled as "Editorial" and "Clarification Requested", its resolution took a proposed approach that went well beyond editorial or clarification. The proposed approach overlooks some important technical issues that need to be considered before attempting to impose a standard in the controversial area of standardizing strlcpy and related functions. Here are some problems with the proposed approach: * Although strlcpy may have been a reasonable hack when introduced in the 1990s, technology has passed it by and we now have less-intrusive and more-reliable approaches such as FORTIFY_SOURCE and AddressSanitizer. * With commonly-available current technologies, a program that uses strlcpy is typically less secure than one that uses strcpy, because when a fixed-size buffer would be overrun strlcpy silently corrupts the output data via truncation, whereas strcpy catches and reports the program error. * Even though the call strlcpy(DST, SRC, 2 * sizeof DST) is clearly wrong and dangerous, the proposed spec says the call happens to have well-defined behavior if strlen(SRC) < sizeof DST, and requires a conforming implementation to implement it as if via strlen(strcpy(DST, SRC)). Programmers do not need this requirement, and the unnecessary requirement stands in the way of checking strlcpy calls for common programming errors involving miscalculated sizes. * strlcpy introduces the possibility of denial-of-service attacks. The time cost of strlcpy(DST, SRC, sizeof DST) is O(strlen(SRC)), which makes it vulnerable to a denial-of-service attack when the attacker controls SRC and can specify a long SRC string. In this respect, strlcpy is worse than then already-standard (strncpy(DST, SRC, sizeof DST - 1), DST[sizeof DST - 1] = '\0') which despite its other flaws at least has the virtue of costing only O(sizeof DST), a cost that is under the defender's control and is typically small. * There is no demonstrated need for the wcslcpy and wcslcat functions. For example, there is not a single call to either function in the FreeBSD source code. Requiring these functions of every implementation would be makework and a net minus for the community. * There is no good evidence that modifying code to use strlcpy improves security more than using already-standardized approaches, even if technologies like FORTIFY_SOURCE and AddressSanitizer are not available.
Desired Action	The simplest way to address these issues is to revert the proposal and not add strlcpy etc. to the standard. Failing that, at least the following actions should be taken: * To address the denial-of-service problem, drop the requirement that strlcpy must compute the string length of an overlong source. If SRC is too long, allow strlcpy(DST, SRC, sizeof DST) to return any value greater than or equal to sizeof DST. This will support the vast majority of uses of strlcpy, which do not need or use the string length. * Allow strlcpy(DST, SRC, sizeof DST) to trap if SRC is too long. This will allow strlcpy implementations to catch and report programming errors that would otherwise result in silent data corruption via truncation. I just now surveyed FreeBSD's bin directory sources, and found that most of its uses of strlcpy fall into this category. (I am not proposing that FreeBSD change its strlcpy implementation; I'm merely noting that in practice, most strlcpy uses would benefit from allowing strlcpy implementations to trap.) * Allow the implementation of strlcpy(DST, SRC, sizeof DST) to modify all the bytes of DST, rather than only those bytes up through strlen(SRC)+1. This would help debugging implementations catch programming errors like strlcpy(DST, SRC, 2 * sizeof DST). For example, assuming DST has a fixed nonzero size, it should be OK to implement strlcpy(DST, SRC, sizeof DST) via (DST[sizeof DST - 1] = '\0', strlen(strncpy(DST, SRC, sizeof DST - 1)). * Remove the wcslcpy and wcslcat functions from the proposal. The cost of standardizing and supporting these unused functions outweighs any benefit. * Whatever other action is taken, modify the rationale to state clearly the engineering problems that strlcpy is designed to address, and the pros and cons of using strlcpy versus using other popular approaches. This is a problematic area, and readers of the standard cannot be expected to be familiar with the ins and outs. None of these suggested actions would invalidate existing strlcpy implementations, which would not need to be changed.
Tags	No tags attached.

wahern 2022-06-30 01:12 reporter bugnote:0005872	> * To address the denial-of-service problem, drop the requirement that strlcpy must compute the string length of an overlong source. If SRC is too long, allow strlcpy(DST, SRC, sizeof DST) to return any value greater than or equal to sizeof DST. This will support the vast majority of uses of strlcpy, which do not need or use the string length. The behavior of returning the logical length is the same or similar to several other POSIX functions, including snprintf and regerror. In fact, OpenBSD's regerror implementation uses strlcpy to provide the POSIX-required regerror return value semantics. It's not uncommon to use the returned logical length to resize a buffer appropriately--e.g. to call regerror once, and upon truncation realloc a buffer and call it again.[1] So simply surveying the immediate sites where strlcpy is used is not going to capture real-world dependencies. Changing this semantic sabotages the purpose of standardizing strlcpy. Most of your objections seem like general gripes with C strings in general. They may be legitimate enough, but strike me as irrelevant. Ditto wrt AddressSanitizer and FORTIFY_SOURCE, which aren't even standardized and are largely platform dependent. [1] If subsequent calls are not idempotent, e.g. because the locale has been changed, then hopefully the caller is using a loop, but in any event idempotency is often expected, rightly or wrongly. There are potential pitfalls like this all over the place. (POSIX says changing locale between regcomp and regexec is undefined, but is silent wrt regerror.) Objections to strlcpy have always struck me as making perfect the enemy of the good. All of this hand-wringing would be better spent on the C committee advocating for proposals that improve and extend variably modified array types. Some of these proposals come close to effectively specifying dependent typing semantics between pointer and size parameters and even struct members. This would be much better--easier, more consistent, more reliable--than the __builtin_object_size hacks FORTIFY_SOURCE rely upon.

eggert 2022-06-30 04:49 reporter bugnote:0005873	> It's not uncommon to use the returned logical length to resize a buffer appropriately That's not the case for FreeBSD. It has thousands of calls to strlcpy. Most don't inspect the return value, and would suffer from silent truncation if the source string were too long. The relatively few places that inspect the return value almost invariably error out if the source string is too long. It is exceedingly rare for calling code to behave as you describe. Again, I'm not proposing that FreeBSD change its strlcpy implementation, only that the standard should not prohibit more-useful implementations, assuming it standardizes strlcpy at all. > Most of your objections seem like general gripes with C strings in general. No, they're specifically aimed at strlcpy and its three relatives.

geoffclare 2022-06-30 08:38 manager bugnote:0005874	Updated to give the page and line numbers from draft 2.1

geoffclare 2022-06-30 08:53 manager bugnote:0005875	There seems to be some confusion about the status of these interfaces. Not only are they in draft 2.1 but the source for that addition is a published (non-draft) standard of The Open Group https://publications.opengroup.org/c211

carlos 2022-06-30 13:15 reporter bugnote:0005876	The current draft text captures the essence of the existing implementations. Changing the text as suggested could produce observably different implementations that are not compatible. This would cause portability traps for software ported from BSD to an implementation that makes use of the additional permissions that are being suggested.

nrk 2022-07-25 14:00 reporter bugnote:0005907	> That's not the case for FreeBSD. It has thousands of calls to strlcpy. Most > don't inspect the return value My experience has been similar. MAJORITY of the strlcpy calls I see don't inspect the return value. This is because most people don't really want strlcpy, what they actually want is a string copy function which will truncate if needed (assuming truncation is desired). strlcpy is simply used as a (horrible) means to this end. What people really want is something more closer to this: strcpy_trunc(buf, src, sizeof buf); This function would act similar to the following, no need to suffer from O(n) strlen(src) call: if (memccpy(buf, src, '\0', sizeof buf) == NULL) buf[sizeof buf - 1] = '\0'; And perhaps return the pointer to '\0' inside buf (similar to stpcpy) as a bonus which can be used for quickly calculating the strlen or for efficient string concatenation. char strcpy_trunc(char restrict dest, const char restrict src, size_t n); Such a function would give people what they actually* want; which is to copy a string into a fixed-buffer, truncating if necessary; while being efficient as well. I should also note that strlcpy isn't even that widely adopted. Neither windows nor glibc provides it. IMO it's not a good idea standardizing an inefficient, not widely implemented and abused function. If this gets standardized, more people are going to abusing it as it and when a better function (such as the one shown above) does later get standardized, there's already going to be too many code-base which will rely on this non-optimal one making it hard to remove.

~~Don Cragun~~ 2022-10-31 15:33 viewer bugnote:0006017	These functions are being added to the standard because they are in widespread use, not because of any considerations relating to security or efficiency. Now that they are in the draft there is no consensus for removing them and therefore this bug is being rejected.

Date Modified	Username	Field	Change
2022-06-29 18:39	eggert	New Issue
2022-06-29 18:39	eggert	Name	=> Paul Eggert
2022-06-29 18:39	eggert	Organization	=> UCLA
2022-06-29 18:39	eggert	Section	=> no section number yet (no draft yet)
2022-06-29 18:39	eggert	Page Number	=> no page number yet (no draft yet)
2022-06-29 18:39	eggert	Line Number	=> no line number yet (no draft yet)
2022-06-30 01:12	wahern	Note Added: 0005872
2022-06-30 04:49	eggert	Note Added: 0005873
2022-06-30 08:38	geoffclare	Section	no section number yet (no draft yet) => strlcat, wcslcat
2022-06-30 08:38	geoffclare	Page Number	no page number yet (no draft yet) => 2005, 2216
2022-06-30 08:38	geoffclare	Line Number	no line number yet (no draft yet) => 65136, 71365
2022-06-30 08:38	geoffclare	Note Added: 0005874
2022-06-30 08:38	geoffclare	Product Version	=> Draft 2.1
2022-06-30 08:39	geoffclare	Relationship added	related to 0000986
2022-06-30 08:53	geoffclare	Note Added: 0005875
2022-06-30 13:15	carlos	Note Added: 0005876
2022-07-25 14:00	nrk	Note Added: 0005907
2022-10-31 15:33	~~Don Cragun~~	Note Added: 0006017
2022-10-31 15:33	~~Don Cragun~~	Status	New => Closed
2022-10-31 15:33	~~Don Cragun~~	Resolution	Open => Rejected

View Issue Details

Relationships

Activities

Issue History