0001834: strnlen() & wcsnlen() descriptions use of "terminating" NUL character

ID	Project	Category	View Status	Date Submitted	Last Update

0001834	1003.1(2024)/Issue8	System Interfaces	public	2024-06-20 00:26	2025-06-05 13:42

Reporter	~~Don Cragun~~	Assigned To	geoffclare
Priority	normal	Severity	Editorial	Type	Error
Status	Applied	Resolution	Accepted As Marked

Name	Don Cragun
Organization
User Reference
Section	strlen() & wcslen()
Page Number	2147, 2380
Line Number	70218-70220, 77131-77134
Interp Status	---
Final Accepted Text	0001834:0006935


Summary	0001834: strnlen() & wcsnlen() descriptions use of "terminating" NUL character
Description	The description of the strlen() function is: The strlen() function shall compute the number of bytes in the string to which s points, not including the terminating NUL character. and this is fine since a string, by definition, is terminated by a NUL character. However, the description of the strnlen() function is: <CX>The strnlen() function shall compute the smaller of the number of bytes in the array to which s points, not including any terminating NUL character, or the value of the maxlen argument. The strnlen() function shall never examine more than maxlen bytes of the array pointed to by s.</CX> but this is a problem because an array of bytes does not have a terminating character. The description needs to be rewritten to more closely match the more correctly written return value section: <CX>The strnlen() function shall return the number of bytes preceding the first null byte in the array to which s points, if s contains a null byte within the first maxlen bytes; otherwise, it shall return maxlen.</CX> The description of the wcsnlen() function: <CX>The wcsnlen() function shall compute the smaller of the number of wide characters in the array to which ws points, not including any terminating null wide-character code, and the value of maxlen. The wcsnlen() function shall never examine more than the first maxlen characters of the wide-character array pointed to by ws.</CX> suffers from the same logical problem.
Desired Action	On P2147, L70218-70219 (strlen() DESCRIPTION) change: smaller of the number of bytes in the array to which s points, not including any terminating NUL character, or the value to: smaller of the number of bytes before the first null byte in the array to which s points, if there is one, and the value On P2380, L77131-77132 (wcslen() DESCRIPTION) change: smaller of the number of wide characters in the array to which ws points, not including any terminating null wide-character code, and the value to: smaller of the number of wide characters before the first null wide-character code in the array to which ws points, if there is one, and the value
Tags	tc1-2024
Attached Files	n3252b.pdf (330,264 bytes)

geoffclare 2024-06-20 11:15 manager bugnote:0006820	I agree the current wording is in need of improvement. However, since the C committee are adding these functions in their next revision, we should wait to see what wording they decide on.

~~Don Cragun~~ 2024-06-20 14:30 viewer bugnote:0006821 Last edited: 2024-06-20 14:32	Re: 0001834:0006820: We heard last week that the C committee is currently planning to call the bytes in the array a string even when those bytes do not contain a NUL byte. I think we should suggest new wording to them to avoid both the wording they are planning to use and the wording currently in POSIX.

geoffclare 2024-06-20 14:57 manager bugnote:0006822	My point was that we shouldn't just resolve this bug with new wording of our choosing; we need to liaise with the C committee and wait for their decision.

~~Don Cragun~~ 2024-06-20 15:33 viewer bugnote:0006823	We discussed this during the 2024-06-20 meeting. We believe that the wording in the Desired Action is better than the current wording in the standard. Nick will e-mail Chris Bazeley (the author of the proposal to add these functions to C2Y) with this as the direction to which POSIX is leaning.

eblake 2024-06-20 18:22 manager bugnote:0006824	I asked the Linux man pages project about their willingness to update wording in strnlen.3, and they pointed me to https://man7.org/linux/man-pages/man7/string_copying.7.html as a useful resource (covers more than POSIX, and doesn't visit the similarly-affected wcs* functions, but has a nice overview of various consistently used concepts)

nick 2024-06-21 15:28 manager bugnote:0006830	An updated proposal from the C committee is attached (n3252b.pdf) 7.26.6.5 The strnlen function Synopsis 1 #include <string.h> size_t strnlen(const char *s, size_t n); Description 2 The strnlen function counts not more than n characters (a null character and characters that follow it are not counted) in the array to which s points. At most the first n characters of s shall be accessed by strnlen. Returns 3 The strnlen function returns the number of characters that precede the terminating null character. If there is no null character in the first n characters of s then strnlen returns n. 7.31.4.7.3 The wcsnlen function Synopsis 1 #include <wchar.h> size_t wcsnlen(const wchar_t *s, size_t n); Description 2 The wcsnlen function counts not more than n wide characters (a null wide character and wide characters that follow it are not counted) in the array to which s points. At most the first n wide characters of s shall be accessed by wcsnlen. Returns 3 The wcsnlen function returns the number of wide characters that precede the terminating null wide character. If there is no null wide character in the first n wide characters of s then wcsnlen returns n.

eblake 2024-07-10 01:31 manager bugnote:0006831	At https://lists.gnu.org/archive/html/bug-gnulib/2024-07/msg00094.html, Paul Eggert argues:
> at which point, strnlen("", SIZE_MAX) _is_ allowed to _access_ beyond
> the NUL byte,
No it wouldn't, because strnlen must stop counting at the first null byte. If this point isn't made clear in the current proposal, it should be made clear. Lots of user code relies on strnlen doing the right thing even if the string is shorter than n. In practice implementations that screw up in this area, and are incompatible with glibc etc., are deemed broken and are fixed. The standard should not allow further breakage.
The proposed wording allows an implementation to access beyond the NUL when a string is passed in, and Paul is arguing that the standard should be stricter and stop accessing at the first NUL or at n bytes, whichever is first (implying a specific linear access pattern, and preventing optimizations such as dividing the array in two, calculating constrained lengths on both halves in parallel, and then doing the appropriate math to return the correct answer even if more bytes than the returned value were accessed). That is, Paul wants code like this (present in the wild) to "work": len = strnlen (string, precision <= 0 ? SIZE_MAX : precision);
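For illustration only, the idiom quoted above might be wrapped as in the following sketch. The helper name bounded_len and the rule that a precision of zero or less means "no limit" are assumptions made for this example, not code from gnulib. The call is safe for short strings only if strnlen() never reads past the first null byte, which is the stricter behavior being argued for here.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: length of `string`, capped at `precision` bytes.
 * A precision <= 0 is treated as "no limit" (an assumption of this sketch). */
static size_t bounded_len(const char *string, int precision)
{
    assert(string != NULL);
    /* With SIZE_MAX as the bound, correctness depends on strnlen()
     * stopping at the first null byte instead of reading further. */
    return strnlen(string, precision <= 0 ? SIZE_MAX : (size_t)precision);
}
```

Under wording that only limits accesses to "the first n characters", a call such as bounded_len("", 0) would not be guaranteed to stay inside the one-byte array.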

eblake 2024-07-11 15:54 manager bugnote:0006832	Summarizing 0001834:0006831, it would be nice if the C wording for strnlen() copied the C23 requirement on memchr("", 0, SIZE_MAX) reliably returning the first argument since "The implementation shall behave as if it reads the characters sequentially and stops as soon as a matching character is found."
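A small sketch of what that memchr() requirement permits, assuming an implementation that honors it. The function name memchr_stops_at_nul is hypothetical and exists only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if memchr() locates the terminating null byte of s at offset
 * strlen(s), even though the length argument is deliberately over-large. */
static int memchr_stops_at_nul(const char *s)
{
    assert(s != NULL);
    /* Relies on the "reads sequentially and stops as soon as a matching
     * character is found" requirement quoted above. */
    const char *hit = memchr(s, 0, SIZE_MAX);
    return hit == s + strlen(s);
}
```

With s equal to "", the match is at offset 0, so memchr() returns its first argument, as the note describes.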

kre 2024-07-12 01:23 reporter bugnote:0006833	It would be good to remember that there are two (quite different) uses for strnlen (and wcsnlen() -- though for that one the second use tends to be more complex). One is for dealing with "strings" stored in fixed length arrays of chars (or wide chars) where the desire is to use every possible storage element for meaningful data, omitting the terminating nul character if the actual data fills the array - but including the nul if the data is shorter. Examples were old style directory entries in the filesystem with char name[14] in the struct, old style utmp entries where the tty name (line) and user name were each stored in char xxx[8] arrays. In this situation it is possible to access any of 'n' (the size_t param to the function) bytes of the array, but nothing with an index of n or greater (or negative of course). The other use is for determining if a string is longer than N bytes, without caring how long it actually is. Such strings are typically stored always with a terminating nul character, in an "array" which is exactly long enough for the string, and its terminating nul. Referring to anything in that array beyond that terminating nul is undefined behaviour. Examples of this are the argv and environ data, and any strings stored in appropriately sized memory from malloc(). [For why one might want this information, consider outputting a string summary in a fixed column width space (which is why the wide char example gets more complex, though the same principles apply) - eg: assume I have 30 columns of a fixed width font to display the leading part of the string, but I also want to indicate when the string was longer, and if so, indicate that it was truncated in the traditional way (terminating ellipsis). For this, all I need to know is whether the string is 31 chars long, or longer. If it is, I will take the first 27 chars, perhaps back up from there to a word end - depending upon the context, then append " ...".
The actual string might be very long (like a chapter of a book, and all I want to display is "It was the best of times, ...") I don't need to know its length, and don't want to waste time determining that (or I'd just use strlen()) so I use strnlen(string, 31) instead. If the answer is <= 30, then I simply use the entire string, however long it might be. If it is > 30 (ie: 31, the only possible case in this scenario) then I do the string truncation dance, and output that. ] The wording needs to be done in such a way that both of these scenarios work correctly.
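The two uses described above could look roughly like this. The names field_len and summarize are hypothetical, and the truncation rule is simplified for this sketch: it keeps the first width - 3 bytes and appends "..." without backing up to a word end.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen() */
#include <assert.h>
#include <string.h>

/* Use 1: a fixed-size field that holds a null byte only when the data is
 * shorter than the field (like the old char name[14] directory entries). */
static size_t field_len(const char *field, size_t field_size)
{
    assert(field != NULL);
    return strnlen(field, field_size);  /* never reads field[field_size] */
}

/* Use 2: summarize a possibly very long string in `width` columns.
 * `out` must have room for width + 1 bytes; width must be at least 3. */
static void summarize(const char *s, size_t width, char *out)
{
    assert(s != NULL && out != NULL && width >= 3);
    size_t n = strnlen(s, width + 1);   /* only asks "longer than width?" */
    if (n <= width) {
        memcpy(out, s, n);              /* the whole string fits */
        out[n] = '\0';
    } else {
        memcpy(out, s, width - 3);      /* keep the leading part */
        memcpy(out + width - 3, "...", 4);  /* ellipsis plus null byte */
    }
}
```

In the second use the string is stored in an array just large enough for it, so the call is valid only if strnlen() stops at the terminating null byte when the string is shorter than width + 1.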

eblake 2024-07-12 17:57 manager bugnote:0006834	It would also be nice if strncmp ("a", "b", SIZE_MAX) were guaranteed to work in linear order in the same manner as memchr() rather than an implementation being permitted to access arbitrary bytes within the two arrays according to an intentionally over-large size.
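A minimal sketch of the linear access order being asked for. The function seq_strncmp is hypothetical and is not how any particular libc implements strncmp(); it only shows a comparison that never reads past the first difference, the first null byte, or n bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strncmp()-like function that reads both arrays strictly
 * in order, one byte at a time. */
static int seq_strncmp(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];
        if (ca != cb)
            return ca < cb ? -1 : 1;    /* first difference decides */
        if (ca == '\0')
            return 0;                   /* both strings ended together */
    }
    return 0;                           /* equal over the first n bytes */
}
```

With this access pattern, seq_strncmp("a", "b", SIZE_MAX) reads only the first byte of each argument.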

kre 2024-07-13 01:10 reporter bugnote:0006835	Re 0001834:0006834 I agree, though this would probably be better addressed by the C standard than here. But in general, I'd suggest that any function (standard ones, or user created functions) which access any character sequence parameter passed to them, beyond either the first nul character, or the provided length if there is one, should result in undefined behaviour. Naturally this only applies in cases where the incoming parameter data is defined to end at the first nul encountered. There should be a name for such objects, they are almost strings, except those require that the terminating nul exist, and this other type do not. If that were done, then the sequential access to whatevers in all of these functions would be guaranteed, as one cannot access p[1] without first ensuring that p[0] is not nul, similarly accessing p[2] requires that p[1] also not be nul (etc) - the only valid access regime is p[0] first, then p[1] up to (but not exceeding) p[N] (where N is the provided length, if that is available, or SIZE_MAX if it isn't), and never going beyond p[n] if p[n] is nul, and p[i] 0 <= i < n are all not nul. Naturally n < N. Once the maximum length that can be referenced is discovered, the implementation is free to then access (again) the bytes in the data in any order it likes. For something simple like strncmp() or strnlen() it makes no real sense to do anything other than compare (or count) the data as it is being examined for the first time, also looking for the nul terminator, but for more complex operations, like pattern matching, other access methods might work out better (matching the RE '^.*abc$' is faster by starting at the end of the input string, and working backwards, than at the beginning and working forwards). 
All of this works (or should) for the wide char functions as well, which is why I keep writing nul rather than '\0' as I intend that to mean the appropriate nul character for the character data type involved.

geoffclare 2024-10-24 15:43 manager bugnote:0006935	On P2147, L70218-70220 (in strlen()) change: <CX>The strnlen() function shall compute the smaller of the number of bytes in the array to which s points, not including any terminating NUL character, or the value of the maxlen argument. The strnlen() function shall never examine more than maxlen bytes of the array pointed to by s.</CX> to: <CX>The strnlen() function shall count not more than maxlen bytes (a null byte and bytes that follow it are not counted) in the array to which s points. At most the first maxlen bytes of s shall be accessed by strnlen(). The implementation shall behave as if it reads the bytes sequentially and stops as soon as a null byte is found.</CX> On P2380 L77131-77134 (in wcslen()) change: <CX>The wcsnlen() function shall compute the smaller of the number of wide characters in the array to which ws points, not including any terminating null wide-character code, and the value of maxlen. The wcsnlen() function shall never examine more than the first maxlen characters of the wide-character array pointed to by ws.</CX> to: <CX>The wcsnlen() function shall count not more than maxlen wide character codes (a null wide-character code and wide-character codes that follow it are not counted) in the array to which ws points. At most the first maxlen wide-character codes of ws shall be accessed by wcsnlen(). The implementation shall behave as if it reads the wide-character codes sequentially and stops as soon as a null wide-character code is found.</CX>
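One way to read the accepted wording is as the following as-if model. The names seq_strnlen and seq_wcsnlen are hypothetical; a real implementation may use any technique whose observable behavior matches a loop like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* As-if model of strnlen(): read bytes in order, stop at the first null
 * byte or after maxlen bytes, whichever comes first. */
static size_t seq_strnlen(const char *s, size_t maxlen)
{
    assert(s != NULL || maxlen == 0);
    size_t n = 0;
    while (n < maxlen && s[n] != '\0')
        n++;
    return n;
}

/* The same model for wcsnlen(), counting wide-character codes. */
static size_t seq_wcsnlen(const wchar_t *ws, size_t maxlen)
{
    assert(ws != NULL || maxlen == 0);
    size_t n = 0;
    while (n < maxlen && ws[n] != L'\0')
        n++;
    return n;
}
```

Because the bound is checked before each read, neither function touches an element at index maxlen or beyond, and neither reads past the first null element.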

Date Modified	Username	Field	Change
2024-06-20 00:26	~~Don Cragun~~	New Issue
2024-06-20 00:26	~~Don Cragun~~	Name	=> Don Cragun
2024-06-20 00:26	~~Don Cragun~~	Section	=> strlen() & wcslen()
2024-06-20 00:26	~~Don Cragun~~	Page Number	=> 2147, 2380
2024-06-20 00:26	~~Don Cragun~~	Line Number	=> 70218-70220, 77131-77134
2024-06-20 00:26	~~Don Cragun~~	Interp Status	=> ---
2024-06-20 00:38	~~Don Cragun~~	Description Updated
2024-06-20 00:38	~~Don Cragun~~	Desired Action Updated
2024-06-20 11:15	geoffclare	Note Added: 0006820
2024-06-20 14:30	~~Don Cragun~~	Note Added: 0006821
2024-06-20 14:32	~~Don Cragun~~	Note Edited: 0006821
2024-06-20 14:57	geoffclare	Note Added: 0006822
2024-06-20 15:33	~~Don Cragun~~	Note Added: 0006823
2024-06-20 18:22	eblake	Note Added: 0006824
2024-06-21 15:24	nick	File Added: n3252b.pdf
2024-06-21 15:28	nick	Note Added: 0006830
2024-07-10 01:31	eblake	Note Added: 0006831
2024-07-11 15:54	eblake	Note Added: 0006832
2024-07-12 01:23	kre	Note Added: 0006833
2024-07-12 17:57	eblake	Note Added: 0006834
2024-07-13 01:10	kre	Note Added: 0006835
2024-10-24 15:43	geoffclare	Note Added: 0006935
2024-10-24 15:44	geoffclare	Final Accepted Text	=> 0001834:0006935
2024-10-24 15:44	geoffclare	Status	New => Resolved
2024-10-24 15:44	geoffclare	Resolution	Open => Accepted As Marked
2024-10-24 15:44	geoffclare	Tag Attached: tc1-2024
2025-06-05 13:41	geoffclare	Assigned To	=> geoffclare
2025-06-05 13:42	geoffclare	Status	Resolved => Applied
