0000761: Requirement of error for snprintf with n>INT_MAX may conflict with ISO C

Notes
(0001864) Don Cragun (manager) 2013-10-09 01:15	The standard already clearly marks both EOVERFLOW errors as extensions to the C Standard's requirements. Given that an application calling snprintf() with a value of n > INT_MAX is asking the implementation to (possibly) store more than INT_MAX bytes into a buffer and then to return the number of bytes stored (which it cannot do with a return value of type int), it seems perfectly reasonable to me to do some up-front error checking and report that the given buffer size is invalid. If snprintf() were allowed to stop conversions early so it could produce an incomplete result that would allow a return value <= INT_MAX, it would be different. It looks to me like it is a programming error to pass a buffer to snprintf() that is larger than INT_MAX. Therefore, I believe the error should remain as a SHALL FAIL error. At the very least, n > INT_MAX must remain as a MAY FAIL error and return value > INT_MAX must remain as a SHALL FAIL error.

(0001865) dalias (reporter) 2013-10-09 02:18	An extension to the C standard's requirements cannot conflict with them, though. To actually be an extension, it must merely define behavior in cases that are left undefined or implementation-defined by the C standard. This does not seem to be such a case. As for the second paragraph of note 1864, I don't see calling snprintf with n>INT_MAX as "asking" the implementation to possibly store more characters. From the caller's perspective, n is simply the size of the buffer, and having to check that the buffer is not gratuitously large before passing it to snprintf places an unnecessary and inconvenient burden on the application: if you're working with a caller-provided buffer and size, then instead of: result = snprintf(buf, size, ...); you must do something like: if (size > INT_MAX) size = INT_MAX; result = snprintf(buf, size, ...); It's unlikely that existing applications make this check (i.e. it's likely that they have bugs caused by this issue), so removing this error case from the standard would possibly improve the behavior of existing applications under rare corner cases. (Also note that, per the report in glibc bug #14771, glibc is presently not generating this error. I have not however read the relevant glibc source or conducted tests to confirm that this bug report is correct.)

(0001866) wlerch (reporter) 2013-10-09 02:29	Consider snprintf( buf, SIZE_MAX, "foo" ); Is this not asking the implementation to store precisely four bytes in buf, and return 3?

(0001867) shware_systems (reporter) 2013-10-09 04:31 edited on: 2013-10-09 04:49	I agree it is sufficiently poor that it should be referred to the C standard. From the C99 standard it has: 7.19.6.5 The snprintf function 3 The snprintf function returns the number of characters that would have been written had n been sufficiently large, not counting the terminating null character, or a negative value if an encoding error occurred. Thus, the null-terminated output has been completely written if and only if the returned value is nonnegative and less than n. The POSIX text uses the same wording for successful return and similar for the n equals 0 case text (line 30426 and 30431). While the intent is fairly obvious it is not specific enough given the type mismatch of the n argument and the return value. This should be more like: 3 If an encoding error occurs the snprintf function shall return a negative number indicating this. What characters, if any, are written to the destination buffer is undefined. 4 Otherwise, upon successful completion the snprintf function returns the number of characters, less than n, that have been placed in the destination buffer. 5 If n is 0 or more than n-1 characters are required the function shall place the leading n characters in the buffer and return the number greater than n of the buffer size needed for successful completion. 6 If this positive return value is larger than INT_MAX, a negative number shall be returned indicating the return value is not representable. 7 Thus, the null-terminated output is known to be completely written if and only if the returned value is nonnegative and less than n. When the return value is not representable it is not guaranteed that a terminating NUL character has been written to the buffer. This allows n to be anything up to SIZE_MAX. If only 1 or 2 characters are actually written this is comfortably below INT_MAX and the function should succeed, as note 1866 suggests, so an n greater than INT_MAX on input I'd consider as a 'may fail' indicator than 'shall fail' automatically. If the function attempts to write to non-existent memory that's a SIGBUS or memory protection event; not the C standards problem. It's when n is greater than INT_MAX and the function tries to write past INT_MAX-1 characters that is the problem and then for POSIX it's a 'shall fail' with EOVERFLOW. Separate text indicating this is a source for EOVERFLOW I'd still keep under Error Codes. The C standard could change the type of n to int also.

(0001868) shware_systems (reporter) 2013-10-09 06:28	Re: #1864 It's a reasonable check on the surface, yes, but with size_t long long because of a 64-bit address space, it is possible to write more than 2 gigabytes in a single call with int as a long, with enough RAM or page file space. Because of that the check should be done by the application, it looks, not the interface. When n is larger than INT_MAX I think it would be more reasonable to use that as a trigger to do write probes as part of the interface, even if it slows things down, and return an error because of that instead of risk a SIGBUS event, but that would be a quality of implementation extension. The prototype implies size_t shall use no more bytes than int, and should probably have a separate size_t count argument to hold the result because this isn't guaranteed. The int return value could be encoding error <0, 0 for EOVERFLOW, and >0 if count does fit in an int anyways. I expect the return was left as int to be compatible with the return of the other interfaces, and size_t might be a typedef of an unsigned integer. Perhaps something like, as a CX extension like dprintf, "int scprintf(char restrict s, size_t n, size_t restrict count, const char restrict format, ...);" would be useful, even if invention, to address this.

(0001930) Don Cragun (manager) 2013-10-17 16:36 edited on: 2013-10-17 16:50	After a lengthy discussion during the October 17, 2013 conference call, we decided that we agree with the sentiment stated in Note: 0001864. The C standard does not specify the behavior when n > INT_MAX, so the current specified behavior does not conflict with the C standard. Although it is possible that snprintf() could be given a huge buffer and use sizeof(buffer) to set the value of n, we believe that (other than for demonstration purposes) this is unlikely to occur. We do not believe the standard needs to be changed for this corner case. Therefore, this bug is rejected.

(0002615) dalias (reporter) 2015-04-09 19:32	Could this possibly be reopened? I've been shown a new example where the issue arises in practice, and it's a very ugly corner case. Consider an implementation of asprintf (or a similar function) as: int vasprintf(char *p, const char fmt, va_list ap) { va_list vp; va_copy(vp, ap); int l = vsnprintf(0, 0, fmt, vp); va_end(vp); if (l < 0) return -1; char s = malloc((size_t)l+1); if (!s) return -1; return vsnprintf(p = s, (size_t)l+1, fmt, ap); } One possibility is that vsnprintf returns exactly INT_MAX, in which case the output buffer has to be INT_MAX+1 bytes in length, triggering a spurious error from the second vsnprintf call despite the fact that the result fits and the return value for success (INT_MAX) is representable in type int. It could be argued that the above code should be checking the return value of the second vsnprintf call, but I suspect a large volume of code is assuming that the second call will succeed if the first did, and very bad things could happen (silently using uninitialized memory instead of the result) when the second call gratuitously fails. I also disagree with the claim in note 0001930 that ISO C does not specify a behavior fr n>INT_MAX. The standard as written specifies the behavior for any value of n representable in size_t as long as the pointed-to buffer object is at least n bytes in size. I can see no justification for a claim that the specification does not apply to values of n larger than INT_MAX.

(0002616) Don Cragun (manager) 2015-04-09 20:18 edited on: 2015-04-09 20:21	In reference to Note: 0002615: Can you please explain what ISO C specifies as the return value from the call: snprintf(b,(size_t)INT_MAX+10,"Test: %.s\n",INT_MAX-1,INT_MAX-1,""); which by my count should be INT_MAX + 6 returned in an object of type int.

(0002617) wlerch (reporter) 2015-04-09 20:39	Don: there's no difference between what ISO C says about the return value your call and from this: snprintf(NULL, 0, "Test: %.s\n",INT_MAX-1,INT_MAX-1,""); In both cases the requirement is impossible, demonstrating a defect in ISO C. The defect is independent of whether n is greater than INT_MAX or not. On the other hand, ISO C is perfectly clear about the call from my earlier example: snprintf( buf, SIZE_MAX, "foo" ); Assuming buf has room for at least four characters, the call is required to return 3. But POSIX wants it to fail. That's a conflict.

(0002618) dalias (reporter) 2015-04-10 00:59	Don, I agree that ISO C does not adequately specify the behavior when the return value would overflow int. Thus I have no objection to the requirement (added by POSIX) that snprintf report an error in this case. What I object to, and consider a conflict with ISO C, is the requirement to fail and report an error in the case where the argument n is large but the return value does not overflow. I believe the language in ISO C specifies what happens in this case, and the specified result is success (non-negative return value).

(0003010) ch3root (reporter) 2016-01-06 15:16	An additional data point. I've looked through mand pages in several BSDs: OpenBSD[1] has no mentions of EOVERFLOW; FreeBSD[2] documents EOVERFLOW for the case size > INT_MAX + 1; NetBSD[3] documents EOVERFLOW for the case size > INT_MAX. AFAICT their implementations agree with the man pages. That is, OpenBSD and FreeBSD don't conform with the POSIX requirement discussed here. And NetBSD seems to conform due to misunderstanding of the FreeBSD code (according to the commit message in [4]) :-) [1] http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/snprintf.3 [^] [2] https://www.freebsd.org/cgi/man.cgi?query=snprintf&sektion=3 [^] [3] http://netbsd.gw.com/cgi-bin/man-cgi?snprintf++NetBSD-current [^] [4] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/stdio/vsnprintf.c?rev=1.28&content-type=text/x-cvsweb-markup [^]

(0003013) ch3root (reporter) 2016-01-06 20:36	There are 3 intermixed issues here. 1. The value of the limit. The details depend on the goal of the n>INT_MAX check. Is there a way to find out exactly why this check was introduced in POSIX? A plausible reason is to catch early those cases where snprintf could potentially be required to return a number which it cannot return. Then there is an off-by-one error here: n includes the terminating null but the return value doesn't. Thus, the intended check should be n>(size_t)INT_MAX+1. This is what FreeBSD does. Another possible reason is to catch cases where a user passes a negative int to snprintf. This indeed requires the check to be "n>INT_MAX" on platforms where int and size_t have the same size. But then you run into a problem (described in http://austingroupbugs.net/view.php?id=761#c2615) [^] that you cannot pass result from one snprintf to another snprintf. 2. The idea of early exits itself, i.e. the case when n is big but the the amount of characters actually written is small. Like this: char s[10]; snprintf(s, SIZE_MAX, "abc"); It's fine in ISO C but not in POSIX. Perhaps the conflict in the description of the n argument that I recently reported in http://austingroupbugs.net/view.php?id=1020 [^] contributed here. According to the current text of POSIX the example above is invalid because SIZE_MAX is not the size of the buffer. But there is no such a requirement in ISO C. I can imagine that the idea behind the n>INT_MAX check was that, apart from catching accidentally bogus values of n, the check will only affect callers with very large buffers which are rare. (And this is stated in http://austingroupbugs.net/view.php?id=761#c1930 [^] .) Even in this case the restriction would conflict with ISO C but given that n is not required to be the actual size of the buffer the scope of the problem is not limited to large buffers. Also, if a user is not supposed to pass values greater than INT_MAX why the argument is of type size_t? There could be various reasons for this but, at first sight, it hints that values greater than INT_MAX are welcome here. 3. Big return values. What should snprintf do when the value to be returned is greater than INT_MAX? This is not specific to snprintf, plain printf has the same problem. And POSIX has _another_ clause dealing with this case, which is applicable to all forms of fprintf. AIUI this bug report is about a separate clause, specific to snprintf. Thus, all arguments about impossible-to-return results of snprintf are not relevant to this bug report.

(0003021) ch3root (reporter) 2016-01-10 19:46	1. Further research has shown that the behavior when the return value would overflow int was clarified by WG14 in C99 by adding it into the list of undefined behaviors in Annex J. It was updated in C11 to the following text: "J.2 Undefined behavior The behavior is undefined in the following circumstances: [skip] — The number of characters or wide characters transmitted by a formatted output function (or written to an array, or that would have been written to an array) is greater than INT_MAX (7.21.6.1, 7.29.2.1)." Please note that this description does _not_ mention the size argument of snprintf or the size of the buffer. 2. The same problem affects the swprintf function. (With a twist -- see http://austingroupbugs.net/view.php?id=1021 [^] .)

Issue History
Date Modified	Username	Field	Change
2013-10-08 23:17	dalias	New Issue
2013-10-08 23:17	dalias	Name	=> Rich Felker
2013-10-08 23:17	dalias	Organization	=> musl libc
2013-10-08 23:17	dalias	Section	=> snprintf
2013-10-08 23:17	dalias	Page Number	=> unknown
2013-10-08 23:17	dalias	Line Number	=> unknown
2013-10-09 01:15	Don Cragun	Page Number	unknown => 906
2013-10-09 01:15	Don Cragun	Line Number	unknown => 30447
2013-10-09 01:15	Don Cragun	Interp Status	=> ---
2013-10-09 01:15	Don Cragun	Note Added: 0001864
2013-10-09 02:18	dalias	Note Added: 0001865
2013-10-09 02:29	wlerch	Note Added: 0001866
2013-10-09 04:31	shware_systems	Note Added: 0001867
2013-10-09 04:49	shware_systems	Note Edited: 0001867
2013-10-09 06:28	shware_systems	Note Added: 0001868
2013-10-09 06:54	Florian Weimer	Issue Monitored: Florian Weimer
2013-10-17 16:01	nick	Tag Attached: C11
2013-10-17 16:36	Don Cragun	Note Added: 0001930
2013-10-17 16:36	Don Cragun	Status	New => Resolved
2013-10-17 16:36	Don Cragun	Resolution	Open => Rejected
2013-10-17 16:38	Don Cragun	Note Edited: 0001930
2013-10-17 16:50	Don Cragun	Note Edited: 0001930
2015-04-09 19:32	dalias	Note Added: 0002615
2015-04-09 20:18	Don Cragun	Note Added: 0002616
2015-04-09 20:19	Don Cragun	Note Edited: 0002616
2015-04-09 20:20	Don Cragun	Note Edited: 0002616
2015-04-09 20:21	Don Cragun	Note Edited: 0002616
2015-04-09 20:39	wlerch	Note Added: 0002617
2015-04-10 00:59	dalias	Note Added: 0002618
2015-05-07 08:23	dancol	Issue Monitored: dancol
2016-01-05 18:17	ch3root	Issue Monitored: ch3root
2016-01-05 20:18	eblake	Relationship added	related to 0001020
2016-01-06 15:16	ch3root	Note Added: 0003010
2016-01-06 20:36	ch3root	Note Added: 0003013
2016-01-10 19:46	ch3root	Note Added: 0003021
2018-12-12 21:06	eblake	Relationship added	related to 0001219
2022-07-18 15:43	Don Cragun	Status	Resolved => Closed

Aardvark Mark IV