View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001181 | 1003.1(2016/18)/Issue7+TC2 | System Interfaces | public | 2017-12-29 13:52 | 2019-02-18 17:02 |
Reporter | steffen | Assigned To | |||
Priority | normal | Severity | Editorial | Type | Clarification Requested |
Status | Closed | Resolution | Rejected | ||
Name | steffen | ||||
Organization | |||||
User Reference | |||||
Section | sscanf | ||||
Page Number | 951 | ||||
Line Number | 32310 ff. | ||||
Interp Status | --- | ||||
Final Accepted Text | |||||
Summary | 0001181: *scanf(): %i format: seems to use unsigned in the wild for known base(s) | ||||
Description | Very deep in what i got used to is that the high bit represents the signedness state of a number, and i tend to write hexadecimal numbers this way. That is to say that i always had problems with how that standard strtol() series fails to parse hexadecimal 0x80000000 (with 32-bit long), not to talk about 0xFFFFFFFF (ditto), which simply is -1: i would never write -0x1 to get that in hexadecimal. But that is how the standard is written.

Interestingly it seems that some libraries support that kind of view for the %i format of the *scanf() series, even though it is the standard which says that %i shall act like strtol() with a base of 0. The behaviour does not seem to be consistent though (note i cannot test Solaris because i have messed my OpenCSW account, grmpf!):

E.g., the program below on a 64-bit musl Linux:

    ?0[steffen@essex tmp]$ tcc -run t.c
    0xFFFFFFFFFFFFFF7F: 0xFFFFFFFFFFFFFF7F
    ?0[steffen@essex tmp]$ tcc -run t.c 1
    0xFFFFFFFFFFFFFF7F: 0x7FFFFFFFFFFFFFFF (34: Result not representable)

On a 32-bit OpenBSD:

    ?0[steffen@obsd tmp]$ ./zt
    0xFFFFFFFFFFFFFF7F: 0x7FFFFFFFFFFFFFFF
    ?0[steffen@obsd tmp]$ ./zt 1
    0xFFFFFFFFFFFFFF7F: 0x7FFFFFFFFFFFFFFF (34: Result too large)

But, adjusted to work with long/strtol() instead we get where i want to be!

    ?0[steffen@obsd tmp]$ ./zt 1
    0xFFFFFF7F: 0x7FFFFFFF (34: Result too large)
    ?0[steffen@obsd tmp]$ ./zt
    0xFFFFFF7F: 0xFFFFFF7F

As can be seen unsigned parse mode seems to become used for %i on multiple systems if the number uses a power-of-two base with known prefix.

The program is:

    #include <errno.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv){
       char buf[32];
       int e;
       long long l;

       l = -129;
       snprintf(buf, sizeof buf, "0x%llX", l);
       printf("%s: ", buf);

       if(argc > 1){
          errno = 0;
          l = strtoll(buf, NULL, 0);
          e = errno;
          printf("0x%llX (%d: %s)\n", l, e, strerror(e));
       }else{
          sscanf(buf, "%lli", &l);
          printf("0x%llX\n", l);
       }
       return 0;
    }
| ||||
Desired Action | Allow automatic treatment of power-of-two bases with known standardized prefix as unsigned for *scanf() %i and strtol*(). | ||||
Tags | No tags attached. |
|
Per C11 7.21.6.2p10 (fscanf()) such usage is undefined behavior, as an error demoting unsigned of larger type rank to signed. Because of the related clauses in Section 6, due to the strto* references in the text, I think this is an issue to be brought up to the C working group, not something for POSIX to address unilaterally and still defer to C. |
|
The description is incorrect. 0xFFFFFFFF is not -1. It is a positive integer. From a user point of view, 0xFFFFFFFF may be regarded as equivalent to -1 for 32-bit integers, but this would be possible only for two's complement, while in C, integers can have other representations. Now, POSIX requires two's complement, so it could choose to define some of C's undefined behaviors; this may not be a good idea, though, since it may introduce inconsistencies with future C standards. There would be another drawback: some optimizations based on UB would no longer be possible. A solution could be to standardize GCC's -fwrapv option, but I don't think it is worth it. Note: this applies to operations like +, - and *. Not sure about scanf(), but as a user, if I enter a too large positive integer, I would prefer an error rather than seeing it as a negative integer! |
|
Sorry for the delay. Re #3908: i do not think that 7.21.6.2p10 really applies here. strtol(), yes. Re #3910: I am absolutely fine with stating that all bits set is a positive integer. In the meanwhile i have tested Solaris and the test program yields the same a.k.a. the desired results on SunOS 5.9 (2003), SunOS 5.10 (2005) and SunOS 5.11 (2015) as well. Thanks to OpenCSW for fixing my account! I think if all platforms POSIX cares about yield the desired result and automatically choose unsigned parse mode when they see hexadecimal, which is always produced as unsigned by e.g. %X, the standard has to follow and turn this into something printed. It is a pity that strtol() |
|
hrm. sorry. at least entering a URL the first time can be changed not to go over a search engine. It is a pity that strtol() cannot be adjusted, too. At least binary and octal bases would benefit from automatic switch to unsigned mode if the user base is 0, too. (And only then, not when an explicit base was given, since in this case the programmer seems to know what is expected and can very well choose strtoul() as necessary.) |
|
Re #3911: It is not that the implementations switch to unsigned mode - they are required to start in that mode, whatever the base, and if the interface in question applies to a signed type, the optional sign character is applied afterwards, to be in keeping with the syntax of C11 6.4.4.1 and constant expressions. The explicit mention of sign chars being allowed is because otherwise they're unary operators, syntactically, not part of the constant. This holds for pp-nums as well as float and integer constants, and at runtime it doesn't matter if it's sscanf() %ioux or strto*() doing the converting. As a runtime expression unsigned long X = -0xFFFFFFFF; gets typed to unsigned long first, then to long long int so the sign can be applied without the result being out of range, and then cast back to unsigned long in the assignment by demotion using repeated addition per 6.3.1.3p2. By p3 long X = -0xFFFFFFFF; starts as unsigned long, casts to long long, then should abort the app with a demotion error due to the range, which sscanf() considers undefined, as stricter than implementation-defined, because when multiple operands are present it's ambiguous which might produce an error return code. The phrasing of strto*() is so they can return ERANGE rather than the otherwise required compiler-provided diagnostic for that error condition, and this error only applies to the provided argument. While a compiler can provide a diagnostic-suppressing #pragma, I don't see that this applies to library code. The examples don't show those messages, however, just the printf()s, so those implementations' degree of conformance is suspect, imo. |
|
This was discussed during the 2019-02-18 conference call and is being rejected for the following reason: The C standard requires that scanf("%i", ...) store a signed value into the requested target integer, and that behavior is undefined on overflow. The fact that 0xffffffff is a positive value that is too large to fit into a 32-bit integer type triggers undefined behavior, and thus it is permissible for scanf() to treat it as an error. To get what you want you could try using strtoul() with a 0 base. |
Date Modified | Username | Field | Change |
---|---|---|---|
2017-12-29 13:52 | steffen | New Issue | |
2017-12-29 13:52 | steffen | Name | => steffen |
2017-12-29 13:52 | steffen | Section | => sscanf |
2017-12-29 13:52 | steffen | Page Number | => 951 |
2017-12-29 13:52 | steffen | Line Number | => 32310 ff. |
2018-01-05 07:36 | shware_systems | Note Added: 0003908 | |
2018-01-08 16:32 | Vincent Lefevre | Note Added: 0003910 | |
2018-01-09 18:43 | steffen | Note Added: 0003911 | |
2018-01-09 18:51 | steffen | Note Added: 0003912 | |
2018-01-09 23:00 | shware_systems | Note Added: 0003913 | |
2019-02-18 16:59 | Don Cragun | Note Added: 0004258 | |
2019-02-18 17:01 | Don Cragun | Note Edited: 0004258 | |
2019-02-18 17:02 | Don Cragun | Interp Status | => --- |
2019-02-18 17:02 | Don Cragun | Status | New => Closed |
2019-02-18 17:02 | Don Cragun | Resolution | Open => Rejected |