Austin Group Defect Tracker

Aardvark Mark III


ID: 0001181
Category: [1003.1(2016)/Issue7+TC2] System Interfaces
Severity: Editorial
Type: Clarification Requested
Date Submitted: 2017-12-29 13:52
Last Update: 2018-01-09 23:00
Reporter: steffen
View Status: public
Assigned To:
Priority: normal
Resolution: Open
Status: New
Name: steffen
Organization:
User Reference:
Section: sscanf
Page Number: 951
Line Number: 32310 ff.
Interp Status: ---
Final Accepted Text:
Summary: 0001181: *scanf(): %i format: seems to use unsigned in the wild for known base(s)
Description: I am deeply used to the high bit representing the sign of a number, and I tend to write hexadecimal numbers this way.
That is why I have always had problems with how the standard strtol() family fails to parse hexadecimal 0x80000000 (with a 32-bit long), let alone 0xFFFFFFFF (ditto), which simply is -1: I would never write -0x1 to get that value in hexadecimal.
But that is how the standard is written.

Interestingly, some libraries seem to support that view for the %i format of the *scanf() family, even though the standard says that %i shall behave like strtol() with a base of 0.
The behaviour does not seem to be consistent, though (note that I cannot test Solaris because I have messed up my OpenCSW account, grmpf!).
For example, the program below on 64-bit musl Linux:

  ?0[steffen@essex tmp]$ tcc -run t.c
  0xFFFFFFFFFFFFFF7F: 0xFFFFFFFFFFFFFF7F
  ?0[steffen@essex tmp]$ tcc -run t.c 1
  0xFFFFFFFFFFFFFF7F: 0x7FFFFFFFFFFFFFFF (34: Result not representable)

On a 32-bit OpenBSD:

  ?0[steffen@obsd tmp]$ ./zt
  0xFFFFFFFFFFFFFF7F: 0x7FFFFFFFFFFFFFFF
  ?0[steffen@obsd tmp]$ ./zt 1
  0xFFFFFFFFFFFFFF7F: 0x7FFFFFFFFFFFFFFF (34: Result too large)

But with the program adjusted to use long/strtol() instead, we get where I want to be!

  ?0[steffen@obsd tmp]$ ./zt 1
  0xFFFFFF7F: 0x7FFFFFFF (34: Result too large)
  ?0[steffen@obsd tmp]$ ./zt
  0xFFFFFF7F: 0xFFFFFF7F

As can be seen, %i appears to use an unsigned parse mode on multiple systems when the number uses a power-of-two base with a known prefix.
The program is:

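#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int
main(int argc, char **argv){
        char buf[32];
        int e;
        long long l;

        /* Print -129 as an unsigned hexadecimal bit pattern:
           0xFFFFFFFFFFFFFF7F with a 64-bit long long. */
        l = -129;
        snprintf(buf, sizeof buf, "0x%llX", (unsigned long long)l);
        printf("%s: ", buf);
        if(argc > 1){
                /* Parse it back with strtoll(), base 0, and show errno. */
                errno = 0;
                l = strtoll(buf, NULL, 0);
                e = errno;
                printf("0x%llX (%d: %s)\n", (unsigned long long)l, e,
                        strerror(e));
        }else{
                /* Parse it back with %lli, which the standard says
                   behaves like strtoll() with a base of 0. */
                if(sscanf(buf, "%lli", &l) != 1)
                        return 1;
                printf("0x%llX\n", (unsigned long long)l);
        }
        return 0;
}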
Desired Action: Allow *scanf() %i and strtol*() to automatically treat numbers with a power-of-two base and a known standardized prefix as unsigned.
Tags: No tags attached.
Attached Files:

Notes
(0003908)
shware_systems (reporter)
2018-01-05 07:36

Per C11 7.21.6.2p10 (fscanf()), such usage is undefined behavior: it is an error to demote an unsigned value of larger type rank to a signed type.
Because the text refers to the strto* functions, and because of the related clauses in Section 6, I think this issue should be brought up with the C working group; it is not something POSIX can address unilaterally while still deferring to C.
(0003910)
Vincent Lefevre (reporter)
2018-01-08 16:32

The description is incorrect. 0xFFFFFFFF is not -1; it is a positive integer. From a user's point of view, 0xFFFFFFFF may be regarded as equivalent to -1 for 32-bit integers, but only with two's complement, while in C integers can have other representations.

Now, POSIX requires two's complement, so it could choose to define some of C's undefined behaviors; this may not be a good idea, though, since it may introduce inconsistencies with future C standards. There would be another drawback: some optimizations based on UB would no longer be possible. A solution could be to standardize GCC's -fwrapv option, but I don't think it is worth it. Note: this applies to operations like +, - and *.

I am not sure about scanf(), but as a user, if I enter a positive integer that is too large, I would prefer an error rather than seeing it treated as a negative integer!
(0003911)
steffen (reporter)
2018-01-09 18:43

Sorry for the delay.
Re #3908:
I do not think that 7.21.6.2p10 really applies here.
For strtol(), yes.

Re #3910:
I am absolutely fine with stating that all bits set is a positive integer.

In the meantime I have tested Solaris, and the test program yields the same (that is, the desired) results on SunOS 5.9 (2003), SunOS 5.10 (2005) and SunOS 5.11 (2015) as well.
Thanks to OpenCSW for fixing my account!

I think that if all platforms POSIX cares about yield the desired result, automatically choosing unsigned parse mode when they see hexadecimal input (which %X, for example, always produces as unsigned), then the standard has to follow and put this into writing.

It is a pity that strtol()
(0003912)
steffen (reporter)
2018-01-09 18:51

Hrm, sorry. At least, entering a URL for the first time could be changed so that it does not go through a search engine.

It is a pity that strtol() cannot be adjusted, too. Binary and octal bases, at least, would also benefit from an automatic switch to unsigned mode when the user-supplied base is 0. (And only then, not when an explicit base is given: in that case the programmer presumably knows what to expect and can choose strtoul() as necessary.)
(0003913)
shware_systems (reporter)
2018-01-09 23:00

Re: 3911:
It is not that the implementations switch to unsigned mode; they are required to start in that mode, whatever the base, and if the interface in question applies to a signed type, the optional sign character is applied afterwards, in keeping with the syntax of C11 6.4.4.1 and constant expressions. Sign characters are mentioned explicitly as allowed because otherwise they would syntactically be unary operators, not part of the constant. This holds for pp-numbers as well as for floating and integer constants, and at runtime it does not matter whether sscanf() with %i, %o, %u or %x or a strto*() function is doing the conversion.

As a runtime expression, unsigned long X = -0xFFFFFFFF; is typed to unsigned long first, then to long long int so that the sign can be applied without the result being out of range, and is then converted back to unsigned long in the assignment by demotion, using repeated addition per 6.3.1.3p2.

By p3, long X = -0xFFFFFFFF; starts as unsigned long, is converted to long long, and then should abort the application with a demotion error because of the range. sscanf() treats this as undefined, which is stricter than implementation-defined, because when multiple operands are present it is ambiguous which one caused an error return code. The strto*() functions are phrased so that they can return ERANGE instead of the compiler-provided diagnostic otherwise required for that error condition, and that error applies only to the single argument provided.

While a compiler can provide a #pragma that suppresses diagnostics, I don't see how that applies to library code. The examples show no such messages, however, just the printf() output, so those implementations' degree of conformance is suspect, IMO.

Issue History
Date Modified     Username          Field                  Change
2017-12-29 13:52  steffen           New Issue
2017-12-29 13:52  steffen           Name                   => steffen
2017-12-29 13:52  steffen           Section                => sscanf
2017-12-29 13:52  steffen           Page Number            => 951
2017-12-29 13:52  steffen           Line Number            => 32310 ff.
2018-01-05 07:36  shware_systems    Note Added: 0003908
2018-01-08 16:32  Vincent Lefevre   Note Added: 0003910
2018-01-09 18:43  steffen           Note Added: 0003911
2018-01-09 18:51  steffen           Note Added: 0003912
2018-01-09 23:00  shware_systems    Note Added: 0003913

