View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001078 | 1003.1(2013)/Issue7+TC1 | System Interfaces | public | 2016-09-16 17:54 | 2024-06-11 08:54 |
Reporter | Florian Weimer | Assigned To | |||
Priority | normal | Severity | Editorial | Type | Clarification Requested |
Status | Closed | Resolution | Accepted As Marked | ||
Name | Florian Weimer | ||||
Organization | Red Hat | ||||
User Reference | |||||
Section | isdigit, isxdigit | ||||
Page Number | unknown | ||||
Line Number | unknown | ||||
Interp Status | Approved | ||||
Final Accepted Text | 0001078:0003946 | ||||
Summary | 0001078: isdigit, isxdigit locale dependence | ||||
Description | ISO C99 and C11 are very explicit about the set of decimal digits and hexadecimal digits (0123456789 and 0123456789ABCDEFabcdef). There is no expectation that the result of these functions is locale-dependent, unlike isalpha (for example). Any locale-dependence of the results would thus violate C99/C11 semantics. isalnum is affected by this as well because it is specified as the union of isalpha and isdigit in ISO C. This is probably a defect in ISO C. | ||||
Desired Action | “The isdigit function returns true if the argument refers to a character in the range from '0' to '9' (inclusive).” “The isxdigit function returns true if the argument is a decimal digit according to the isdigit function, or if the argument refers to one of the characters 'A', 'B', 'C', 'D', 'E', 'F', 'a', 'b', 'c', 'd', 'e', 'f'.” If so desired, any restrictions on the digit and xdigit character classes could be lifted, and applications could query them using the isdigit_l and isxdigit_l functions. | ||||
Tags | tc3-2008 |
|
The characters included in character classes digit and xdigit do not vary from one locale to another, but the codeset used does vary by locale. A locale based on an EBCDIC codeset; a locale based on an ASCII, an ISO 8859-*, or a UTF-8 codeset; a locale based on a UTF-16 codeset; and a locale based on a UTF-32 codeset all have different byte encodings for the characters in both of these character classes. Therefore, the locale is required by all of these functions to determine the byte encoding of the characters in the class. |
|
In the past, I recall the analysis that it is not possible to have a system with simultaneous EBCDIC and ASCII-derived locales (you can have a system with support for both encodings only if there is some non-standard way to switch which mode you are in, but within a given mode, standard-conforming apps see either that all available locales are EBCDIC-based, or that all available locales are ASCII-based). In part, this is because the standard requires that all locales use the same single-byte representation for the various digits: isdigit() really is locale-independent, and returns non-zero for the same set of 10 bytes regardless of what else is going on in the locale. The standard forbids a locale with UTF-16 or UTF-32 codesets (the all-zero NUL byte is required to represent the NUL character across ALL locales, and is not permitted to appear within a multibyte character encoding). That does not mean that the standard forbids processing of UTF-16 or UTF-32 data (that would be a job for iconv), but merely that your locale is never encoded as UTF-16 or UTF-32. Furthermore, while the standard does permit multibyte encodings where one byte of a multibyte character happens to also have the same value as a single-byte letter character, it is fairly explicit that at least the portable filename character set (which includes all of the characters accepted by isxdigit()) must be single-byte characters. So about the only role the locale could even play is in determining whether some multibyte character encodings happen to reuse a byte that can also be one of the characters in question if it appears on its own; it does not determine whether the characters in question can have a different single-byte encoding. |
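The claim that isdigit() returns non-zero for the same ten bytes in every locale can be checked empirically. The following is only a sketch, not part of the issue: the helper names (ref_isdigit, ref_isxdigit, ctype_digits_match_reference, check_locale) are hypothetical, and any locale name other than "C" or "POSIX" passed to check_locale() is an assumption about what is installed on the system.

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Reference predicate from the Desired Action text: '0' to '9' inclusive.
   ISO C guarantees that these ten characters have contiguous values. */
static int ref_isdigit(int c)
{
    return c >= '0' && c <= '9';
}

/* Reference predicate: a decimal digit, or one of A-F / a-f.
   The letters are looked up in a string because ISO C does not
   guarantee that they are contiguous. */
static int ref_isxdigit(int c)
{
    if (ref_isdigit(c))
        return 1;
    /* strchr() would match the terminating NUL, so exclude it. */
    return c != '\0' && strchr("ABCDEFabcdef", c) != NULL;
}

/* Returns 1 if isdigit() and isxdigit() agree with the reference
   predicates for every unsigned char value in the current locale,
   0 otherwise. */
static int ctype_digits_match_reference(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++) {
        if ((isdigit(c) != 0) != (ref_isdigit(c) != 0))
            return 0;
        if ((isxdigit(c) != 0) != (ref_isxdigit(c) != 0))
            return 0;
    }
    return 1;
}

/* Switches LC_CTYPE to the named locale and repeats the check.
   Returns 1 on agreement, 0 on disagreement, and -1 if the locale
   is not available. */
static int check_locale(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;
    return ctype_digits_match_reference();
}
```

Calling check_locale() for each locale reported by `locale -a` would flag any locale whose digit or xdigit class contains extra single-byte characters. A result of 1 only shows agreement for single-byte values; it says nothing about wide-character functions such as iswdigit().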
|
Interpretation response
------------------------
The standard states the requirements for isdigit() and isxdigit(), and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
Allowing additional characters in the digit and xdigit classes is a conflict with the requirements of ISO C for isdigit() and isxdigit().

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
(Page and line numbers are for the 2016 edition)

On page 139 line 4123 delete:
These only need to be specified if the character values (that is, encoding) differ from the implementation default values.

At page 140 line 4152, change:
In the POSIX locale, only:
to:
In all locales, only:

On page 141 line 4195, change:
In the POSIX locale, only:
to:
In all locales, only:

On page 141 at line 4198, change:
In a locale definition file, only the characters defined for the class digit shall be specified, in contiguous ascending sequence by numerical value, followed by one or more sets of six characters representing the hexadecimal digits 10 to 15 inclusive, with each set in ascending order (for example, <A>, < B>, <C>, <D>, <E>, <F>, <a>, < b>, <c>, <d>, <e>, <f>). The digits <zero> to <nine>, the uppercase letters <A> to <F>, and the lowercase letters <a> to <f> of the portable character set are automatically included in this class.
to:
In a locale definition file, only the characters defined for the class digit shall be specified, in contiguous ascending sequence by numerical value, followed by two sets, in either order, of six characters representing the hexadecimal digits corresponding to the decimal numbers 10 to 15 inclusive, with each set in ascending order: <A>, < B>, <C>, <D>, <E>, <F> and <a>, < b>, <c>, <d>, <e>, <f>.
The digits <zero> to <nine>, the uppercase letters <A> to <F>, and the lowercase letters <a> to <f> of the portable character set are automatically included in this class.

(Note that the space in < b> and < B> should be omitted; it is there to prevent Mantis from interpreting it as a bold specifier.) |
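The wording for line 4198 ties the digits and the two sets of six letters to the numeric values 0 to 15. A minimal sketch of that mapping in C follows, assuming a hypothetical helper named xdigit_value() that is not part of the standard or of this issue. It uses string lookups rather than arithmetic on the letters, because ISO C only guarantees contiguous values for '0' to '9'.

```c
#include <stddef.h>
#include <string.h>

/* Returns the value 0..15 of a hexadecimal digit character, or -1 if
   the character is not one of 0-9, A-F, a-f. The position of the
   character within each ascending set gives its value. */
static int xdigit_value(int c)
{
    static const char digits[] = "0123456789";
    static const char upper[]  = "ABCDEF";
    static const char lower[]  = "abcdef";
    const char *p;

    /* strchr() would match the terminating NUL, so reject it first. */
    if (c == '\0')
        return -1;
    if ((p = strchr(digits, c)) != NULL)
        return (int)(p - digits);
    if ((p = strchr(upper, c)) != NULL)
        return 10 + (int)(p - upper);
    if ((p = strchr(lower, c)) != NULL)
        return 10 + (int)(p - lower);
    return -1;
}
```

Because the accepted set is fixed at these 22 characters, the helper does not consult the locale at all, which is consistent with the "In all locales, only:" wording adopted above.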
|
Interpretation Proposed: 30 September 2018 |
|
Interpretation approved: 12 November 2018 |
Date Modified | Username | Field | Change |
---|---|---|---|
2016-09-16 17:54 | Florian Weimer | New Issue | |
2016-09-16 17:54 | Florian Weimer | Name | => Florian Weimer |
2016-09-16 17:54 | Florian Weimer | Organization | => Red Hat |
2016-09-16 17:54 | Florian Weimer | Section | => isdigit, isxdigit |
2016-09-16 17:54 | Florian Weimer | Page Number | => unknown |
2016-09-16 17:54 | Florian Weimer | Line Number | => unknown |
2016-09-16 18:35 | Don Cragun | Note Added: 0003380 | |
2016-09-16 18:50 | eblake | Note Added: 0003381 | |
2018-04-05 16:42 | geoffclare | Note Added: 0003946 | |
2018-04-05 16:44 | geoffclare | Note Edited: 0003946 | |
2018-04-05 16:44 | geoffclare | Note Edited: 0003946 | |
2018-04-05 16:46 | geoffclare | Note Edited: 0003946 | |
2018-04-05 16:48 | geoffclare | Interp Status | => Pending |
2018-04-05 16:48 | geoffclare | Final Accepted Text | => 0001078:0003946 |
2018-04-05 16:48 | geoffclare | Status | New => Interpretation Required |
2018-04-05 16:48 | geoffclare | Resolution | Open => Accepted As Marked |
2018-04-05 16:48 | geoffclare | Tag Attached: tc3-2008 | |
2018-09-30 18:41 | ajosey | Interp Status | Pending => Proposed |
2018-09-30 18:41 | ajosey | Note Added: 0004141 | |
2018-11-12 19:47 | ajosey | Interp Status | Proposed => Approved |
2018-11-12 19:47 | ajosey | Note Added: 0004163 | |
2019-10-28 10:40 | geoffclare | Status | Interpretation Required => Applied |
2024-06-11 08:54 | agadmin | Status | Applied => Closed |