0001078: isdigit, isxdigit locale dependence

ID	Project	Category	View Status	Date Submitted	Last Update

0001078	1003.1(2013)/Issue7+TC1	System Interfaces	public	2016-09-16 17:54	2024-06-11 08:54

Reporter	Florian Weimer	Assigned To
Priority	normal	Severity	Editorial	Type	Clarification Requested
Status	Closed	Resolution	Accepted As Marked

Name	Florian Weimer
Organization	Red Hat
User Reference
Section	isdigit, isxdigit
Page Number	unknown
Line Number	unknown
Interp Status	Approved
Final Accepted Text	0001078:0003946


Summary	0001078: isdigit, isxdigit locale dependence
Description	ISO C99 and C11 are very explicit about the sets of decimal digits and hexadecimal digits (0123456789 and 0123456789ABCDEFabcdef). Unlike isalpha (for example), the results of these functions are not expected to be locale-dependent, so any locale dependence of the results would violate C99/C11 semantics. isalnum is affected as well, because ISO C specifies it as the union of isalpha and isdigit. This is probably a defect in ISO C.
Desired Action	“The isdigit function returns true if the argument refers to a character in the range from '0' to '9' (inclusive).” “The isxdigit function returns true if the argument is a decimal digit according to the isdigit function, or if the argument refers to one of the characters 'A', 'B', 'C', 'D', 'E', 'F', 'a', 'b', 'c', 'd', 'e', 'f'.” If so desired, any restrictions on the digit and xdigit character classes could be lifted, and applications could query them using the isdigit_l and isxdigit_l functions.
Tags	tc3-2008
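The semantics requested in the Desired Action can be sketched as locale-free helpers. This is a minimal illustration, not code from the report or from any C library: fixed_isdigit and fixed_isxdigit are hypothetical names, and they take an int argument the way the <ctype.h> functions do.

```c
#include <stdbool.h>

/* Hypothetical helpers following the Desired Action wording.
   Neither one consults the current locale. */
static bool fixed_isdigit(int c)
{
    /* ISO C requires '0'..'9' to be contiguous and ascending in the
       execution character set, so a range check is portable. */
    return c >= '0' && c <= '9';
}

static bool fixed_isxdigit(int c)
{
    if (fixed_isdigit(c))
        return true;
    /* Letters are not guaranteed to be contiguous, so list them. */
    switch (c) {
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
        return true;
    default:
        return false;
    }
}
```

Either helper gives the same answer for a given byte whatever setlocale() has selected, which is the property the report asks isdigit() and isxdigit() to have.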

~~Don Cragun~~ 2016-09-16 18:35 viewer bugnote:0003380	The characters included in the character classes digit and xdigit do not vary from one locale to another, but the codeset used does vary by locale. A locale based on an EBCDIC codeset; a locale based on an ASCII, an ISO 8859-*, or a UTF-8 codeset; a locale based on a UTF-16 codeset; and a locale based on a UTF-32 codeset all have different byte encodings for the characters in both of these character classes. Therefore, all of these functions need the locale to determine the byte encoding of the characters in the class.
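Because byte values differ between codesets such as ASCII and EBCDIC, portable C code names these characters with character literals rather than hard-coded bytes, and the compiler supplies the values for the execution character set. As a sketch, the hypothetical hex_value helper below (not part of any standard API) converts a hexadecimal digit character to its numeric value without assuming ASCII.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper: numeric value of a hexadecimal digit character,
   or -1 if c is not one. The tables are written as character literals,
   so they hold 0x30..0x39 for '0'..'9' under ASCII and 0xF0..0xF9
   under EBCDIC without any change to the source. */
static int hex_value(int c)
{
    static const char upper[] = "0123456789ABCDEF";
    static const char lower[] = "0123456789abcdef";
    const char *p;

    /* Reject EOF, NUL (strchr would match the terminator) and
       anything outside the unsigned char range. */
    if (c <= 0 || c > UCHAR_MAX)
        return -1;
    if ((p = strchr(upper, c)) != NULL)
        return (int)(p - upper);
    if ((p = strchr(lower, c)) != NULL)
        return (int)(p - lower);
    return -1;
}
```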

eblake 2016-09-16 18:50 manager bugnote:0003381	I recall a past analysis concluding that it is not possible to have a system with simultaneous EBCDIC and ASCII-derived locales. (You can have a system that supports both encodings only if there is some non-standard way to switch which mode you are in; within a given mode, standard-conforming apps see either that all available locales are EBCDIC-based or that all available locales are ASCII-based.) In part, this is because the standard requires all locales to use the same single-byte representation for the various digits; consequently, isdigit() really is locale-independent and returns non-zero for the same set of 10 bytes regardless of what else is going on in the locale.
The standard forbids locales with UTF-16 or UTF-32 codesets: the all-zero NUL byte is required to represent the NUL character across ALL locales, and is not permitted to appear within a multi-byte character encoding. That does not mean the standard forbids processing UTF-16 or UTF-32 data (that would be a job for iconv), merely that your locale is never encoded as UTF-16 or UTF-32.
Furthermore, while the standard does permit multibyte encodings in which one byte of a multi-byte character happens to have the same value as a single-byte letter character, it is fairly explicit that at least the portable filename character set (which includes all of the characters tested by isxdigit()) must consist of single-byte characters. So about the only role the locale could play is in determining whether some multibyte character encodings reuse a byte that, appearing on its own, would be one of the characters in question; it does not determine whether those characters can have a different single-byte encoding.
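The claim that isdigit() accepts the same 10 bytes in every locale, together with the Description's point that isalnum is the union of isalpha and isdigit, can be checked empirically. The sketch below is a small harness, not a conformance test: scan_locale and struct class_report are hypothetical names, and which locale names setlocale() accepts varies by system.

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

struct class_report {
    int digits;         /* bytes accepted by isdigit()  */
    int xdigits;        /* bytes accepted by isxdigit() */
    int alnum_mismatch; /* bytes where isalnum() != (isalpha() || isdigit()) */
};

/* Classify every byte value under LC_CTYPE locale `name`, then restore
   the previous LC_CTYPE setting. Returns 0 on success, or -1 if the
   locale is unavailable or the saved name cannot be copied. */
static int scan_locale(const char *name, struct class_report *r)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL)
        return -1;
    /* setlocale() may overwrite the string it returned, so copy it. */
    char *saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, cur);

    int rc = -1;
    if (setlocale(LC_CTYPE, name) != NULL) {
        r->digits = r->xdigits = r->alnum_mismatch = 0;
        for (int c = 0; c <= UCHAR_MAX; c++) {
            r->digits += isdigit(c) != 0;
            r->xdigits += isxdigit(c) != 0;
            r->alnum_mismatch +=
                (isalnum(c) != 0) != (isalpha(c) || isdigit(c));
        }
        rc = 0;
    }
    setlocale(LC_CTYPE, saved);
    free(saved);
    return rc;
}
```

Under "C" the counts are 10 and 22 with no isalnum mismatches. Passing "" scans whatever locale the environment selects; per the interpretation below, a conforming implementation should report the same digit and xdigit counts there too.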

geoffclare 2018-04-05 16:42 reporter bugnote:0003946 Last edited: 2018-04-05 16:46	Interpretation response
------------------------
The standard states the requirements for isdigit() and isxdigit(), and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.
Rationale:
-------------
Allowing additional characters in the digit and xdigit classes is a conflict with the requirements of ISO C for isdigit() and isxdigit().
Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
(Page and line numbers are for the 2016 edition)
On page 139 line 4123 delete:
    These only need to be specified if the character values (that is, encoding) differ from the implementation default values.
At page 140 line 4152, change:
    In the POSIX locale, only:
to:
    In all locales, only:
On page 141 line 4195, change:
    In the POSIX locale, only:
to:
    In all locales, only:
On page 141 at Line 4198, change:
    In a locale definition file, only the characters defined for the class digit shall be specified, in contiguous ascending sequence by numerical value, followed by one or more sets of six characters representing the hexadecimal digits 10 to 15 inclusive, with each set in ascending order (for example, <A>, < B>, <C>, <D>, <E>, <F>, <a>, < b>, <c>, <d>, <e>, <f>). The digits <zero> to <nine>, the uppercase letters <A> to <F>, and the lowercase letters <a> to <f> of the portable character set are automatically included in this class.
to:
    In a locale definition file, only the characters defined for the class digit shall be specified, in contiguous ascending sequence by numerical value, followed by two sets, in either order, of six characters representing the hexadecimal digits corresponding to the decimal numbers 10 to 15 inclusive, with each set in ascending order: <A>, < B>, <C>, <D>, <E>, <F> and <a>, < b>, <c>, <d>, <e>, <f>. The digits <zero> to <nine>, the uppercase letters <A> to <F>, and the lowercase letters <a> to <f> of the portable character set are automatically included in this class.
(Note that the space in < b> and < B> should be omitted; it is there to prevent Mantis from interpreting it as a bold specifier.)
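The revised xdigit wording amounts to a simple rule on an explicit class list. The sketch below is a hypothetical checker (xdigit_list_ok is not part of localedef or any library) that represents the list as a plain string instead of parsing real locale-definition syntax such as <A>. It accepts only the ten digits in ascending order followed by the two six-letter sets in either order, and rejects the additional sets that the old "one or more sets" wording permitted.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical checker for an explicit xdigit list under the revised
   wording. `list` holds the class members in order, for example
   "0123456789ABCDEFabcdef". */
static bool xdigit_list_ok(const char *list)
{
    static const char digits[] = "0123456789";
    static const char upper[] = "ABCDEF";
    static const char lower[] = "abcdef";

    /* Exactly 10 digits + 2 sets of 6 letters, digits first. */
    if (strlen(list) != 22 || strncmp(list, digits, 10) != 0)
        return false;

    const char *rest = list + 10;
    /* Uppercase set then lowercase set, or the reverse. */
    return (strncmp(rest, upper, 6) == 0 && strcmp(rest + 6, lower) == 0)
        || (strncmp(rest, lower, 6) == 0 && strcmp(rest + 6, upper) == 0);
}
```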

ajosey 2018-09-30 18:41 manager bugnote:0004141	Interpretation Proposed: 30 September 2018

ajosey 2018-11-12 19:47 manager bugnote:0004163	Interpretation approved: 12 November 2018

Date Modified	Username	Field	Change
2016-09-16 17:54	Florian Weimer	New Issue
2016-09-16 17:54	Florian Weimer	Name	=> Florian Weimer
2016-09-16 17:54	Florian Weimer	Organization	=> Red Hat
2016-09-16 17:54	Florian Weimer	Section	=> isdigit, isxdigit
2016-09-16 17:54	Florian Weimer	Page Number	=> unknown
2016-09-16 17:54	Florian Weimer	Line Number	=> unknown
2016-09-16 18:35	~~Don Cragun~~	Note Added: 0003380
2016-09-16 18:50	eblake	Note Added: 0003381
2018-04-05 16:42	geoffclare	Note Added: 0003946
2018-04-05 16:44	geoffclare	Note Edited: 0003946
2018-04-05 16:44	geoffclare	Note Edited: 0003946
2018-04-05 16:46	geoffclare	Note Edited: 0003946
2018-04-05 16:48	geoffclare	Interp Status	=> Pending
2018-04-05 16:48	geoffclare	Final Accepted Text	=> 0001078:0003946
2018-04-05 16:48	geoffclare	Status	New => Interpretation Required
2018-04-05 16:48	geoffclare	Resolution	Open => Accepted As Marked
2018-04-05 16:48	geoffclare	Tag Attached: tc3-2008
2018-09-30 18:41	ajosey	Interp Status	Pending => Proposed
2018-09-30 18:41	ajosey	Note Added: 0004141
2018-11-12 19:47	ajosey	Interp Status	Proposed => Approved
2018-11-12 19:47	ajosey	Note Added: 0004163
2019-10-28 10:40	geoffclare	Status	Interpretation Required => Applied
2024-06-11 08:54	agadmin	Status	Applied => Closed
