ID | Category | Severity | Type | Date Submitted | Last Update
0001477 | [1003.1(2016/18)/Issue7+TC2] Base Definitions and Headers | Objection | Error | 2021-05-20 11:16 | 2024-06-11 09:07
Reporter | geoffclare | View Status | public
Assigned To |
Priority | normal | Resolution | Accepted
Status | Closed
Name | Geoff Clare
Organization | The Open Group
User Reference |
Section | 7.1, 8.2
Page Number | 135, 176
Line Number | 3967, 5768
Interp Status | ---
Final Accepted Text |
Summary | 0001477: Consequences of specifying locale categories with different charsets
Description |
XBD 7.1 says:

    If different character sets are used by the locale categories, the results achieved by an application utilizing these categories are undefined. Likewise, if different codesets are used for the data being processed by interfaces whose behavior is dependent on the current locale, or the codeset is different from the codeset assumed when the locale was created, the result is also undefined.

XBD 8.2 says:

    If these variables specify locale categories that are not based upon the same underlying codeset, the results are unspecified.

There are several problems with these statements:

1. They say different things. I suggest the statement in 8.2 should be replaced by a cross-reference to 7.1, since the latter is more precise.

2. The second sentence of the 7.1 statement contradicts requirements elsewhere, e.g. it says the behaviour is undefined in situations where individual function descriptions require them to return EILSEQ errors. It's also not clear what the point of referring to "the codeset assumed when the locale was created" is.

3. The statement in 7.1 uses "current locale" but should be inclusive of the *_l() functions.

4. Both statements are overly restrictive. If I set:

       LANG=en_US.utf8 LC_TIME=POSIX

   on a system where the codeset for the POSIX locale is ASCII and the codeset for en_US.utf8 is UTF-8, all of the characters used in the LC_TIME locale data exist, with the same encoding, in the codeset used for LC_CTYPE (via LANG), so there is no reason for the behaviour to be unspecified/undefined.
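The mixed setting in item 4 can also be reproduced from inside a program. The following is a minimal sketch, not part of the report: the function name `weekday_name_mixed` is hypothetical, and the locale name `en_US.utf8` is taken from the example above and may not be installed on a given system, in which case `setlocale()` returns NULL and the previous locale stays in effect.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: mirrors LANG=en_US.utf8 LC_TIME=POSIX and formats
 * the full weekday name for wday (0 = Sunday) into buf.
 * Returns the number of bytes written by strftime(), or 0 on failure. */
size_t weekday_name_mixed(int wday, char *buf, size_t size)
{
    struct tm tm = {0};

    assert(buf != NULL && size > 0);

    /* Set every category from the UTF-8 locale. The name comes from the
     * report's example; if it is unavailable the call fails harmlessly. */
    (void)setlocale(LC_ALL, "en_US.utf8");

    /* Override LC_TIME only. The POSIX locale is always available. */
    if (setlocale(LC_TIME, "POSIX") == NULL)
        return 0;

    tm.tm_wday = wday;  /* %A depends only on tm_wday */
    return strftime(buf, size, "%A", &tm);
}
```

Because LC_TIME is taken from the POSIX locale, the day names produced here are the English names defined for that locale, whose bytes are plain ASCII and therefore also valid, identically encoded, UTF-8.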
Desired Action |
On page 135 line 3967 section 7.1, change:

    If different character sets are used by the locale categories, the results achieved by an application utilizing these categories are undefined. Likewise, if different codesets are used for the data being processed by interfaces whose behavior is dependent on the current locale, or the codeset is different from the codeset assumed when the locale was created, the result is also undefined.

to:

    If incompatible character sets are used by the locale categories, the results achieved by an application utilizing these categories are undefined. Two locale categories have incompatible character sets if one of the categories is LC_CTYPE and the locale data associated with the other category includes at least one character that either is not in the character set used by LC_CTYPE or has a different encoding than the same character in the character set used by LC_CTYPE.

On page 176 line 5768 section 8.2, change:

    If these variables specify locale categories that are not based upon the same underlying codeset, the results are unspecified.

to:

    See [xref to Section 7.1] for the consequences of setting these variables to locales with different character sets.

Move the following paragraph from page 3537 line 119926 section A.8.2 to after page 3525 line 119377 section A.7.1:

    The locale settings of individual categories cannot be truly independent and still guarantee correct results. For example, when collating two strings, characters must first be extracted from each string (governed by LC_CTYPE) before being mapped to collating elements (governed by LC_COLLATE) for comparison. That is, if LC_CTYPE is causing parsing according to the rules of a large, multi-byte code set (potentially returning 20 000 or more distinct character codeset values), but LC_COLLATE is set to handle only an 8-bit codeset with 256 distinct characters, meaningful results are obviously impossible.

and add the following new paragraph after it:

    Earlier versions of this standard stated that if different character sets are used by the locale categories, the results achieved by an application utilizing these categories are undefined. This was felt to be overly restrictive. For example, when setting:

        LANG=en_US.utf8 LC_TIME=POSIX

    on a system where the codeset for the POSIX locale is ASCII and the codeset for en_US.utf8 is UTF-8, all of the characters used in the LC_TIME locale data exist, with the same encoding, in the codeset used for LC_CTYPE (via LANG), so there is no reason for the behavior to be undefined in this case. This standard now has more precise requirements in this area.
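As a rough illustration of the proposed 7.1 wording, the sketch below inspects the LC_TIME strings of the current locale and reports whether they consist only of 7-bit bytes. This is a simplification under a stated assumption, not the standard's test: it only implies compatibility when the LC_CTYPE codeset encodes the ASCII range identically (as UTF-8 does). The function names `is_seven_bit` and `lc_time_is_seven_bit` are hypothetical, and only a subset of the LC_TIME items is checked.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Returns 1 if every byte of s is in the 7-bit range, otherwise 0. */
int is_seven_bit(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s > 0x7F)
            return 0;
    }
    return 1;
}

/* Returns 1 if the checked LC_TIME strings of the current locale are all
 * 7-bit. Assumption: the LC_CTYPE codeset is ASCII-compatible, so 7-bit
 * data has the same encoding there. */
int lc_time_is_seven_bit(void)
{
    static const nl_item items[] = {
        DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7,
        ABDAY_1, ABDAY_2, ABDAY_3, ABDAY_4, ABDAY_5, ABDAY_6, ABDAY_7,
        MON_1, MON_2, MON_3, MON_4, MON_5, MON_6,
        MON_7, MON_8, MON_9, MON_10, MON_11, MON_12,
        AM_STR, PM_STR, D_T_FMT, D_FMT, T_FMT
    };
    size_t i;

    for (i = 0; i < sizeof items / sizeof items[0]; i++) {
        if (!is_seven_bit(nl_langinfo(items[i])))
            return 0;
    }
    return 1;
}
```

A caller would first select the LC_TIME locale of interest, for example with `setlocale(LC_TIME, "POSIX")`, and then call `lc_time_is_seven_bit()`. A complete check against the proposed definition would compare each character's encoding with the actual LC_CTYPE codeset rather than rely on the 7-bit shortcut.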
Tags | tc3-2008
Attached Files |
There are no notes attached to this issue.