ID | Category | Severity | Type | Date Submitted | Last Update
0000938 | [1003.1(2013)/Issue7+TC1] Base Definitions and Headers | Objection | Error | 2015-04-22 15:56 | 2019-06-10 08:54
Reporter | geoffclare | View Status | public
Assigned To |
Priority | normal | Resolution | Accepted
Status | Closed
Name | Geoff Clare
Organization | The Open Group
User Reference |
Section | 7.3.2, 9.3.5
Page Number | 147, 150-151, 184
Line Number | 4366, 4503, 5944, and more
Interp Status | ---
Final Accepted Text |
Summary | 0000938: Collation issues in XBD (changes for TC2)
Description |
A discussion on the mailing list identified some issues related to collation for locales that do not define a collation sequence with a total ordering of all characters. It is proposed that these issues are addressed in Issue 8 by requiring implementation-provided locales that do not have an '@' modifier in their name to define a collation sequence that has a total ordering of all characters (thus reducing the problem to "special" locales and user-defined locales), and by modifying the requirements for regular expressions and affected utilities so that they cope better with such locales. As an intermediate step, it is proposed that the new requirements slated for Issue 8 are recommended (or at least allowed) in TC2. The necessary changes will be split across four Mantis bugs, targeting XBD TC2, XCU TC2, XBD Issue 8, and XCU Issue 8. This bug contains the changes proposed for XBD in TC2. |
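As an illustration only (not part of the bug report), the following Python sketch models what "a total ordering of all characters" means. The two key functions are hypothetical stand-ins for a locale's collation weights: `icase_key` mimics what an `@icase`-style locale might do, and `total_key` adds a code-point tie-break. Neither corresponds to any real locale definition.

```python
from itertools import combinations


def collates_equal_pairs(chars, key):
    """Return pairs of distinct characters that the collation key cannot tell apart.

    An empty result means the key gives a total ordering of `chars`.
    """
    return [(a, b) for a, b in combinations(chars, 2)
            if a != b and key(a) == key(b)]


def icase_key(ch):
    # Hypothetical "@icase"-style collation: a single weight, case folded away.
    return ch.lower()


def total_key(ch):
    # Same first weight, with the code point as a tie-breaking last weight.
    return (ch.lower(), ord(ch))


chars = "aAbB"
print(collates_equal_pairs(chars, icase_key))  # [('a', 'A'), ('b', 'B')]
print(collates_equal_pairs(chars, total_key))  # []
```

Under the first key, distinct characters collate equally, which is the situation the proposal wants to confine to locales with an '@' modifier.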
Desired Action |
On Page: 147 Line: 4366 Section: 7.3.2 LC_COLLATE

Change from:
(sort, uniq, and so on)
to:
(ls, sort, and so on)

On Page: 147 Line: 4393 Section: 7.3.2 LC_COLLATE

Add a new paragraph and two small-font notes after the numbered list:
All implementation-provided locales (either preinstalled or provided as locale definitions which can be installed later) should define a collation sequence that has a total ordering of all characters unless the locale name has an '@' modifier indicating that it has a special collation sequence (for example, <tt>@icase</tt> could indicate that each upper- and lower-case character pair collates equally).
<small>Note: a future version of this standard may require these locales to define a collation sequence that has a total ordering of all characters (by changing "should" to "shall").</small>
<small>Note: users installing their own locales should ensure that they define a collation sequence with a total ordering of all characters unless an '@' modifier in the locale name (such as @icase) indicates that it has a special collation sequence.</small>

On Page: 150 Line: 4503 Section: 7.3.2.4 Collation Order

Add a new paragraph and a small-font note:
Weights should be assigned such that the collation sequence has a total ordering of all characters unless an '@' modifier in the locale name indicates that it has a special collation sequence.
<small>Note: a future version of this standard may require a total ordering of all characters for implementation-provided locales that do not have an '@' modifier in the locale name. See [xref to 7.3.2].</small>

On Page: 150 Line: 4517 Section: 7.3.2.4 Collation Order

Change from:
Characters specified via an explicit or implicit UNDEFINED special symbol shall by default be assigned the same primary weight (that is, they belong to the same equivalence class).
to:
Characters specified via an explicit or implicit UNDEFINED special symbol shall by default be assigned the same primary weight (that is, they belong to the same equivalence class) if the collation order has more than one weight level. If the collation order has only one weight level, these characters should be assigned unique primary weights, equal to the relative order of their character in the character collation sequence, but may be assigned the same primary weight.
<small>Note: a future version of this standard may require these characters to be assigned unique primary weights if the collation order has only one weight level.</small>

On Page: 151 Line: 4539 Section: 7.3.2.4 Collation Order

Delete the first entry in the example collation order:
<tt>UNDEFINED IGNORE;IGNORE</tt>

On Page: 151 Line: 4552 Section: 7.3.2.4 Collation Order

Add the following as the last entry in the example collation order:
<tt>UNDEFINED IGNORE;...</tt>

On Page: 151 Line: 4555 Section: 7.3.2.4 Collation Order

Delete item 1 in the numbered list:
The UNDEFINED means that all characters not specified in this definition (explicitly or via the ellipsis) shall be ignored for collation purposes.
and renumber items 2-4 to be 1-3.

On Page: 151 Line: 4563 Section: 7.3.2.4 Collation Order

Add a new item 4 to the numbered list:
The UNDEFINED means that all characters not specified in this definition (explicitly or via the ellipsis) shall be ignored when comparing primary weights, and have individual secondary weights based on their ordinal encoded values.

On Page: 184 Line: 5944 Section: 9.3.5 RE Bracket Expression

Change from:
A bracket expression (an expression enclosed in square brackets, "[ ]") is an RE that shall match a single collating element contained in the non-empty set of collating elements represented by the bracket expression.
to:
A bracket expression (an expression enclosed in square brackets, "[ ]") is an RE that shall match a specific set of single characters, and may match a specific set of multi-character collating elements, based on the non-empty set of list expressions contained in the bracket expression.

On Page: 184 Line: 5949 Section: 9.3.5 RE Bracket Expression

In list item 1, change from:
It consists of one or more expressions: collating elements, collating symbols, ...
to:
It consists of one or more expressions: ordinary characters, collating elements, collating symbols, ...

On Page: 184 Line: 5963 Section: 9.3.5 RE Bracket Expression

In list item 2, change from:
A matching list expression specifies a list that shall match any single-character collating element in any of the expressions represented in the list. The first character in the list shall not be the <circumflex>; for example, "[abc]" is an RE that matches any of the characters 'a', 'b', or 'c'.
to:
A matching list expression specifies a list that shall match any single character that is matched by one of the expressions represented in the list. The first character in the list can not be the <circumflex>. An ordinary character in the list should only match that character, but may match any single character that collates equally with that character; for example, "[abc]" is an RE that should only match one of the characters 'a', 'b', or 'c'.
<small>Note: a future version of this standard may require that an ordinary character in the list only matches that character.</small>

On Page: 184 Line: 5970 Section: 9.3.5 RE Bracket Expression

In list item 3, change from:
For example, "[^abc]" is an RE that matches any character except the characters 'a', 'b', or 'c'.
to:
For example, if the RE "[abc]" only matches 'a', 'b', or 'c', then "[^abc]" is an RE that matches any character except 'a', 'b', or 'c'.

(Note that no change is needed to the first sentence in list item 3 because it is already modified suitably by 0000872.)
On Page: 185 Line: 5995 Section: 9.3.5 RE Bracket Expression

In list item 6, change from:
The set of single-character collating elements whose characters belong to the character class
to:
The set of single characters that belong to the character class

Cross-volume changes to XRAT ...

On Page: 3490 Line: 117820 Section: A.7.3.2 LC_COLLATE

Add a new paragraph:
This standard recommends (by the use of "should" in the normative text) that all implementation-provided locales define a collation sequence that has a total ordering of all characters unless the locale name has an '@' modifier indicating that it has a special collation sequence. Defining locales in this way eliminates unexpected behavior when non-identical strings can collate equally (for example, <tt>sort -u</tt> and <tt>sort | uniq</tt> are not equivalent). The exception for locales with a suitable '@' modifier in the name allows implementations to supply locales which do not have a total ordering of all characters provided that they draw attention to it in the modifier name. For example, <tt>@icase</tt> could indicate that each upper- and lower-case character pair collates equally. Even with an '@' modifier, total ordering is preferred when possible; for example, characters that are "ignored" in dictionary order need not be completely ignored (by using IGNORE for all collation weights), but can instead be given a unique weight after one or more IGNORE weights.
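The <tt>sort -u</tt> versus <tt>sort | uniq</tt> remark in the rationale can be sketched in Python. This is a simplified model, not the behavior of any particular implementation: it assumes that <tt>sort -u</tt> discards lines that collate equally, that <tt>uniq</tt> drops an adjacent line only when it is identical to its predecessor, and that the sort is stable. The key functions `icase_key` and `total_key` are hypothetical stand-ins for locale collation.

```python
def icase_key(line):
    # Hypothetical "@icase"-style collation: case differences carry no weight.
    return line.lower()


def total_key(line):
    # Same first weight, with the raw string as a tie-break (total ordering).
    return (line.lower(), line)


def sort_u(lines, key):
    """Model of `sort -u`: keep one line per group that collates equally."""
    out = []
    for line in sorted(lines, key=key):
        if not out or key(out[-1]) != key(line):
            out.append(line)
    return out


def sort_then_uniq(lines, key):
    """Model of `sort | uniq`: drop a line only if identical to the previous one."""
    out = []
    for line in sorted(lines, key=key):
        if not out or out[-1] != line:
            out.append(line)
    return out


lines = ["a", "A", "a"]
print(sort_u(lines, icase_key))          # ['a']
print(sort_then_uniq(lines, icase_key))  # ['a', 'A', 'a']
print(sort_u(lines, total_key))          # ['A', 'a']
print(sort_then_uniq(lines, total_key))  # ['A', 'a']
```

With the non-total key the two pipelines disagree, because the identical lines are not brought together by the sort; with the tie-broken key they produce the same output.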
Tags | tc2-2008, UTF-8_Locale | ||||||