View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000872 | 1003.1(2013)/Issue7+TC1 | Base Definitions and Headers | public | 2014-08-27 16:16 | 2019-06-10 08:54 |
Reporter | nsz | Assigned To | |||
Priority | normal | Severity | Editorial | Type | Clarification Requested |
Status | Closed | Resolution | Accepted As Marked | ||
Name | Szabolcs Nagy | ||||
Organization | musl libc | ||||
User Reference | |||||
Section | 9.3.5 RE Bracket Expression | ||||
Page Number | 184 | ||||
Line Number | 5968-5970 | ||||
Interp Status | --- | ||||
Final Accepted Text | 0000872:0002415 | ||||
Summary | 0000872: REG_ICASE regex matching and negated bracket expr | ||||
Description | In chapter 9, the case-insensitive matching of negated (^) bracket expressions is inconsistent with historical practice. (1) Section 9.2 "Regular Expression General Requirements" defines case-insensitive matching as: "when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched." (2) Rule 3 in 9.3.5 "RE Bracket Expression" states: "A non-matching list expression begins with a <circumflex> ( '^' ), and specifies a list that shall match any single-character collating element except for the expressions represented in the list after the leading <circumflex>." Taken together, these two rules mean that [^a] should match both 'a' and 'A' with REG_ICASE: by (1), both 'a' and 'A' are tried when either of them is matched against the bracket expression, and by (2), 'A' does match [^a]. On historical implementations, [^a] matches neither 'a' nor 'A' with REG_ICASE. | ||||
Desired Action | Change "A non-matching list expression begins with a <circumflex> ( '^' ), and specifies a list that shall match any single-character collating element except for the expressions represented in the list after the leading <circumflex>." to "A non-matching list expression begins with a <circumflex> ( '^' ), and specifies a list that shall match any single-character collating element except for the ones that match the expressions represented in the list after the leading <circumflex>. Matching the expressions in the list is done without regard to the case when the regular expression is matched case-insensitively." | ||||
Tags | tc2-2008 |
related to | 0000938 | Closed | Collation issues in XBD (changes for TC2) |
|
I noticed that the regcomp rationale says: "The REG_ICASE flag supports the operations taken by the grep -i option and the historical implementations of ex and vi. Including this flag will make it easier for application code to be written that does the same thing as these utilities." None of the original grep -i, ex, or vi (with :set ignorecase) follows the current POSIX definition of REG_ICASE: they do not match [^a] against a or A. |
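The behavior discussed in this note can be probed against a local regcomp() implementation. The following is a minimal sketch, not part of the original report: the helper name `bracket_matches` is hypothetical, and the result depends on the C library in use. An implementation that follows historical practice (and the final accepted text of this issue) would be expected to report no match for "a" and "A" against [^a] when REG_ICASE is set, while still matching "b".

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile `pattern` as a POSIX basic regular
 * expression with the given extra cflags (for example REG_ICASE) and
 * test it against `text`.
 * Returns 1 on match, 0 on no match, -1 on a compile or execution error.
 */
int bracket_matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only match/no-match status is needed, not offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    if (rc == 0)
        return 1;
    if (rc == REG_NOMATCH)
        return 0;
    return -1;
}
```

Calling `bracket_matches("[^a]", "A", REG_ICASE)` and `bracket_matches("[^a]", "A", 0)` side by side shows whether the flag changes the outcome for the case counterpart on a given system.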
|
With the negation, they aren't supposed to match; they're supposed to report a match for 'b' or 'B', etc. If anything, implementations that report a match for '[^a]' tested against 'A' with REG_ICASE specified were probably written without REG_ICASE support and never updated. The desired action does emphasize to implementers that REG_ICASE needs to be accounted for in evaluating '^', but in my opinion it is nominally superfluous. In Section 9.2, I think the requirement is expressed less ambiguously by changing: "when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched." to: "when a character in the string is tested against the pattern, not only the character, but also its case counterparts (if any), shall be tested, and a match occurs if one of them fits the test criteria." This emphasizes that match status is determined after the relevant testing, not presumed true and possibly negated, as the current text can be read. Note that 'counterpart' is pluralized as preliminary groundwork for the changes required to adequately support Unicode's extra casing classifications. I am not adding more, as that belongs in a separate report as an Issue 8 matter, but I feel this does not change the intent of that section for Issue 7. |
|
On page 184 lines 5968-5970 (XBD 9.3.5 RE Bracket Expression), change: "A non-matching list expression begins with a <circumflex> ('^'), and specifies a list that shall match any single-character collating element except for the expressions represented in the list after the leading <circumflex>." to: "A non-matching list expression begins with a <circumflex> ('^'), and the matching behavior shall be the logical inverse of the corresponding matching list expression (the same bracket expression but without the leading <circumflex>)." |
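The difference between the literal reading described in the report and the "logical inverse" wording above can be sketched with a toy model. This is an illustration only, under stated assumptions: the list holds plain single-byte characters (no ranges, classes, or collating elements), case counterparts come from tolower()/toupper() in the current locale, and all function names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* 1 if the exact character c appears in list, else 0. */
int in_list_exact(const char *list, int c)
{
    return c != 0 && strchr(list, c) != NULL;
}

/*
 * Matching list [list] under case-insensitive matching:
 * c matches if c or its case counterpart appears in the list.
 */
int matching_list_icase(const char *list, int c)
{
    int lo = tolower((unsigned char)c);
    int up = toupper((unsigned char)c);
    return in_list_exact(list, lo) || in_list_exact(list, up);
}

/*
 * Literal reading of the old text: try c and its counterpart
 * separately against [^list]; succeed if either one is absent
 * from the list.
 */
int negated_literal_reading(const char *list, int c)
{
    int lo = tolower((unsigned char)c);
    int up = toupper((unsigned char)c);
    return !in_list_exact(list, lo) || !in_list_exact(list, up);
}

/*
 * Accepted wording: [^list] is the logical inverse of [list]
 * evaluated under the same case-insensitive rules.
 */
int negated_logical_inverse(const char *list, int c)
{
    return !matching_list_icase(list, c);
}
```

For the list "a", the literal reading accepts both 'a' and 'A', while the logical inverse rejects both and still accepts 'b', which is the historical behavior the report describes.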
Date Modified | Username | Field | Change |
---|---|---|---|
2014-08-27 16:16 | nsz | New Issue | |
2014-08-27 16:16 | nsz | Name | => Szabolcs Nagy |
2014-08-27 16:16 | nsz | Organization | => musl libc |
2014-08-27 16:16 | nsz | Section | => 9.3.5 RE Bracket Expression |
2014-08-27 16:16 | nsz | Page Number | => - |
2014-08-27 16:16 | nsz | Line Number | => - |
2014-09-23 18:52 | nsz | Note Added: 0002396 | |
2014-10-09 14:22 | shware_systems | Note Added: 0002413 | |
2014-10-09 15:27 | rhansen | Page Number | - => 184 |
2014-10-09 15:27 | rhansen | Line Number | - => 5968-5970 |
2014-10-09 15:27 | rhansen | Interp Status | => --- |
2014-10-09 15:53 | rhansen | Note Added: 0002415 | |
2014-10-09 15:56 | rhansen | Final Accepted Text | => 0000872:0002415 |
2014-10-09 15:56 | rhansen | Status | New => Resolved |
2014-10-09 15:56 | rhansen | Resolution | Open => Accepted As Marked |
2014-10-09 15:57 | rhansen | Tag Attached: tc2-2008 | |
2015-06-04 16:25 | eblake | Relationship added | related to 0000938 |
2019-06-10 08:54 | agadmin | Status | Resolved => Closed |