View Issue Details

ID: 0000872
Project: 1003.1(2013)/Issue7+TC1
Category: Base Definitions and Headers
View Status: public
Last Update: 2019-06-10 08:54
Reporter: nsz
Assigned To:
Priority: normal
Severity: Editorial
Type: Clarification Requested
Status: Closed
Resolution: Accepted As Marked
Name: Szabolcs Nagy
Organization: musl libc
User Reference:
Section: 9.3.5 RE Bracket Expression
Page Number: 184
Line Number: 5968-5970
Interp Status: ---
Final Accepted Text: 0000872:0002415
Summary: 0000872: REG_ICASE regex matching and negated bracket expr
Description: In chapter 9, the case-insensitive matching of negated (^) bracket
expressions is inconsistent with historical practice.

(1) Case insensitive matching according to section 9.2 "Regular
Expression General Requirements":

 "when each character in the string is matched against the pattern, not
 only the character, but also its case counterpart (if any), shall be
 matched."

(2) Rule 3. in 9.3.5 "RE Bracket Expression":

 "A non-matching list expression begins with a <circumflex> ( '^' ), and
 specifies a list that shall match any single-character collating
 element except for the expressions represented in the list after the
 leading <circumflex>."


Taken together, these two rules mean that with REG_ICASE [^a] should
match both 'a' and 'A': by (1), both 'a' and 'A' are tried when either
of them is matched against the bracket expression, and by (2) 'A' does
match [^a].

On historical implementations, [^a] matches neither 'a' nor 'A' with
REG_ICASE.
Desired Action: Change

 "A non-matching list expression begins with a <circumflex> ( '^' ), and
 specifies a list that shall match any single-character collating element
 except for the expressions represented in the list after the leading
 <circumflex>."

to

 "A non-matching list expression begins with a <circumflex> ( '^' ), and
 specifies a list that shall match any single-character collating element
 except for the ones that match the expressions represented in the list
 after the leading <circumflex>. Matching the expressions in the list is
 done without regard to the case when the regular expression is matched
 case-insensitively."
Tags: tc2-2008

Relationships

related to 0000938 Closed Collation issues in XBD (changes for TC2) 

Activities

nsz

2014-09-23 18:52

reporter   bugnote:0002396

I noticed that the regcomp rationale says:

  The REG_ICASE flag supports the operations taken by the grep -i
  option and the historical implementations of ex and vi. Including
  this flag will make it easier for application code to be written
  that does the same thing as these utilities.

None of the original grep -i, ex, or vi (with :set ignorecase)
follows the current POSIX definition of REG_ICASE: they do not
match [^a] against 'a' or 'A'.

shware_systems

2014-10-09 14:22

reporter   bugnote:0002413

With the negation, they aren't supposed to match; they're supposed to report a match for 'b', 'B', and so on. If any implementation does report a match for '[^a]' tested against 'A' with REG_ICASE specified, it was probably written without REG_ICASE support and never updated.

The desired action does emphasize to implementers that REG_ICASE needs to be accounted for when evaluating '^', but in my opinion it is nominally superfluous.

In Section 9.2, I think the requirement would be expressed less ambiguously by changing:

 "when each character in the string is matched against the pattern, not
 only the character, but also its case counterpart (if any), shall be
 matched."

to:

 "when a character in the string is tested against the pattern, not
 only the character, but also its case counterparts (if any), shall be
 tested, and a match occurs if one of them fits the test criteria."

This emphasizes that match status is determined after the relevant testing, rather than presumed true and possibly negated, as the text can be read now. Note that 'counterpart' is pluralized as preliminary groundwork for the changes required to adequately support Unicode's extra casing classifications. I am not adding more, as that belongs in a separate report as an Issue 8 matter, but I feel this doesn't change the intent of that section for Issue 7.

rhansen

2014-10-09 15:53

manager   bugnote:0002415

On page 184 lines 5968-5970 (XBD 9.3.5 RE Bracket Expression), change:
A non-matching list expression begins with a <circumflex> ('^'), and specifies a list that shall match any single-character collating element except for the expressions represented in the list after the leading <circumflex>.

to:
A non-matching list expression begins with a <circumflex> ('^'), and the matching behavior shall be the logical inverse of the corresponding matching list expression (the same bracket expression but without the leading <circumflex>).

Issue History

Date Modified Username Field Change
2014-08-27 16:16 nsz New Issue
2014-08-27 16:16 nsz Name => Szabolcs Nagy
2014-08-27 16:16 nsz Organization => musl libc
2014-08-27 16:16 nsz Section => 9.3.5 RE Bracket Expression
2014-08-27 16:16 nsz Page Number => -
2014-08-27 16:16 nsz Line Number => -
2014-09-23 18:52 nsz Note Added: 0002396
2014-10-09 14:22 shware_systems Note Added: 0002413
2014-10-09 15:27 rhansen Page Number - => 184
2014-10-09 15:27 rhansen Line Number - => 5968-5970
2014-10-09 15:27 rhansen Interp Status => ---
2014-10-09 15:53 rhansen Note Added: 0002415
2014-10-09 15:56 rhansen Final Accepted Text => 0000872:0002415
2014-10-09 15:56 rhansen Status New => Resolved
2014-10-09 15:56 rhansen Resolution Open => Accepted As Marked
2014-10-09 15:57 rhansen Tag Attached: tc2-2008
2015-06-04 16:25 eblake Relationship added related to 0000938
2019-06-10 08:54 agadmin Status Resolved => Closed