ID | Category | Severity | Type | Date Submitted | Last Update | ||
0000305 | [1003.1(2008)/Issue 7] Base Definitions and Headers | Objection | Clarification Requested | 2010-09-01 15:37 | 2013-04-16 13:06 | ||
Reporter | eblake | View Status | public | ||||
Assigned To | ajosey | ||||||
Priority | normal | Resolution | Accepted As Marked | ||||
Status | Closed | ||||||
Name | Eric Blake | ||||||
Organization | Red Hat | ||||||
User Reference | ebb.bre | ||||||
Section | RE Bracket Expression | ||||||
Page Number | 184 | ||||||
Line Number | 5905 | ||||||
Interp Status | Approved | ||||||
Final Accepted Text | See Note: 0000545 | ||||||
Summary | 0000305: Allow RE handling to reject suspicious uses | ||||||
Description |
The standard currently appears to be silent on the required behavior when a collating element is specified more than once within a bracket expression. Current implementation practice appears to be that duplicates are silently ignored; that is, in the following list, each pair of BREs would match the same expressions:

    [a] and [aa]
    [a-c] and [a-cb]
    [[:alpha:]] and [a[:alpha:]]

But without an explicit statement to this effect, it is not clear that compliant applications can rely on this.

Meanwhile, GNU grep would like to reject suspicious uses, on the grounds that when a user does:

    grep '[:upper:]' file

they probably intended to do:

    grep '[[:upper:]]' file

rather than the behavior they get from most implementations, of:

    grep '[:epru]' file

The problem is less common with collating symbols and equivalence class expressions, but it doesn't hurt to cover mistakes on these two uses as well. And since EREs defer to BREs for the definition of bracket expressions, fixing the standard to leave behavior unspecified will allow more than just grep to use that as an opportunity to issue warnings for suspicious uses.

This proposal therefore does two things:

  - it explicitly tightens the behavior of most duplicates to match common practice (with the slight potential to invalidate a Weirdnix implementation that, under the current silence of the standard, is within its rights to reject duplicates as an invalid RE);
  - it explicitly relaxes the behavior for the three characters that can combine with '[' to form different collating expressions, so that GNU grep's warning on suspicious uses is explicitly permitted by the standard.

Leaving the behavior unspecified allows implementations the choice of whether to print a warning but proceed anyway, fail with a loud error, silently convert to the most-likely-intended character class, or take any other action; in particular, it will not render any existing implementations non-compliant.

And since any use of two colons without a character class is probably already a bug in the user's script, it is highly unlikely that relaxing the standard in this regard will impact any well-written existing scripts. |
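To make the pairs listed in the description concrete, here is a minimal sketch in Python that models a matching list as the set of characters it can match. It is a toy, not any implementation's actual parser: the `expand` helper and the `CLASSES` table are hypothetical names introduced for illustration, only ASCII letters in a C-like locale are considered, and negation, collating symbols, and equivalence classes are not handled.

```python
import string

# Hypothetical, ASCII-only stand-ins for two POSIX character classes.
CLASSES = {
    "alpha": set(string.ascii_letters),
    "upper": set(string.ascii_uppercase),
}


def expand(bracket):
    """Return the set of characters a simple bracket expression matches.

    Toy model: handles single characters, ranges such as a-c, and the
    [:name:] classes listed in CLASSES. Anything else is out of scope.
    """
    if not (bracket.startswith("[") and bracket.endswith("]")):
        raise ValueError("not a bracket expression")
    body = bracket[1:-1]
    chars = set()
    i = 0
    while i < len(body):
        if body.startswith("[:", i):
            # Nested character class, e.g. [:alpha:]
            end = body.index(":]", i + 2)
            chars |= CLASSES[body[i + 2:end]]
            i = end + 2
        elif i + 2 < len(body) and body[i + 1] == "-":
            # Range expression, e.g. a-c (code-point order assumed)
            lo, hi = ord(body[i]), ord(body[i + 2])
            chars |= {chr(c) for c in range(lo, hi + 1)}
            i += 3
        else:
            # Single-character element; adding a duplicate changes nothing
            chars.add(body[i])
            i += 1
    return chars
```

Under this model, `expand("[a]") == expand("[aa]")` holds because the second `a` adds nothing to the set, and `expand("[:upper:]")` yields the same five characters as `expand("[:epru]")`, which is the traditional reading the description contrasts with `[[:upper:]]`.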
||||||
Desired Action |
At line 5995 (XBD 9.3.5), add a new paragraph:

    8. In general, an implementation shall handle duplicate listings of the same collating element within a bracket expression as though the collating element had only been listed once; regardless of whether the duplication occurs by any combination of single-character collating elements, collating symbols, equivalence classes, character classes, or range expressions. However, behavior is unspecified if <period>, <equals-sign>, or <colon> is included more than once as a single-character collating element. |
||||||
Tags | tc1-2008 | ||||||
Attached Files | |||||||
|
Notes | |
(0000536) eblake (manager) 2010-09-02 15:29 edited on: 2010-09-09 16:08 |
The original Desired Action is incorrect, because the existing wording is already sufficient to cover the case of duplicate collating elements:

    "A matching list expression specifies a list that shall match any single-character collating element in any of the expressions represented in the list." (XBD line 5921)

Meanwhile, it loosens the standard too far. The standard already allows regcomp() to fail with additional REG_* failures (XBD line 10727), but XSH does not mention this fact. A better proposed wording, which bounds the possible behaviors, follows:

At line 5995 (XBD 9.3.5), add a new paragraph:

    8. If a bracket expression contains at least three list elements, where the first and last list element are the same single-character element of <period>, <equals-sign>, or <colon>, then it is unspecified whether the bracket expression will be treated as a collating symbol, equivalence class, or character class, respectively; treated as a matching list expression; or rejected as an error.

At line 56541 (XSH regcomp), change:

    The following constants are defined as error return values:

to:

    The following constants are defined as the minimum set of error return values, although other errors listed as implementation extensions in <regex.h> are possible:

At line 116821 (XRAT A.9.3.5), add a new paragraph:

    The standard specifies three possible behaviors for regular expressions such as "[:alpha:]". One behavior is the traditional implementation, which behaves like "[:ahlp]". Another, for alignment with the tr utility, is to treat it like "[[:alpha:]]". And finally, the standard allows rejecting the regular expression as invalid, as a means of alerting a user to the non-portable aspect of that regular expression.

    The set of regular expressions with this undefined behavior is limited solely to the expressions where the outer '[' and ']' of the bracket expression can be confused with the missing bracket pair '[' and ']' necessary to form a collating symbol, equivalence class, or character class; thus "[_:alpha:]" or "[::]" do not trigger the unspecified behavior. |
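The condition in the proposed paragraph 8 can be sketched as a small check. This is only an illustration under a simplifying assumption: every list element inside the outer brackets is taken to be a single character, so ranges and nested classes are not parsed as units. The function name `confusable_with` and the `CONFUSABLE` table are hypothetical and do not come from the standard or from any implementation.

```python
# Delimiter character -> the construct the bracket expression could be
# mistaken for, following the order given in the proposed paragraph 8.
CONFUSABLE = {
    ".": "collating symbol",
    "=": "equivalence class",
    ":": "character class",
}


def confusable_with(bracket):
    """Return the construct a bracket expression may be confused with.

    Returns None when the proposed rule does not apply. Assumes each list
    element is a single character, so the element count is the length of
    the text between the outer '[' and ']'.
    """
    if not (bracket.startswith("[") and bracket.endswith("]")):
        raise ValueError("not a bracket expression")
    body = bracket[1:-1]
    # Rule: at least three list elements, first and last identical,
    # and that element is <period>, <equals-sign>, or <colon>.
    if len(body) >= 3 and body[0] == body[-1] and body[0] in CONFUSABLE:
        return CONFUSABLE[body[0]]
    return None
```

With this sketch, `confusable_with("[:alpha:]")` reports a possible character class, while `"[_:alpha:]"` (first element is not a colon) and `"[::]"` (only two list elements) return `None`, matching the two counterexamples in the proposed rationale. What an implementation then does with a flagged expression (treat it as a class, treat it as a matching list, or reject it) is exactly what the proposal leaves unspecified.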
(0000545) nick (manager) 2010-09-09 16:20 edited on: 2010-09-09 16:21 |
Interpretation response
------------------------
The standard states the required behaviour for regular expressions, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
See Mantis Issue 305.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
See Note: 0000536 |
(0000574) ajosey (manager) 2010-10-14 11:29 |
Interpretation approved 14 October 2010 |
Issue History | |||
Date Modified | Username | Field | Change |
2010-09-01 15:37 | eblake | New Issue | |
2010-09-01 15:37 | eblake | Status | New => Under Review |
2010-09-01 15:37 | eblake | Assigned To | => ajosey |
2010-09-01 15:37 | eblake | Name | => Eric Blake |
2010-09-01 15:37 | eblake | Organization | => Red Hat |
2010-09-01 15:37 | eblake | User Reference | => ebb.bre |
2010-09-01 15:37 | eblake | Section | => RE Bracket Expression |
2010-09-01 15:37 | eblake | Page Number | => 184 |
2010-09-01 15:37 | eblake | Line Number | => 5905 |
2010-09-02 15:29 | eblake | Note Added: 0000536 | |
2010-09-09 16:08 | eblake | Note Edited: 0000536 | |
2010-09-09 16:20 | nick | Interp Status | => --- |
2010-09-09 16:20 | nick | Note Added: 0000545 | |
2010-09-09 16:20 | nick | Status | Under Review => Interpretation Required |
2010-09-09 16:20 | nick | Resolution | Open => Accepted As Marked |
2010-09-09 16:21 | nick | Note Edited: 0000545 | |
2010-09-09 16:22 | nick | Final Accepted Text | => See Note: 0000545 |
2010-09-09 16:26 | Don Cragun | Tag Attached: tc1-2008 | |
2010-09-13 05:04 | ajosey | Interp Status | --- => Pending |
2010-09-13 05:49 | ajosey | Interp Status | Pending => Proposed |
2010-10-14 11:29 | ajosey | Interp Status | Proposed => Approved |
2010-10-14 11:29 | ajosey | Note Added: 0000574 | |
2013-04-16 13:06 | ajosey | Status | Interpretation Required => Closed |