Austin Group Defect Tracker

ID: 0000305
Category: [1003.1(2008)/Issue 7] Base Definitions and Headers
Severity: Objection
Type: Clarification Requested
Date Submitted: 2010-09-01 15:37
Last Update: 2013-04-16 13:06
Reporter: eblake
View Status: public
Assigned To: ajosey
Priority: normal
Resolution: Accepted As Marked
Status: Closed
Name: Eric Blake
Organization: Red Hat
User Reference: ebb.bre
Section: RE Bracket Expression
Page Number: 184
Line Number: 5905
Interp Status: Approved
Final Accepted Text: See Note: 0000545
Summary: 0000305: Allow RE handling to reject suspicious uses
Description: The standard currently appears to be silent on the required behavior
when a collating element is specified more than once within a bracket
expression. Current implementation practice appears to be that
duplicates are silently ignored. That is, the two BREs in each of the
following pairs match the same strings:
[a] and [aa]
[a-c] and [a-cb]
[[:alpha:]] and [a[:alpha:]]
But without an explicit statement to this effect, it is not clear that
compliant applications can rely on this.

Meanwhile, GNU grep would like to reject suspicious uses, on the grounds
that when a user does:

grep '[:upper:]' file

they probably intended to do:

grep '[[:upper:]]' file

rather than what most implementations actually do, which is the equivalent of:

grep '[:epru]' file

The problem is less common with collating symbols and equivalence class
expressions, but it doesn't hurt to cover mistakes on these two uses as
well. And since EREs defer to BREs for the definition of bracket
expressions, fixing the standard to leave behavior unspecified will
allow more than just grep to use that as an opportunity to issue
warnings for suspicious uses.

This proposal therefore does two things. First, it explicitly tightens
the behavior of most duplicates to match common practice (with the slight
potential to invalidate a Weirdnix implementation that, under the current
silence of the standard, is within its rights to reject duplicates as an
invalid RE). Second, it explicitly relaxes the behavior for the three
characters that can combine with '[' to form collating symbols,
equivalence classes, and character classes, so that GNU grep's warning
on suspicious uses is explicitly permitted by the standard.

Leaving the behavior unspecified allows implementations the choice of
whether to print a warning but proceed anyway, fail with a loud error,
silently convert to the most-likely-intended character class, or any
other action; in particular, it will not render any existing
implementations non-compliant. And since any use of two colons without
a character class is probably already a bug in the user's script, it is
highly unlikely that relaxing the standard in this regard will impact
any well-written existing scripts.
Desired Action: At line 5995 (XBD 9.3.5), add a new paragraph:

8. In general, an implementation shall handle duplicate listings of the same collating element within a bracket expression as though the collating element had only been listed once, regardless of whether the duplication arises from any combination of single-character collating elements, collating symbols, equivalence classes, character classes, or range expressions. However, behavior is unspecified if <period>, <equals-sign>, or <colon> is included more than once as a single-character collating element.
Tags: tc1-2008

Notes
(0000536)
eblake (manager)
2010-09-02 15:29
edited on: 2010-09-09 16:08

The original Desired Action is incorrect, because the existing wording is
already sufficient to cover the case of duplicate collating elements:

"A matching list expression specifies a list that shall match any
single-character collating element in any of the expressions represented
in the list." (XBD line 5921)

Meanwhile, it loosens the standard too far. The standard already allows
regcomp() to fail with additional REG_* error codes (XBD line 10727), but
the XSH description of regcomp() does not mention this. A better proposed
wording, which bounds the possible behaviors, follows:


At line 5995 (XBD 9.3.5), add a new paragraph:

8. If a bracket expression contains at least three list elements, where
the first and last list element are the same single-character element
of <period>, <equals-sign>, or <colon>, then it is unspecified whether
the bracket expression will be treated as a collating symbol, equivalence
class, or character class, respectively; treated as a matching list
expression; or rejected as an error.

At line 56541 (XSH regcomp), change:

The following constants are defined as error return values:

to:

The following constants are defined as the minimum set of error return
values, although other errors listed as implementation extensions in
<regex.h> are possible:

At line 116821 (XRAT A.9.3.5), add a new paragraph:

The standard specifies three possible behaviors for regular expressions
such as "[:alpha:]". One behavior is the traditional implementation,
which behaves like "[:ahlp]". Another, for alignment with the tr utility,
is to treat it like "[[:alpha:]]". And finally, the standard allows
rejecting the regular expression as invalid, as a means of alerting a
user to the non-portable aspect of that regular expression. The set
of regular expressions with this unspecified behavior is limited solely
to the expressions where the outer '[' and ']' of the bracket expression
can be confused with the missing bracket pair '[' and ']' necessary to
form a collating symbol, equivalence class, or character class; thus
"[_:alpha:]" or "[::]" do not trigger the unspecified behavior.

(0000545)
nick (manager)
2010-09-09 16:20
edited on: 2010-09-09 16:21

Interpretation response
------------------------
The standard states the required behaviour for regular expressions, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
See Mantis Issue 305.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
See Note: 0000536

(0000574)
ajosey (manager)
2010-10-14 11:29

Interpretation approved 14 October 2010

Issue History
Date Modified | Username | Field | Change
2010-09-01 15:37 | eblake | New Issue |
2010-09-01 15:37 | eblake | Status | New => Under Review
2010-09-01 15:37 | eblake | Assigned To | => ajosey
2010-09-01 15:37 | eblake | Name | => Eric Blake
2010-09-01 15:37 | eblake | Organization | => Red Hat
2010-09-01 15:37 | eblake | User Reference | => ebb.bre
2010-09-01 15:37 | eblake | Section | => RE Bracket Expression
2010-09-01 15:37 | eblake | Page Number | => 184
2010-09-01 15:37 | eblake | Line Number | => 5905
2010-09-02 15:29 | eblake | Note Added: 0000536 |
2010-09-09 16:08 | eblake | Note Edited: 0000536 |
2010-09-09 16:20 | nick | Interp Status | => ---
2010-09-09 16:20 | nick | Note Added: 0000545 |
2010-09-09 16:20 | nick | Status | Under Review => Interpretation Required
2010-09-09 16:20 | nick | Resolution | Open => Accepted As Marked
2010-09-09 16:21 | nick | Note Edited: 0000545 |
2010-09-09 16:22 | nick | Final Accepted Text | => See Note: 0000545
2010-09-09 16:26 | Don Cragun | Tag Attached: tc1-2008 |
2010-09-13 05:04 | ajosey | Interp Status | --- => Pending
2010-09-13 05:49 | ajosey | Interp Status | Pending => Proposed
2010-10-14 11:29 | ajosey | Interp Status | Proposed => Approved
2010-10-14 11:29 | ajosey | Note Added: 0000574 |
2013-04-16 13:06 | ajosey | Status | Interpretation Required => Closed

