View Issue Details

ID: 0000773
Project: 1003.1(2008)/Issue 7
Category: Base Definitions and Headers
View Status: public
Last Update: 2022-03-24 15:34
Reporter: dwheeler
Assigned To: ajosey
Priority: normal
Severity: Objection
Type: Enhancement Request
Status: Closed
Resolution: Duplicate
Name: David A. Wheeler
Organization:
User Reference:
Section: 9 Regular Expressions
Page Number: 187-193
Line Number: 6068-6337
Interp Status: ---
Final Accepted Text:
Summary: 0000773: Add \+, \?, and \| to Basic Regular Expressions (BREs)
Description: BREs are the default or only regular expression format supported by some tools. However, BREs as currently defined in POSIX don’t support \+, \?, or \| as BRE equivalents of the ERE +, ?, or |. These capabilities are built into EREs because they are convenient and useful; BREs should be updated to provide these capabilities in a backwards-compatible way.

These operators are already available in multiple implementations. GNU’s BRE implementation supports \+, \?, and \|. macOS also supports them when the REG_ENHANCED flag is used: https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man7/re_format.7.html

The proposed "desired action" was created by copying and modifying some of the ERE text into the rules for BREs.
Desired Action: Insert before line 6068 the following text as a new numbered item (this text is based on lines 6170-6174):
“When a BRE matching a single character or a BRE enclosed in parentheses is followed by <backslash> <plus-sign> (’\+’), that sequence shall match what one or more consecutive occurrences of the BRE would match. For example, the BRE "b\+(bc)" matches the fourth to seventh characters in the string "acabbbcde". And, "[ab]\+" and "[ab][ab]*" are equivalent.”

Insert before line 6072 the following text as a new numbered item (this text is based on lines 6181-6184):
When a BRE matching a single character or a BRE enclosed in parentheses is followed by <backslash> <question-mark> (’\?’), that entire sequence shall match what zero or one consecutive occurrences of the BRE would match. For example, the BRE "b\?c" matches the second character in the string "acabbbcde".
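
As a rough illustration of the semantics described in the two proposed items above, the following Python sketch uses the ERE-style operators + and ? as stand-ins for \+ and \?. Python's re module is not a POSIX regex engine and does not accept BRE syntax, so this only checks the ERE examples that the proposed text was adapted from. The helper first_match and the sample strings in the equivalence check are made up for illustration.

```python
import re

text = "acabbbcde"

# ERE "b+(bc)" stands in for the proposed "\+" form.
# Python spans are 0-based and half-open, so (3, 7) corresponds to
# the fourth through seventh characters (1-based).
plus_span = re.search(r"b+(bc)", text).span()

# ERE "b?c" stands in for the proposed "\?" form.
# (1, 2) corresponds to the second character.
opt_span = re.search(r"b?c", text).span()


def first_match(pattern, s):
    """Hypothetical helper: span of the first match, or None if no match."""
    m = re.search(pattern, s)
    return m.span() if m else None


# "[ab]+" and "[ab][ab]*" should locate the same match in every sample.
samples = ["", "a", "abba", "xaby", "ccc"]
equivalent = all(
    first_match(r"[ab]+", s) == first_match(r"[ab][ab]*", s) for s in samples
)

print(plus_span, opt_span, equivalent)
```

A real conformance check would have to call regcomp() without REG_EXTENDED on the implementation under test, since support for \+ and \? in BREs varies.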

Insert before line 6089 a new subsection “BRE Alternation” with the following text (this text is based on lines 6200-6205):
Two BREs separated by <backslash> <vertical-line> (’\|’) shall match a string that is matched by either. For example, the BRE "a((bc)\|d)" matches the string "abc" and the string "ad". Single characters, or expressions matching single characters, separated by the <backslash> <vertical-line> and enclosed in parentheses, shall be treated as a BRE matching a single character.
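
The alternation rule above can be sketched the same way. This is only an approximation under stated assumptions: Python's ERE-style | and ( ) stand in for \| and grouping, and the candidate strings other than "abc" and "ad" are invented for illustration.

```python
import re

# ERE-style stand-in for the proposed alternation example.
alternation = re.compile(r"a((bc)|d)")

candidates = ["abc", "ad", "ab", "acd", "abcd"]
accepted = [s for s in candidates if alternation.fullmatch(s)]

# A parenthesized alternation of single characters behaves like a
# single-character expression, so a duplication symbol can follow it.
single_char_group = re.compile(r"(a|b)*c")
repeated_ok = single_char_group.fullmatch("abbac") is not None

print(accepted, repeated_ok)
```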

In section 9.3.7’s table, modify it as follows (this is based on the table in section 9.4.8): for “Single-character-BRE duplication” add \+ and \?. Also add a new row, Alternation, with value \|.

After line 6313, add:
%token Back_plus Back_star Back_bar
/* \+ \* \| */

On line 6314 and later, rename basic_reg_exp to BRE_expression, and insert above it the following text based on the equivalent ERE grammar:
basic_reg_exp  : BRE_branch
               | basic_reg_exp ’\|’ BRE_branch
               ;
BRE_branch     : BRE_expression
               | BRE_branch BRE_expression

After line 6337, add to RE_dupl_symbol :
| Back_plus
| Back_star
Tags: No tags attached.

Relationships

duplicate of 0001546 Closed 1003.1(2016/18)/Issue7+TC2 BREs: reserve \? \+ and \| 

Activities

geoffclare

2013-10-16 09:05

manager   bugnote:0001914

For this to stand any chance of being accepted, some major omissions
in the desired action need to be addressed:

9.3.8 needs updating (this may also affect the grammar).

Need to say something about \+ or \? at the beginning of a BRE or
following \|, ^ (when special), or \(, and about * or \{ following \|.

Some changes are needed on the regcomp() page (e.g. item 2 in the
numbered list).

Changes might be needed to the REG_* error macros on the regcomp()
and <regex.h> pages - looks like they need an overhaul anyway to
distinguish properly between BREs and EREs.

There may be more omissions - these are just the ones that have
occurred to me so far.

Also the phrase "enclosed in parentheses" is incorrect for BREs; it
should be: enclosed between "\(" and "\)".

dwheeler

2013-10-28 01:30

reporter   bugnote:0001946

Thanks for the comments!

I'm confused about "9.3.8 needs updating". That section is "BRE Expression Anchoring", which is about "^" and "$"... which is not related at all. I'm guessing that you meant another section, can you tell me which one?

I'd be okay with leaving "weird" situations unspecified. E.g., at lines 6084-6085, change "The behavior of multiple adjacent duplication symbols (’*’ and intervals) produces undefined results." into the following:
The behavior of multiple adjacent duplication symbols (’*’, ’\+’, ’\?’, and intervals) produces undefined results. The behavior of a \? or \+ that is initial (begins a BRE, follows \|, follows ^ when special, or follows "\(") produces undefined results. The behavior of "*" or "\{" following "\|" produces undefined results.
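
To make the cases in that wording concrete, here is a minimal Python sketch of a checker that flags them. It is not part of the proposal; the function names tokenize and undefined_constructs are hypothetical. It assumes that ^ is special only at the start of a BRE or directly after \( or \|, it does not parse bracket expressions, and it does not judge an interval that begins a BRE.

```python
def tokenize(bre):
    """Split a BRE into backslash pairs, whole intervals, and single chars.

    Simplification: bracket expressions are not treated specially.
    """
    tokens = []
    i = 0
    while i < len(bre):
        if bre.startswith("\\{", i):
            end = bre.find("\\}", i + 2)
            i = len(bre) if end < 0 else end + 2
            tokens.append("\\{")  # the whole interval counts as one token
        elif bre[i] == "\\" and i + 1 < len(bre):
            tokens.append(bre[i:i + 2])
            i += 2
        else:
            tokens.append(bre[i])
            i += 1
    return tokens


def undefined_constructs(bre):
    """List the constructs that the suggested wording leaves undefined."""
    problems = []
    prev = "start"  # one of: start, open, bar, anchor, dup, atom
    for tok in tokenize(bre):
        initial = prev in ("start", "open", "bar", "anchor")
        if tok in ("\\+", "\\?"):
            if initial:
                problems.append("initial " + tok)
            elif prev == "dup":
                problems.append("adjacent duplication symbols")
            prev = "dup"
        elif tok in ("*", "\\{"):
            if prev == "bar":
                problems.append(tok + " following \\|")
                prev = "atom"
            elif prev == "dup":
                problems.append("adjacent duplication symbols")
            else:
                # An initial '*' is an ordinary character in a BRE.
                prev = "atom" if initial else "dup"
        elif tok == "\\(":
            prev = "open"
        elif tok == "\\|":
            prev = "bar"
        elif tok == "^" and prev in ("start", "open", "bar"):
            prev = "anchor"
        else:
            prev = "atom"
    return problems


for pattern in ["a\\+b", "\\+a", "a*\\?", "a\\|*b", "^\\?a"]:
    print(pattern, undefined_constructs(pattern))
```

An implementation that prefers to reject these forms could return an error at the same points instead of collecting descriptions.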

There are, of course, good arguments for producing an error instead, so if anyone wants to require that, that's great.

To fix regcomp(), starting at line 57468, change:
 ’*’ or "\{\}" appears immediately after the subexpression in a basic regular expression..."
into:
 ’*’ or "\{\}" or "\+" or "\?" appears immediately after the subexpression in a basic regular expression..."

Before line 57472 add the following text (which is intentionally similar to what is there):
’\|’ is used in a basic regular expression to select this subexpression or
another, and the other subexpression matched.
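
The situation these regcomp() changes describe is a subexpression that takes no part in the match, for which regexec() reports offsets of -1. Python's re module happens to report the same thing as a (-1, -1) span, so the sketch below uses it as an analogy only. It uses ERE-style syntax rather than BRE syntax, and the pattern "a(b)?c" is an invented example.

```python
import re

# Alternation: group 2 is "(bc)"; with input "ad" the other branch matched.
alt = re.fullmatch(r"a((bc)|d)", "ad")
chosen_branch = alt.span(1)            # the outer group matched "d"
skipped_by_alternation = alt.span(2)   # "(bc)" did not participate

# Optional subexpression: "(b)?" matched zero times.
opt = re.fullmatch(r"a(b)?c", "ac")
skipped_by_optional = opt.span(1)

print(chosen_branch, skipped_by_alternation, skipped_by_optional)
```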

Note: If people DO want to cause bad \? to produce errors, then we need to modify line 57505: Change "REG_BADRPT ’?’, ’*’, or ’+’ not preceded by valid regular expression." to add to the end of the list '\?', '\+'.

geoffclare

2022-03-24 15:33

manager   bugnote:0005758

There is a resolution for effectively the same request in bug 0001546, so this is being closed as a duplicate of that bug.

Issue History

Date Modified Username Field Change
2013-10-16 02:51 dwheeler New Issue
2013-10-16 02:51 dwheeler Status New => Under Review
2013-10-16 02:51 dwheeler Assigned To => ajosey
2013-10-16 02:51 dwheeler Name => David A. Wheeler
2013-10-16 02:51 dwheeler Section => 9 Regular Expressions
2013-10-16 02:51 dwheeler Page Number => 187-193
2013-10-16 02:51 dwheeler Line Number => 6068-6337
2013-10-16 09:05 geoffclare Note Added: 0001914
2013-10-28 01:30 dwheeler Note Added: 0001946
2022-03-12 21:01 Don Cragun Relationship added related to 0001546
2022-03-24 15:33 geoffclare Interp Status => ---
2022-03-24 15:33 geoffclare Note Added: 0005758
2022-03-24 15:33 geoffclare Status Under Review => Closed
2022-03-24 15:33 geoffclare Resolution Open => Duplicate
2022-03-24 15:34 geoffclare Relationship replaced duplicate of 0001546