View Issue Details

ID: 0001911
Project: 1003.1(2024)/Issue8
Category: Base Definitions and Headers
View Status: public
Last Update: 2025-07-03 15:58
Reporter: dannyniu
Assigned To: geoffclare
Priority: normal
Severity: Editorial
Type: Omission
Status: Applied
Resolution: Accepted As Marked
Name: DannyNiu/NJF
Organization: Individual
User Reference:
Section: XBD Section 9, Regular Expressions
Page Number: Page 187
Line Number: Line 6675 Onwards
Interp Status: ---
Final Accepted Text: See 0001911:0007096.
Summary: 0001911: Clarify & consistently define "subexpression" for ERE.
Description: In https://www.austingroupbugs.net/view.php?id=1857 the wording was revised to clarify the behavior of lazy matches. Subsequently, further issues were noted. In particular, the terms "subpattern" and "subexpression" were found not to be used consistently.

As mentioned on the mailing list, the word "subpattern" is eliminated by Bug-1857; "subexpression" was defined for BRE to mean the content of an (escaped) parenthesized substring of the spelling of an expression, but this was not done for ERE.
Desired Action: On the mailing list, it was mentioned that this inconsistency would be best addressed in a separate bug. The desired action for this bug is to define the term "subexpression" consistently for ERE (as has been done for BRE).
Tags: tc1-2024

Activities

geoffclare

2025-03-06 17:49

manager   bugnote:0007096

Last edited: 2025-03-06 17:50

Suggested changes ...

On page 187 line 6684-6686 section 9.4.1, delete:
An ERE matching a single character enclosed in parentheses shall match the same as the ERE without parentheses would have matched.

On page 189 line 6729 section 9.4.6, change:
1. A concatenation of EREs shall match the concatenation of the character sequences matched by each component of the ERE. A concatenation of EREs enclosed in parentheses shall match whatever the concatenation without the parentheses matches. For example, both the ERE "cd" and the ERE "(cd)" are matched by the third and fourth character of the string "abcdefabcdef".
to:
1. A concatenation of EREs shall match the concatenation of the strings matched by each component of the ERE.

2. A subexpression can be defined within an ERE by enclosing it in parentheses. Such a subexpression shall match whatever it would have matched without the parentheses. For example, both the ERE "cd" and the ERE "(cd)" are matched by the third and fourth character of the string "abcdefabcdef". Subexpressions can be arbitrarily nested.
and renumber the remaining items.
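
The example in the new item 2 can be checked with a short sketch. Python's `re` module is used here only as a convenient stand-in; it is not a POSIX ERE implementation, and the helper names `leftmost_span` and `group_spans` are made up for illustration. For these simple patterns the behavior is assumed to coincide with what the text describes.

```python
import re

SUBJECT = "abcdefabcdef"

def leftmost_span(pattern, subject=SUBJECT):
    """Return (start, end) of the leftmost match, or None if there is none."""
    match = re.search(pattern, subject)
    return match.span() if match else None

def group_spans(pattern, subject=SUBJECT):
    """Return the span of the whole match followed by each subexpression's span."""
    match = re.search(pattern, subject)
    if match is None:
        return None
    # match.re.groups is the number of parenthesized subexpressions.
    return [match.span(i) for i in range(match.re.groups + 1)]

if __name__ == "__main__":
    # Offsets are zero-based and end-exclusive, so (2, 4) covers the
    # third and fourth characters of the subject string.
    print(leftmost_span("cd"))
    print(leftmost_span("(cd)"))
    # Nested subexpressions: the outer group and each inner group.
    print(group_spans("((c)(d))"))
```

Both `"cd"` and `"(cd)"` report the same span, which is the point of the wording "shall match whatever it would have matched without the parentheses"; the nested pattern shows that each subexpression still has its own span.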

On page 189 line 6734 section 9.4.6, and
page 189 line 6739 section 9.4.6, and
page 189 line 6745 section 9.4.6, and
page 189 line 6749 section 9.4.6, change:
When an ERE matching a single character or an ERE enclosed in parentheses is followed by ...
to:
When an ERE matching a single character or an ERE subexpression is followed by ...

On page 189 line 6772 section 9.4.6, change:
An ERE matching a single character repeated by an '*', '?', or an interval expression shall not match a null expression unless ...
to:
An ERE matching a single character, or an ERE subexpression, repeated by an '*', '?', or an interval expression shall not match a null expression unless ...

On page 190 line 6787 section 9.4.8, change:
Grouping
to:
Subexpressions

dannyniu

2025-06-30 11:21

reporter   bugnote:0007215

There's an additional issue. In 9.4.9 ERE Expression Anchoring, it's said:

> shall anchor the expression or subexpression

However, the anchors also apply to the beginning and end of alternatives, i.e. next to "|". As such I suggest changing it to:

> shall anchor the expression, subexpression, and beginning or end of alternatives.

geoffclare

2025-06-30 13:33

manager   bugnote:0007216

Last edited: 2025-06-30 13:34

Re 0001911:0007215
I don't see any problem there. I think you are misinterpreting "expression" as meaning "entire regular expression" (which is a defined term in 9.1). Look in particular at the example at the end of item 2:
the ERE "e$f" is valid, but can never match because the 'f' prevents the expression "e$" from matching ending at the last character

The "$" in "e$f" anchors the expression "e$", even though the "$" is followed by "f". There is nothing special about the beginning and end of alternatives as regards where anchors can be placed (and be treated as anchors).
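
A rough way to see both points is the sketch below. It again uses Python's `re` module as a stand-in rather than a POSIX ERE engine, so it only illustrates the behavior under the assumption that the two agree for these patterns; `matches_anywhere` is a hypothetical helper, and the pattern `a$|^b` is an invented example of anchors placed next to "|".

```python
import re

def matches_anywhere(pattern, subject):
    """Return True if the pattern matches some substring of subject."""
    return re.search(pattern, subject) is not None

if __name__ == "__main__":
    # "e$" can match ending at the last character of the string ...
    print(matches_anywhere("e$", "abcde"))
    # ... but in "e$f" the trailing 'f' means the anchored "e$" can never
    # end at the last character, so the pattern is valid yet never matches.
    print(matches_anywhere("e$f", "ef"))
    # Anchors next to "|" act on the expression they belong to,
    # with no special rule for alternatives.
    print(matches_anywhere("a$|^b", "xa"))
    print(matches_anywhere("a$|^b", "bx"))
    print(matches_anywhere("a$|^b", "ab"))
```

In the last three calls, `a$` matches only when "a" is the final character and `^b` only when "b" is the first, which is the same treatment the anchors would get outside an alternation.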

Issue History

Date Modified Username Field Change
2025-02-28 02:37 dannyniu New Issue
2025-03-06 17:49 geoffclare Note Added: 0007096
2025-03-06 17:50 geoffclare Note Edited: 0007096
2025-03-13 15:21 Don Cragun Status New => Resolved
2025-03-13 15:21 Don Cragun Resolution Open => Accepted As Marked
2025-03-13 15:21 Don Cragun Interp Status => ---
2025-03-13 15:21 Don Cragun Final Accepted Text => See 0001911:0007096.
2025-03-13 15:23 Don Cragun Tag Attached: tc1-2024
2025-06-30 11:21 dannyniu Note Added: 0007215
2025-06-30 13:33 geoffclare Note Added: 0007216
2025-06-30 13:33 geoffclare Note Edited: 0007216
2025-06-30 13:34 geoffclare Note Edited: 0007216
2025-07-03 13:54 geoffclare Assigned To => geoffclare
2025-07-03 15:58 geoffclare Status Resolved => Applied