Austin Group Defect Tracker

Aardvark Mark IV


ID 0000528
Category [1003.1(2008)/Issue 7] Shell and Utilities
Severity Objection
Type Enhancement Request
Date Submitted 2011-12-17 17:49
Last Update 2020-03-18 15:37
Reporter dwheeler
View Status public
Assigned To ajosey
Priority normal
Resolution Accepted As Marked
Status Applied
Name David A. Wheeler
Organization
User Reference
Section sed
Page Number 3153-3161
Line Number 104759-105140
Interp Status ---
Final Accepted Text See Note: 0001106
Summary 0000528: Support extended regular expressions (EREs) in sed
Description I propose standardizing a flag so sed users can use Extended Regular Expressions (EREs) and not just Basic Regular Expressions (BREs), making sed consistent with grep and awk. I recommend using “-E”, though “-r” is plausible (both are widely implemented).

Here is my rationale:

1. This would make sed consistent with the related POSIX utilities grep (which supports EREs using -E) and awk (which *only* supports EREs). It is an odd inconsistency that POSIX sed cannot use EREs at all, when related utilities *allow* or *require* EREs.

2. Many matches and substitutions are simpler to express as an ERE than as a BRE, and these are typical uses of sed. BRE has no equivalent at all of ERE’s special character | (“or”). Common BRE expressions like \(, \), \{, and \} are, in ERE, the simpler (, ), {, and }. BRE doesn’t include the ERE special character + to mean “one or more”; you can simulate it with “\{1,\}”, and GNU sed includes the extension \+, but EREs include this as a special character because it is such a common need. This increased functionality and simplification matters.

3. Supporting EREs also makes sed more consistent with the many programming languages that support regular expressions, including Perl and Python. Most languages with built-in regular expression support provide either EREs or Perl-like REs (which are a superset of EREs), not BREs. As a result, far more people are familiar with the ERE syntax than the BRE syntax.

4. Many people consider BREs “obsolete” and EREs more “modern”. For example, the FreeBSD sed documentation says that its “-E” flag uses “extended (modern) regular expressions rather than basic regular expressions (BRE's)”, and its documentation for regex(3) describes the REG_EXTENDED flag as “Compile modern (‘extended’) REs, rather than the obsolete (‘basic’) REs”. The book “Mac OS X UNIX: 101 byte-sized projects” by Adrian Mayo, page 450, says: “Unix supports three types of regular expressions... modern (also termed extended), obsolete (also termed basic), and Perl regular expressions...”. You don’t need to agree that BREs are obsolete, but since many people do believe this, it would be valuable to have a standard way to use their preferred regex format.

5. This capability is widely implemented using the “-E” flag, the “-r” flag, or both:
* GNU sed supports EREs using either “-r” or “-E”, though it only documents “-r”. To confirm that it supports “-E”, see its source code at http://git.savannah.gnu.org/cgit/sed.git/tree/sed/sed.c and look for the line containing the comment “Undocumented, for compatibility with BSD sed... case ‘E’:”.
* OpenBSD sed supports EREs using “-E” (and recommends this flag) and “-r” (the “-r” flag is documented as being for portability with GNU sed). See: http://www.openbsd.org/cgi-bin/man.cgi?query=sed&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html
* FreeBSD sed has the same situation as OpenBSD; it supports “-E” (and recommends this flag), and it supports “-r” for compatibility with GNU sed. See: http://www.freebsd.org/cgi/man.cgi?query=sed&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&arch=default&format=html
* NetBSD sed is the same as OpenBSD; it supports “-E” and “-r”. See: http://netbsd.gw.com/cgi-bin/man-cgi?sed++NetBSD-current
* MacOS sed supports only “-E” for this, and not “-r”. See: http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man1/sed.1.html

6. ERE support is trivial to add to existing sed implementations that don’t already have it. Such an implementation could simply recognize the flag, and later call regcomp() with REG_EXTENDED if that flag was present. Since regcomp() is already in the standard, this should be trivial to do for any sed implementation that cares about the POSIX standard at all.

7. People really do depend on sed supporting EREs. For example, in this email: http://ghostscript.com/pipermail/gs-bugs/2010-June/014175.html the developers were using sed with EREs, and decided that they would use different flags for different systems instead of trying to use BREs. They noted that their approach would fail if sed didn’t support EREs, but that seemed to be of no concern to them, which suggests there’s a need to standardize using EREs in sed!
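
Points 2 and 6 above can be illustrated with a minimal C sketch. It assumes that the standard function a sed implementation would call is regcomp() from <regex.h>, which accepts REG_EXTENDED among its compile flags. The helper names re_matches() and count_groups() and the use_ere parameter are hypothetical; use_ere stands in for whatever variable an implementation would set on seeing the new flag. This is not taken from any existing sed.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` as an ERE when use_ere is nonzero
 * (the proposed -E case) and as a BRE otherwise, then test it against `text`.
 * Returns 1 on a match, 0 on no match, -1 if the pattern does not compile. */
int re_matches(const char *pattern, int use_ere, const char *text)
{
    regex_t re;
    int cflags = REG_NOSUB | (use_ere ? REG_EXTENDED : 0);
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Hypothetical helper: report how many parenthesized subexpressions the
 * compiled pattern contains, or -1 if it does not compile. This shows that
 * the same characters group in one syntax and are literal in the other. */
int count_groups(const char *pattern, int use_ere)
{
    regex_t re;
    int n;

    assert(pattern != NULL);
    if (regcomp(&re, pattern, use_ere ? REG_EXTENDED : 0) != 0)
        return -1;
    n = (int)re.re_nsub;
    regfree(&re);
    return n;
}
```

With these helpers, re_matches("^a\\{1,\\}b$", 0, "aaab") and re_matches("^a+b$", 1, "aaab") should both report a match, since the BRE interval and the ERE + express the same thing. An ERE such as "^(cat|dog)$" relies on alternation, which standard BREs cannot express. The only difference between the two modes in this sketch is the single REG_EXTENDED bit passed to regcomp().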

The question is, what flag should be used? Both “-E” (BSD-style) and “-r” (GNU-style) are in common use. I recommend that the “-E” (BSD-style) flag be standardized at least, because:
1. It’s consistent with grep (which already uses the “-E” flag).
2. The flag letter is more obviously related to “Extended Regular Expressions (EREs)”.
3. It’s widely supported. In particular, GNU sed already supports -E, while MacOS (as far as I can tell) doesn’t support “-r”.

A counter-argument is that some people might be confused by upper and lower case flags (‘E’ vs. ‘e’) with such different meanings. I disagree. POSIX flags are already case-sensitive; widely-used utilities like “ls” already have lots of flags that differ only by case. Indeed, the related POSIX utility awk has flags that differ by case (“-F” vs. “-f”). The symbols “e” and “E” have very different appearance in most fonts, so it’s unlikely that they would be visually confused.

One minor problem is that many non-portable sed scripts already use “-r”, since that is what GNU sed actually documents and originally implemented. Alternatives include:
1. Supporting both “-r” and “-E” (as synonyms). This would have the advantage of making many existing scripts comply with the new specification, without any changes. It’s also very easy to implement; most implementations could just add a case statement if it’s not there already.
2. Supporting just “-r” instead of “-E”. I think “-E” is better, as described above, but this would be fine as well.
3. Recommending, but not requiring, that implementations support “-r” as a synonym.
I don’t *really* care what the flag name(s) are, as long as there’s a standard way to access EREs.
If people prefer one of these alternatives, the proposal could easily be tweaked to accommodate it.

Desired Action Just before line 104774, add:
-E Match using extended regular expressions. Treat each pattern specified as an ERE,
as described in XBD Section 9.4 (on page 188).

On line 104844, change “BRE” to “RE”.

In lines 104857-104858, replace:

The sed utility shall support the BREs described in XBD Section 9.3 (on page 183), with the following additions:
with:

The sed utility shall support the REs described in XBD Section 9 (on page 181); by default it shall use BREs as described in XBD Section 9.3 (on page 183), but if the “-E” option is used, it shall use EREs as described in XBD Section 9.4 (on page 188). Both BREs and EREs shall also support the following additions:

In lines 104859-104866 and 104963-104990, change all instances of “BRE” to “RE”.
Tags issue8
Attached Files

- Relationships
has duplicate 0001545 Closed 1003.1(2016/18)/Issue7+TC2 sed: standardise or at least reserve -E with sed for use of EREs

- Notes
(0001106)
Don Cragun (manager)
2012-01-26 16:51

Make the changes suggested in the Desired Action and also change:
    sed [-n]

on all three lines of the SYNOPSIS section (P3153, L104762-104764) to:
    sed [-En]
(0001112)
dwheeler (reporter)
2012-01-27 05:00

Great news, thanks! A minor grammar tweak: When applying the "Desired Action" material, replace "is is" with just "is".
(0001113)
Don Cragun (manager)
2012-01-27 05:52

The Desired Action field has been tweaked to fix the issue noted in Note: 0001112

- Issue History
Date Modified Username Field Change
2011-12-17 17:49 dwheeler New Issue
2011-12-17 17:49 dwheeler Status New => Under Review
2011-12-17 17:49 dwheeler Assigned To => ajosey
2011-12-17 17:49 dwheeler Name => David A. Wheeler
2011-12-17 17:49 dwheeler Section => sed
2011-12-17 17:49 dwheeler Page Number => 3153-3161
2011-12-17 17:49 dwheeler Line Number => 104759-105140
2012-01-26 16:51 Don Cragun Interp Status => ---
2012-01-26 16:51 Don Cragun Note Added: 0001106
2012-01-26 16:51 Don Cragun Status Under Review => Resolved
2012-01-26 16:51 Don Cragun Resolution Open => Accepted As Marked
2012-01-26 16:53 Don Cragun Final Accepted Text => See Note: 0001106
2012-01-26 16:53 Don Cragun Desired Action Updated
2012-01-26 16:54 Don Cragun Tag Attached: issue8
2012-01-27 05:00 dwheeler Note Added: 0001112
2012-01-27 05:52 Don Cragun Note Added: 0001113
2012-01-27 05:52 Don Cragun Desired Action Updated
2015-04-23 23:15 emaste Issue Monitored: emaste
2017-12-21 03:44 eadler Issue Monitored: eadler
2020-03-18 15:37 geoffclare Status Resolved => Applied
2022-01-10 09:19 geoffclare Relationship added has duplicate 0001545

