ID | Category | Severity | Type | Date Submitted | Last Update
0000528 | [1003.1(2008)/Issue 7] Shell and Utilities | Objection | Enhancement Request | 2011-12-17 17:49 | 2024-06-11 08:53
Reporter | dwheeler | View Status | public
Assigned To | ajosey
Priority | normal | Resolution | Accepted As Marked
Status | Closed
Name | David A. Wheeler
Organization |
User Reference |
Section | sed
Page Number | 3153-3161
Line Number | 104759-105140
Interp Status | ---
Final Accepted Text | See Note: 0001106
Summary | 0000528: Support extended regular expressions (EREs) in sed
Description |
I propose standardizing a flag so sed users can use Extended Regular Expressions (EREs) and not just Basic Regular Expressions (BREs), making sed consistent with grep and awk. I recommend using “-E”, though “-r” is plausible (both are widely implemented).

Here is my rationale:

1. This would make sed consistent with the related POSIX utilities grep (which supports EREs using -E) and awk (which *only* supports EREs). It is an odd inconsistency that POSIX sed cannot use EREs at all, when related utilities *allow* or *require* EREs.

2. Many matches and substitutions are simpler to express as an ERE than as a BRE, and these are typical uses of sed. BRE doesn’t include functionality like ERE’s special character | (“or”) at all. Common BRE expressions like \(, \), \{, and \} are, in ERE, the simpler (, ), {, and }. BRE doesn’t include the ERE special character + to mean “one or more”; you can simulate it with “\{1,\}”, and GNU sed includes the extension \+, but EREs include this as a special character because it is such a common need. This increased functionality and simplification matters.

3. Supporting EREs also makes sed more consistent with the many programming languages that support regular expressions, including Perl and Python. Most languages with built-in regular expression support use either EREs or Perl-like REs (which are a superset of EREs), not BREs. As a result, far more people are familiar with the ERE syntax than the BRE syntax.

4. Many people consider BREs “obsolete” and EREs more “modern”. For example, the FreeBSD sed documentation says that its “-E” flag uses “extended (modern) regular expressions rather than basic regular expressions (BRE's)”, and its documentation for regex(3) describes the REG_EXTENDED flag as “Compile modern (‘extended’) REs, rather than the obsolete (‘basic’) REs”. The book “Mac OS X UNIX: 101 byte-sized projects” by Adrian Mayo, page 450, says: “Unix supports three types of regular expressions... modern (also termed extended), obsolete (also termed basic), and Perl regular expressions...”. You don’t need to agree that BREs are obsolete, but since many people do believe this, it would be valuable to have a standard way to use their preferred regex format.

5. This capability is widely implemented using the “-E” flag, the “-r” flag, or both:
* GNU sed supports EREs using either “-r” or “-E”, though it only documents “-r”. To confirm that it supports “-E”, see its source code at http://git.savannah.gnu.org/cgit/sed.git/tree/sed/sed.c and look for the line containing the comment “Undocumented, for compatibility with BSD sed... case ‘E’:”.
* OpenBSD sed supports EREs using “-E” (and recommends this flag) and “-r” (the “-r” flag is documented as being for portability with GNU sed). See: http://www.openbsd.org/cgi-bin/man.cgi?query=sed&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html
* FreeBSD sed has the same situation as OpenBSD; it supports “-E” (and recommends this flag), and it supports “-r” for compatibility with GNU sed. See: http://www.freebsd.org/cgi/man.cgi?query=sed&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&arch=default&format=html
* NetBSD sed is the same as OpenBSD; it supports “-E” and “-r”. See: http://netbsd.gw.com/cgi-bin/man-cgi?sed++NetBSD-current
* MacOS sed supports only “-E” for this, and not “-r”. See: http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man1/sed.1.html

6. ERE support is trivial to add to existing sed implementations if they don’t already have it. A sed implementation that doesn’t already implement EREs could simply recognize a flag, and later call regcomp() with REG_EXTENDED if that flag was present. Since regcomp() is already in the standard, this should be trivial to do for any sed implementation that cares about the POSIX standard at all.

7. People really do depend on sed supporting EREs. For example, in this email: http://ghostscript.com/pipermail/gs-bugs/2010-June/014175.html the developers were using sed with EREs, and decided that they would use different flags for different systems instead of trying to use BREs. They noted that their approach would fail if sed didn’t support EREs, but that seemed to be of no concern to them, which suggests there’s a need to standardize using EREs in sed!

The question is, what flag should be used? Both “-E” (BSD-style) and “-r” (GNU-style) are in common use. I recommend that the “-E” (BSD-style) flag be standardized at least, because:

1. It’s consistent with grep (which already uses the “-E” flag).
2. The flag letter is more obviously related to “Extended Regular Expressions (EREs)”.
3. It’s widely supported. In particular, GNU sed already supports “-E”, while MacOS (as far as I can tell) doesn’t support “-r”.

A counter-argument is that some people might be confused by upper and lower case flags (‘E’ vs. ‘e’) with such different meanings. I disagree. POSIX flags are already case-sensitive; widely-used utilities like “ls” already have lots of flags that differ only by case. Indeed, the related POSIX utility awk has flags that differ by case (“-F” vs. “-f”). The symbols “e” and “E” have very different appearance in most fonts, so it’s unlikely that they would be visually confused.

One minor problem is that many non-portable sed scripts already use “-r”, since that is what GNU sed actually documents and originally implemented. Alternatives include:

1. Supporting both “-r” and “-E” (as synonyms). This would have the advantage of making many existing scripts comply with the new specification, without any changes. It’s also very easy to implement; most implementations could just add a case statement if it’s not there already.
2. Supporting just “-r” instead of “-E”. I think “-E” is better, as described above, but this would be fine as well.
3. Recommending, but not requiring, that implementations support “-r” as a synonym.

I don’t *really* care what the flag name(s) are, as long as there’s a standard way to access EREs. If people want one of those alternatives instead, the proposal could be easily tweaked to accommodate this.
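As a rough illustration of rationale item 2 (not part of the original report), the sketch below runs the same "one or more" substitution once as a BRE and once as an ERE, and then uses ERE alternation. The sample words and variable names are invented for illustration, and the example assumes a sed that accepts “-E”, as the implementations listed in item 5 are reported to do.

```shell
# Sample input: one word per line (illustrative data only).
input='aaa
b
cat
dog'

# BRE (the POSIX default): "one or more a" needs an interval expression.
bre_out=$(printf '%s\n' "$input" | sed 's/a\{1,\}/X/')

# ERE (with -E): the same match written with the + operator.
ere_out=$(printf '%s\n' "$input" | sed -E 's/a+/X/')

# ERE alternation and grouping with plain ( | ), no backslashes needed.
alt_out=$(printf '%s\n' "$input" | sed -E 's/^(cat|dog)$/pet/')

# The first two results are identical; the third replaces "cat" and "dog".
printf '%s\n' "$bre_out" "$ere_out" "$alt_out"
```

The BRE and ERE forms produce the same output here; the difference is only in how much escaping the script needs.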
Desired Action |
Just before line 104774, add:

    -E    Match using extended regular expressions. Treat each pattern specified as an ERE, as described in XBD Section 9.4 (on page 188).

On line 104844, change “BRE” to “RE”.

In lines 104857-104858, replace:

    The sed utility shall support the BREs described in XBD Section 9.3 (on page 183), with the following additions:

with:

    The sed utility shall support the REs described in XBD Section 9 (on page 181); by default it shall use BREs as described in XBD Section 9.3 (on page 183), but if the “-E” option is used, it shall use EREs as described in XBD Section 9.4 (on page 188). Both BREs and EREs shall also support the following additions:

In lines 104859-104866 and 104963-104990, change all instances of “BRE” to “RE”.
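The effect of the proposed wording can be sketched as follows: the same script text is read as a BRE by default and as an ERE when “-E” is given. This is only an illustration under the assumption that the local sed already accepts “-E”; the sample strings and variable names are hypothetical. In a BRE the + is an ordinary character, so the two runs match different parts of the line. The last command groups “-E” with the existing “-n” option.

```shell
# Illustrative input containing both a run of "a" and a literal "a+".
sample='aa a+'

# Default (BRE): + is an ordinary character, so only the literal "a+" matches.
default_out=$(printf '%s\n' "$sample" | sed 's/a+/X/')      # aa X

# With -E (ERE): + means "one or more", so the leading "aa" matches.
extended_out=$(printf '%s\n' "$sample" | sed -E 's/a+/X/')  # X a+

# -E grouped with -n: print only the lines matching an ERE alternation.
printed=$(printf 'cat\nbird\ndog\n' | sed -En '/^(cat|dog)$/p')

printf '%s\n' "$default_out" "$extended_out" "$printed"
```

Because the default stays BRE, existing scripts keep their meaning; only invocations that opt in with “-E” see the ERE interpretation.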
Tags | issue8
Attached Files |
Relationships |
Notes
(0001106) Don Cragun (manager) 2012-01-26 16:51 |
Make the changes suggested in the Desired Action and also change: sed [-n] on all three lines of the SYNOPSIS section (P3153, L104762-104764) to: sed [-En] |
(0001112) dwheeler (reporter) 2012-01-27 05:00 |
Great news, thanks! A minor grammar tweak: When applying the "Desired Action" material, replace "is is" with just "is". |
(0001113) Don Cragun (manager) 2012-01-27 05:52 |
The Desired Action field has been tweaked to fix the issue noted in Note: 0001112 |
Issue History | |||
Date Modified | Username | Field | Change |
2011-12-17 17:49 | dwheeler | New Issue | |
2011-12-17 17:49 | dwheeler | Status | New => Under Review |
2011-12-17 17:49 | dwheeler | Assigned To | => ajosey |
2011-12-17 17:49 | dwheeler | Name | => David A. Wheeler |
2011-12-17 17:49 | dwheeler | Section | => sed |
2011-12-17 17:49 | dwheeler | Page Number | => 3153-3161 |
2011-12-17 17:49 | dwheeler | Line Number | => 104759-105140 |
2012-01-26 16:51 | Don Cragun | Interp Status | => --- |
2012-01-26 16:51 | Don Cragun | Note Added: 0001106 | |
2012-01-26 16:51 | Don Cragun | Status | Under Review => Resolved |
2012-01-26 16:51 | Don Cragun | Resolution | Open => Accepted As Marked |
2012-01-26 16:53 | Don Cragun | Final Accepted Text | => See Note: 0001106 |
2012-01-26 16:53 | Don Cragun | Desired Action Updated | |
2012-01-26 16:54 | Don Cragun | Tag Attached: issue8 | |
2012-01-27 05:00 | dwheeler | Note Added: 0001112 | |
2012-01-27 05:52 | Don Cragun | Note Added: 0001113 | |
2012-01-27 05:52 | Don Cragun | Desired Action Updated | |
2015-04-23 23:15 | emaste | Issue Monitored: emaste | |
2017-12-21 03:44 | eadler | Issue Monitored: eadler | |
2020-03-18 15:37 | geoffclare | Status | Resolved => Applied |
2022-01-10 09:19 | geoffclare | Relationship added | has duplicate 0001545 |
2024-06-11 08:53 | agadmin | Status | Applied => Closed |