Austin Group Defect Tracker

Aardvark Mark IV


ID Category Severity Type Date Submitted Last Update
0001545 [1003.1(2016/18)/Issue7+TC2] Shell and Utilities Editorial Enhancement Request 2022-01-08 03:22 2022-01-11 05:48
Reporter calestyo View Status public  
Assigned To
Priority normal Resolution Open  
Status New  
Name Christoph Anton Mitterer
Organization
User Reference
Section sed utility
Page Number N/A
Line Number NA/
Interp Status ---
Final Accepted Text
Summary 0001545: sed: standardise or at least reserve -E with sed for use of EREs
Description As of now:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html
doesn’t specify a mode where EREs are used instead of BREs.

Some major implementations do support this, however; a quick check revealed at least GNU, FreeBSD, NetBSD, OpenBSD and busybox.
Desired Action I think POSIX should at least reserve sed's "-E" option (allowing implementations to support EREs with sed in some way)... or standardise the use of EREs with sed.
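
To illustrate the request, here is a minimal sketch of the same substitution written once as a BRE and once as an ERE. It assumes a sed that accepts -E (an extension as far as Issue 7 is concerned); the function names bre_swap and ere_swap are made up for this example.

```shell
#!/bin/sh
# BRE form: grouping is \( \) and intervals are \{ \}. Specified by Issue 7.
bre_swap() {
    sed 's,\([0-9]\{4\}\)-\([0-9]\{2\}\),\2/\1,'
}

# ERE form: grouping is ( ) and intervals are { }.
# Assumption: the local sed supports -E; Issue 7 does not define this option.
ere_swap() {
    sed -E 's,([0-9]{4})-([0-9]{2}),\2/\1,'
}

# Both calls rewrite "YYYY-MM" as "MM/YYYY".
printf '2022-01\n' | bre_swap
printf '2022-01\n' | ere_swap
```

On an implementation without -E, the second call would fail or behave differently, which is the portability gap this report is about.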
Tags No tags attached.
Attached Files

- Relationships
duplicate of 0000528 Applied ajosey 1003.1(2008)/Issue 7 Support extended regular expressions (EREs) in sed

-  Notes
(0005583)
geoffclare (manager)
2022-01-10 09:19

The -E option has already been added in the Issue 8 drafts.

This bug should be closed as a duplicate of bug 0000528.
(0005587)
calestyo (reporter)
2022-01-10 15:19

Oh thanks for the information and sorry for not knowing it. Are these somewhere publicly available?

Other than that:

I'd either simply close this bug now, or "re-dedicate" it to "reserve" the option "-P" in the same sense, i.e. *if* it is used, then it should switch to PCRE.

Whether or not this makes sense, is another question ;-)
(0005588)
geoffclare (manager)
2022-01-10 15:56

> Are these somewhere publicly available?

Not publicly, but the barrier is quite low. You just need to have an Open Group account and subscribe to the austin-group-l mailing list. Then the drafts should be available from:

https://www.opengroup.org/austin/login.html
(0005589)
stephane (reporter)
2022-01-10 21:34

Re: Note: 0005587

In ast-open's sed (which is also the sed builtin of ksh93 if enabled at compile time), sed -P enables perl-like regexps, but not via PCRE (only a subset is implemented and there are some incompatibilities). In any case, PCRE, like perl's regexps, is still a moving target (even though probably a lot less so these days).

In ssed (which could be considered as the devel version of GNU sed when it was still actively maintained), sed -R enabled PCRE matching (this time via PCRE).

I would welcome sed -P to use perl-like regexp, considering that perl-like regexps have become the new de-facto standard for regexps, but maybe POSIX is not the place where those regexps should be specified.
(0005590)
calestyo (reporter)
2022-01-10 22:22

I thought PCRE in GNU sed was rather kinda rejected... (see [0] and [1])

But if at all, I would rather just "reserve" an option for Perl/PCRE-like regular expressions, so that vendors don't use -P for something completely different.
And not define exactly how these work or which implementation it should be.

Maybe one could name it "Perl-like" or similar, so that one neither clearly mandates them to be 100% Perl compatible, nor 100% PCRE compatible.



[0] https://lists.gnu.org/archive/html/sed-devel/2017-11/msg00001.html
[1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22801
(0005591)
philip-guenther (reporter)
2022-01-10 23:47

I'm not sure I understand how "reserved for use to provide a behavior that is itself not completely specified" logically makes sense. The term 'PCRE' would have to be defined by some stable reference, or else anyone can just say "oh, I comply with _a_ version of PCRE" and there's nothing *in posix* that someone could say they are non-compliant with.

If the concern is just that some future version of POSIX might specify a -P option that has some totally unrelated effect, well, I think that's solved/prevented by the on-going participation of community members in the POSIX revision/review process. "Don't do something stupid in the future" isn't actually implementable: we are not always wiser than our successors and they have the power to make their own decisions about what is wise, rescinding our choices now.
(0005593)
calestyo (reporter)
2022-01-11 00:24

Well isn't that common practice in standards to not fully describe everything and refer to some other place?
E.g. anyone using things like \uxxxx would refer to Unicode, which by itself is an evolving standard.

The only difference here would be that there is no single standard. As for Perl RE's the standard is likely "whatever the current version of Perl does" (and perlre(1) rather just describing that).
And PCRE is very, but not absolutely, similar to that. Again, not so much different from what we have e.g. with BREs, which e.g. GNU extends.


The concern is rather that *implementations* might assign -P to something completely different (or use something else for the same effect, i.e. using PCRE), thus causing unnecessary portability issues.
See e.g. GNU's grep, which assigned several one-letter options for some use beyond what POSIX specifies.
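
As a sketch of what that portability concern means for a script author today: since -P is not defined by POSIX, a portable script has to probe for it and fall back to something the standard does define. The helper names grep_has_perl_re and a_then_digit are hypothetical, and the probe only checks one Perl-style feature (\d), so it is a heuristic rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical probe: succeeds only if the local grep accepts -P and
# treats \d as "a digit". Errors from an unknown option are discarded.
grep_has_perl_re() {
    printf 'a1\n' | grep -P 'a\d' >/dev/null 2>&1
}

# Hypothetical helper: succeeds if stdin has a line with "a" followed by a digit.
# Uses -P where the probe passes, otherwise an equivalent POSIX ERE.
a_then_digit() {
    if grep_has_perl_re; then
        grep -P 'a\d' >/dev/null
    else
        grep -E 'a[0-9]' >/dev/null
    fi
}

if printf 'a1\n' | a_then_digit; then
    echo "matched"
fi
```

If an implementation assigned -P to something unrelated, the probe would most likely fail and the ERE branch would be used, but nothing in the standard promises that.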
(0005595)
philip-guenther (reporter)
2022-01-11 04:19

The standard can and does refer to other standards, using their *stable* identifiers. For example, Geoff's been spending a lot of time working out the details of updating the reference to the C standard to be C17. If an application is compiled in the environment of the current POSIX standard, following its rules, it gets the version of C associated with that version.

So, if there's a stable standard for the behavior of PCRE, then sure, POSIX could cite that and specify that grep/sed -P follow the behavior specified there. That would let applications authors request a precise behavior and if they didn't get it it would be a defect in the implementation, not their application.

But you're explicitly denying the existence of a stable standard, "PCRE-2021", or maybe denying an interest in having applications adhere to such a thing if it does exist. How is an application author supposed to use or rely on that information? How is an implementation team supposed to know whether their particular version is "good enough" to abide?

"So, there might be a -P option, and if it exists, it will presumably change the behavior of the regexp, but we can't tell you exactly how as that depends on what some 3rd party group decides. Well, two different 3rd party groups, because perl != PCRE and we don't say which. And yeah, an unmatched open-brace matched a literal open brace for 20+ years and then it became an error in some places, and then later more places, but not all, don't worry. Well, except for the other changes in behavior."


The situation with BREs is that the behavior of the extensions is explicitly undefined per XBD 9.3.2 "BRE Ordinary Characters" and applications using them are not POSIX conformant; implementations can make them do whatever they want and there's no statement that they are reserved for the use that any particular tool has assigned.

That's actually exactly the situation currently for the -P option to sed or grep: the behavior of sed or grep with that option is not defined by the standard and any application using that with them is not POSIX conformant; implementations can (and do!) make use of that to do what you desire.
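
To make the BRE case concrete, a minimal sketch: the interval expression \{1,\} is defined for BREs, while \+ is one of the extensions whose behavior the standard leaves undefined (GNU sed treats it as "one or more"). The function names are made up for illustration.

```shell
#!/bin/sh
# Portable: \{1,\} is a defined BRE interval meaning "one or more".
squeeze_portable() {
    sed 's/a\{1,\}/X/'
}

# Not portable: \+ in a BRE is undefined by the standard. GNU sed treats it
# as "one or more"; another implementation may do something else entirely.
squeeze_extension() {
    sed 's/a\+/X/'
}

printf 'baaad\n' | squeeze_portable    # prints "bXd"
printf 'baaad\n' | squeeze_extension   # result depends on the implementation
```

An application relying on the second form is in the same position as one relying on sed -P or grep -P: it works where the implementation chose to support it, and the standard says nothing either way.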



The goal of POSIX is "to support applications portability". How does 'reserving' -P for behavior that is not specified and *expected* to change support application portability? Do you not trust everyone involved to say "no, hell no" if POSIX contributors were to lose touch with the deployed environment and randomly specify sed/grep -P for some completely unrelated purpose?
(0005596)
calestyo (reporter)
2022-01-11 05:48

One could of course define a '-P' to refer to exactly the behaviour as of Perl version x.y.z or some PCRE version, but even if one would do that, implementations would likely just continue to link e.g. libpcre (in its respectively available version).

One could also argue that hopefully neither Perl nor PCRE changes its RE language in a way that would be backwards incompatible, or at least not in places where this would have any big effect.


Take Web standards... sure you can refer to some specific HTML or DOM version, but what's done in the "real world" is (unfortunately) often far beyond that.


Isn't that a bit nitpicking if one says it's okay for an implementation to change explicitly undefined behaviour to its needs (and still be compliant) while not being allowed to use undefined options in order to be compliant?

In both cases the problem is, that POSIX couldn't really change it anymore, without either no one adhering to it or breaking things.
It couldn't give "\+" in BREs some other meaning (because it kinda "promised" not to do this), nor could it use GNU grep's -P for something completely different without likely being ignored.

So its fate is to be frozen, whenever something was already used in some way by some major implementation.
At best, it could just follow what's already done by some big player, hopefully not breaking anything else.


And that's where at least in my personal opinion it's better for the sake of portability to "reserve" something for a certain purpose, to make at least sure that no one uses e.g. -P for something completely different.
Cause that would break portability for sure, while if one says something like "if a -P option is supported, then Perl-like regular expressions shall be used", chances are in reality still quite good that things will just work.

Even if one says explicitly that "-P, if supported, shall cause Perl 5.32.1 REs to be used", and applications wouldn't just adhere to that and simply use the current PCRE version, it would be IMO still better to have that than one of them choosing to use -P for (P)printing debug information.


Anyway,... my main point here was -E, and I have no strong opinion on whether or not -P should be handled in some way by POSIX... I just wanted to throw the idea into the room for discussion.
So from my side we can simply close the ticket.

- Issue History
Date Modified Username Field Change
2022-01-08 03:22 calestyo New Issue
2022-01-08 03:22 calestyo Name => Christoph Anton Mitterer
2022-01-08 03:22 calestyo Section => sed utility
2022-01-08 03:22 calestyo Page Number => N/A
2022-01-08 03:22 calestyo Line Number => NA/
2022-01-10 09:19 geoffclare Note Added: 0005583
2022-01-10 09:19 geoffclare Relationship added duplicate of 0000528
2022-01-10 15:19 calestyo Note Added: 0005587
2022-01-10 15:56 geoffclare Note Added: 0005588
2022-01-10 21:34 stephane Note Added: 0005589
2022-01-10 22:22 calestyo Note Added: 0005590
2022-01-10 23:47 philip-guenther Note Added: 0005591
2022-01-11 00:24 calestyo Note Added: 0005593
2022-01-11 04:19 philip-guenther Note Added: 0005595
2022-01-11 05:48 calestyo Note Added: 0005596

