Austin Group Defect Tracker

Aardvark Mark III


ID 0001222
Category [1003.1(2016)/Issue7+TC2] Shell and Utilities
Severity Objection
Type Enhancement Request
Date Submitted 2018-12-27 23:49
Last Update 2019-04-30 14:34
Reporter stephane
View Status public
Assigned To
Priority normal
Resolution Accepted As Marked
Status Resolved
Name Stephane Chazelas
Organization
User Reference
Section echo utility
Page Number echo
Line Number echo
Interp Status ---
Final Accepted Text Note: 0004375
Summary 0001222: "echo" specification doesn't reflect current implementations (missing -e, -E and - handling)
Description (sorry for not adding the page/line numbers, it seems I'm unable to download C181.pdf).

This is a follow-up on the discussion at
http://article.gmane.org/gmane.comp.standards.posix.austin.general/12097
started by Robert Elz where you'll find more information as well
as at
https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo
and https://www.in-ulm.de/~mascheck/various/echo+printf/

I know many will say "echo" is a lost cause, POSIX already
recommends to use "printf" instead.

But in practice, "echo" is still widely used, and POSIX "echo"
specification fails to describe the behaviour of current "echo"
implementations, in particular of Free, Libre and Open Source
ones which are now dominant.

For a history recap:

- up until Research Unix V6, echo didn't accept any option and
  didn't expand any escape sequence so it could be used to
  output arbitrary text followed by a newline character with
  echo "$text"
- in the late 70s, PWB Unix added some \c (causing echo to
  exit which was a way to avoid the trailing newline), \0ooo,
  and \n escape sequences expanded by default.
- that was a very poor design fixed by Dennis Ritchie with the
  addition of -n (in Research Unix V7, 1979) to skip the newline
  and -e (research Unix V8, 1981) for expanding escape sequences
  (also fixing the non-standard \0ooo to \ooo like in C)
- USG Unices kept on adding more escape sequences to their
  (broken) implementation in the 80s.
- portable echo implementations (starting with GNU echo and
  bash's echo builtin in 1992, soon followed by pdksh, zsh),
  added a -E option to disable escape sequence expansion, and
  let the user choose between research Unix and USG behaviour as
  default at build time or run time.
- neither of those could output arbitrary data as there was no
  end-of-option marker. That was fixed by the zsh echo
  implementation in 1990 (initially as -- then as -). echo -E -
  "$text" in zsh now works like in our V6 echo "$text" above.
- POSIX.2 in 1992 left the behaviour unspecified if the first
  argument was "-n" or any argument contained backslashes but
  failed to account for those -e/-E/-
- XPG mandated the behaviour of USG/SysV (the least useful one IMO)
- SUS merged POSIX and XPG as the XSI option, but didn't fix
  those missing parts about -e/-E/-.
- at some point GNU echo added support for --version and --help
  (not when POSIXLY_CORRECT is in the environment, not their
  abbreviations)

Additionally, in those implementations that expand escape
sequences, with the exception of the "echo" builtin of yash,
they don't treat their argument as text.

In those locales where the charset contains characters whose
encoding contains the encoding of backslash (like in BIG5 where
α is 0xA3 0x5C for instance and \ is 0x5C), those characters
will be mangled by echo. For instance, in a locale where BIG5 is
the charset, echo αc outputs a 0xA3 byte instead of αc.
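
A minimal sketch of that mangling, working on raw bytes so it can be reproduced without a BIG5 locale. The byte values are the ones quoted above; the variable name and the hex helper are made up for this illustration, and printf's %b conversion stands in for an echo that expands escape sequences byte by byte. An implementation that decodes its arguments as characters in a real BIG5 locale (as yash is said to do) may behave differently.

```shell
# Bytes of "αc" in BIG5 as described above: 0xA3 0x5C (α) then 0x63 (c).
# (big5_alpha_c and hex are hypothetical names used only in this sketch.)
big5_alpha_c=$(printf '\243\134c')

# Dump standard input as hex, without the spacing that od adds.
hex() { LC_ALL=C od -An -tx1 | LC_ALL=C tr -d ' \n'; }

# Written verbatim, all three bytes come out.
printf '%s' "$big5_alpha_c" | hex; echo

# With byte-wise escape processing, 0x5C 0x63 is read as \c ("stop output"),
# so only the lone 0xA3 lead byte is written.
printf '%b' "$big5_alpha_c" | hex; echo
```

On a shell whose printf processes its arguments as bytes, the first dump would be expected to show a35c63 and the second only a3.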

The yash "echo" builtin is the only implementation I know of that is
fully compliant with POSIX+XSI (when ECHO_STYLE=XSI).

All certified Unices are at least non-compliant because of that
encoding issue above. While the "echo" builtin of macOS's sh is
mostly compliant (except for that encoding issue), its
standalone utility understands -n. It understands \c only if \c
is last, but no other escape sequence. I suppose the conformance
tests test "echo" in shell scripts, but not the "echo"
standalone utility like in "env echo".

Today, most people (from trends on
stackoverflow.com/usenet/unix.stackexchange.com) expect "-e" to
be needed to expand escape sequences. So much so that -E is
quite rarely used. Nowadays, except in zsh (where the bsdecho
option is not enabled by default), wherever -E is supported, it
is also the default. The only system that I know that compiles
bash with xpg_echo enabled by default (for -e to be the default)
is macOS for its sh (I suppose it's the case of K-UX as well,
but I've never come across those), but then as that means the
posix option is also enabled, -E is no longer recognised.

echo -e is supported by at least GNU echo, bash, pdksh and
derivatives, zsh, busybox, ksh93, most ash derivatives (dash
being a notable exception) and yash (with some values of
$ECHO_STYLE).

echo -E is supported by at least GNU, bash, pdksh, zsh, busybox
and yash (with some values of $ECHO_STYLE).

On BSDs, -e is generally supported by the "echo" builtin of "sh"
(also -E in OpenBSD/MirOS where sh is based on pdksh), but not
by the standalone "echo" utility.
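
To see which of these behaviours a given shell's echo has, a rough probe along the following lines can help. It is only a sketch: the function name and the labels it prints are invented here, and it only distinguishes the outcomes of one test invocation.

```shell
# Hypothetical helper: classify how the echo found by this shell treats
# a leading -e and a backslash sequence. Labels are made up for this sketch.
echo_style() {
  echo_style_tab=$(printf '\t')
  case $(echo -e 'a\tb') in
    "a${echo_style_tab}b")
      # -e consumed as an option and \t expanded (e.g. bash default)
      printf '%s\n' 'e-option' ;;
    "-e a${echo_style_tab}b")
      # -e written as a string, escapes expanded by default (XSI style)
      printf '%s\n' 'xsi-default' ;;
    '-e a\tb')
      # -e written as a string, no escape expansion at all
      printf '%s\n' 'literal' ;;
    'a\tb')
      # -e consumed, but \t left alone
      printf '%s\n' 'e-noexpand' ;;
    *)
      printf '%s\n' 'unknown' ;;
  esac
}

echo_style
```

Running the same function through "env echo" instead of the builtin would need the echo call replaced accordingly, since builtin and standalone utility often disagree, as noted above for the BSDs.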
Desired Action Change the text to:

If the first argument is "-" followed by zero or more e, E or n
characters (or is --version or --help), or if any argument
contains backslash characters (or the byte encoding of the
backslash character), the behaviour is unspecified.

I find it a shame to force XSI systems to implement the broken
(expand escape sequences by default, no way to disable it)
behaviour. I would be of the opinion that since it now deviates
from the de-facto standard, it should be removed to allow
implementations to do the right thing and align with the more
common ones (see how silly it is that macOS decided to use bash
for their sh, but made it incompatible with most other bash
deployments in that regard, just to be UNIX certified)

Maybe also add a future direction that would specify -n/-e/-E à
la bash. (I'd expect zsh's "echo -" to mark the end-of-options
would be contentious, so I suppose echo cannot be fully "fixed").
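
For illustration, a bash-like -n/-e/-E handling could be sketched on top of printf as below. This is an assumption-laden approximation, not what bash does internally: the function name becho is hypothetical, %b is used for escape expansion although its rules differ slightly from those of echo -e (for instance in how octal escapes are written), operands are joined using the default IFS, and a \c in the text does not suppress the final newline here.

```shell
# Hypothetical becho: leading operands made of '-' plus only n, e, E
# are treated as option groups; everything else is written out.
becho() {
  becho_newline=1
  becho_expand=0
  while [ "$#" -gt 0 ]; do
    case $1 in
      - | -*[!neE]*) break ;;  # lone '-' or foreign letters: an operand
      -*) ;;                   # '-' followed only by n, e or E
      *) break ;;
    esac
    becho_opts=${1#-}
    while [ -n "$becho_opts" ]; do
      case $becho_opts in
        n*) becho_newline=0 ;;
        e*) becho_expand=1 ;;
        E*) becho_expand=0 ;;
      esac
      becho_opts=${becho_opts#?}   # drop the letter just handled
    done
    shift
  done
  if [ "$becho_expand" -eq 1 ]; then
    printf '%b' "$*"
  else
    printf '%s' "$*"
  fi
  if [ "$becho_newline" -eq 1 ]; then
    printf '\n'
  fi
}

becho -E 'a\tb'     # backslash sequence left as is
becho -e 'a\tb'     # \t expanded to a tab
becho - hi          # '-' is not an end-of-options marker in this sketch
```

Note that, as in bash, there is still no way in this scheme to write an arbitrary first operand such as -n itself, which is the gap zsh's "echo -" was meant to close.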

I suppose the encoding issue can be ignored for now. Anyway, it
also applies to many other utilities including printf, awk,
sed... We can probably assume that those charsets that have
characters whose encoding contains the encoding of other
characters including some in the portable charset will die away
and they can't be used reliably anyway.
Tags issue8
Attached Files

- Relationships
related to 0001206 (Closed): The echo that supports -n is Bell Labs Unix echo, not "BSD echo"

-  Notes
(0004272)
stephane (reporter)
2019-03-01 10:04

> The only system that I know that compiles
> bash with xpg_echo enabled by default (for -e to be the default)
> is macOS for its sh (I suppose it's the case of K-UX as well,
> but I've never come across those)

As I recently found out, it seems that Solaris 11 (not earlier versions), did also compile bash (/usr/bin/bash, not even a "sh" build!) with xpg_echo enabled by default.
(0004370)
eblake (manager)
2019-04-25 16:17

GNU Coreutils 8.31 recently changed its /bin/echo to be POSIX correct when POSIXLY_CORRECT=1:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/src/echo.c?id=8d328706c6e
(0004372)
stephane (reporter)
2019-04-25 21:46

Re: Note: 0004370

So, does GNU's POSIXLY_CORRECT now mean POSIX+XSI?

GNU echo was already compliant to the POSIX base (without XSI) before that change (except for the encoding issue).

That change goes in the opposite direction to what I'm requesting here: echo not expanding escape sequences by default is now becoming the de-facto standard, and POSIX should consider following it.
(0004373)
geoffclare (manager)
2019-04-26 08:49

Re: Note: 0004372 (the part about "direction"), the consensus in yesterday's teleconference was that we should treat -e and -E the same as -n (implementation-defined), but there is strong opposition to relaxing the XSI requirement as this would break conforming XSI applications.

We also want to maintain the requirement that "echo -" writes a '-' as there is too high a risk of breaking existing POSIX applications if this is made implementation-defined, and support for a special meaning of '-' is not widespread.
(0004375)
geoffclare (manager)
2019-04-29 15:39
edited on: 2019-04-29 15:40

On page 2674 line 87124 section echo OPERANDS, change:
If the first operand is -n, or if any of the operands contain a <backslash> character, the results are implementation-defined.

[XSI]On XSI-conformant systems, if the first operand is -n, it shall be treated as a string, not an option.[/XSI]
to:
If the first operand consists of a '-' followed by one or more characters from the set { 'e', 'E', 'n' }, or if any of the operands contain a <backslash> character, the results are implementation-defined.

[XSI]On XSI-conformant systems, if the first operand consists of a '-' followed by one or more characters from the set { 'e', 'E', 'n' }, it shall be treated as a string to be written.[/XSI]

On page 2675 line 87179 section echo APPLICATION USAGE, change:
It is not possible to use echo portably across all POSIX systems unless both -n (as the first argument) and escape sequences are omitted.
to:
It is not possible to use echo portably across all POSIX systems unless escape sequences are omitted, and the first argument does not consist of a '-' followed by one or more characters from the set { 'e', 'E', 'n' }.

On page 2676 line 87198 section echo RATIONALE, change:
Conforming applications that wish to do prompting without <newline> characters or that could possibly be expecting to echo a −n, should use the printf utility derived from the Ninth Edition system.

As specified, echo writes its arguments in the simplest of ways. The two different historical versions of echo vary in fatally incompatible ways.

The BSD echo checks the first argument for the string −n which causes it to suppress the <newline> that would otherwise follow the final argument in the output.

The System V echo does not support any options, but allows escape sequences within its operands, as described for XSI implementations in the OPERANDS section.

The echo utility does not support Utility Syntax Guideline 10 because historical applications depend on echo to echo all of its arguments, except for the −n option in the BSD version.
to:
Conforming applications that wish to do prompting without <newline> characters or that could possibly be expecting to echo a string consisting of a '-' followed by one or more characters from the set { 'e', 'E', 'n' } should use the printf utility.

At the time that the POSIX.2-1992 standard was being developed, the two different historical versions of echo that were considered for standardization varied in incompatible ways.

The BSD echo checked the first argument for the string −n which caused it to suppress the <newline> that would otherwise follow the final argument in the output.

The System V echo treated all arguments as strings to be written, but allowed escape sequences within them, as described for XSI implementations in the OPERANDS section, including \c to suppress a trailing <newline>.

Thus the POSIX.2-1992 standard said that the behavior was implementation-defined if the first operand is -n or if any of the operands contain a <backslash> character. It also specified that the echo utility does not support Utility Syntax Guideline 10 because historical applications depended on echo to echo all of its arguments, except for the −n first argument in the BSD version.

The Single UNIX Specification, Version 1 required the System V behavior, and this became the XSI requirement when Version 2 and POSIX.2 were merged with POSIX.1 to form the joint POSIX.1-2001 / Single UNIX Specification, Version 3 standard.

This standard now treats a first operand of -e or -E the same as -n in recognition that support for them has become more widespread in non-XSI implementations. Where supported, -e enables processing of escape sequences in the remaining operands (in situations where it is disabled by default), and -E disables it (in situations where it is enabled by default). A first operand containing a combination of these three letters, in the same manner as option grouping, also results in implementation-defined behavior.
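
From an application writer's point of view, the revised OPERANDS wording amounts to a check that could be sketched as follows. The function names are hypothetical, the check is byte-oriented (it does not address the encoding question raised in this report), and the fallback relies on the default IFS so that "$*" joins operands with a space.

```shell
# Hypothetical check: succeed only if "echo args..." has fully specified
# behaviour under the wording above.
echo_is_portable() {
  case ${1-} in
    -) ;;                 # a lone '-' is written as is
    -*[!eEn]*) ;;         # contains a character outside { e, E, n }
    -*) return 1 ;;       # '-' followed by one or more of e, E, n
  esac
  for echo_arg do
    case $echo_arg in
      *\\*) return 1 ;;   # any operand containing a backslash
    esac
  done
  return 0
}

# Hypothetical wrapper: use echo when that is safe, printf otherwise.
put_line() {
  if echo_is_portable "$@"; then
    echo "$@"
  else
    printf '%s\n' "$*"
  fi
}

put_line -n hi        # written verbatim, followed by a newline
put_line 'a\tb'       # backslash sequence written verbatim
```

Only the first operand is checked against the option-like pattern, so put_line hi -n still goes through echo.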


(0004377)
stephane (reporter)
2019-04-29 20:11

Re: Note: 0004375

> At the time that the POSIX.2-1992 standard was being developed, the two
> different historical versions of echo that were considered for standardization
> varied in incompatible ways.
>
> The BSD echo checked the first argument for the string −n which caused it to
> suppress the <newline> that would otherwise follow the final argument in the
> output.

In 1992, the current version of BSD would have been BSD4.3 Net/2 which had 3 implementations of echo:

- the standalone echo utility (https://github.com/dspinellis/unix-history-repo/blob/BSD-4_3_Net_2/usr/src/bin/echo/echo.c), which accepted -n but didn't expand any escape sequence.
- the echo builtin of sh (then based on the Almquist shell, https://github.com/dspinellis/unix-history-repo/blob/BSD-4_3_Net_2/usr/src/bin/sh/bltin/echo.c), which accepted -n and expanded escape sequences with -e. There was actually a similar -e for "read"; with POSIX "read", like for "echo", -e is the default and we need "read -r" to cancel that generally unwanted behaviour ("echo -E" now achieves the same thing for "echo").
- the echo builtin of csh (https://github.com/dspinellis/unix-history-repo/blob/BSD-4_3_Net_2/usr/src/bin/csh/func.c#L805), which had -n, but differed from the standalone echo in that echo without arguments would output nothing instead of a newline character.

So that description of BSD echo is incomplete (especially considering that the POSIX conformance test suite seems to only care for the sh "echo" builtin; it didn't spot that macOS /bin/echo wasn't compliant).

And again, -e originated in Version 8 in 1981, over a decade before POSIX. -e didn't become widespread; it was already widespread, so much so that it's now a de-facto standard.
(0004378)
stephane (reporter)
2019-04-29 20:19

Re: Note: 0004375

That resolution doesn't seem to address the encoding issue.

BIG5 α (Greek alpha U+03B1) is encoded as 0xA3 0x5C and BIG5 \ (backslash U+005C) is 0x5C.

In

   echo αc

in a locale that uses BIG5 as its charset, do the "operands contain a <backslash> character" (encoding 0x5C), as virtually all echo implementations think they do, or do they not?

Do you want me to raise a separate bug for that?

Could you please state the Austin Group position on that in this bug, even if it doesn't end up in the text of the specification?
(0004379)
geoffclare (manager)
2019-04-30 08:22

Re: Note: 0004377 the new rationale refers to "the two different historical versions of echo that were considered for standardization". The BSD echo that was "considered for standardization" was the one that just checked for -n.

Re: Note: 0004378 The encoding problem is a separate issue. Personally I think it's a bug in the implementations that check for a '\\' byte, instead of a '\\' character as the standard requires. To find out the Austin Group position would require a bug to be submitted and for it to be discussed in a teleconference with at least two ORs present.
(0004380)
stephane (reporter)
2019-04-30 14:34
edited on: 2019-04-30 14:39

Re: Note: 0004379
> the new rationale refers to "the two different historical versions of echo
> that were considered for standardization". The BSD echo that was "considered
> for standardization" was the one that just checked for -n.

But it's calling it "BSD echo". My point is that there's no such thing as *the* BSD echo (or if one had to choose one from the list to be "the BSD echo" from a portable sh script perspective, that would rather be the builtin echo of sh, which did already support both -n and -e at the time).

To be accurate, that "BSD echo" should be replaced with something like "the standalone /bin/echo of BSD 4.3 Net/2", or possibly the "echo utility of Research Unix Version 7" which would also be more accurate from a history perspective.

> The encoding problem is a separate issue

Well, the text says "operands contain a <backslash> character". And my question remains: is there something in the spec that says that that BIG5 αc above doesn't contain backslash? Its encoding does contain the encoding of backslash (0x5c is both ASCII backslash and BIG5 backslash, BIG5 is a superset of ASCII).

Is there something in the spec that describes how echo should decode its arguments into characters? It doesn't say echo is a "text utility", does that mean it should be able to cope with arguments that are not text?

The LC_CTYPE section in there is XSI-shaded, so non-XSI echos are not required to interpret the arguments as text in the current locale. My reading, from an application writer's point of view, is that I can't tell from the standard whether I can pass arguments that contain the encoding of backslash (in any charset) to echo.


- Issue History
Date Modified Username Field Change
2018-12-27 23:49 stephane New Issue
2018-12-27 23:49 stephane Name => Stephane Chazelas
2018-12-27 23:49 stephane Section => echo utility
2018-12-27 23:49 stephane Page Number => echo
2018-12-27 23:49 stephane Line Number => echo
2019-03-01 10:04 stephane Note Added: 0004272
2019-04-01 16:33 nick Relationship added related to 0001206
2019-04-25 16:17 eblake Note Added: 0004370
2019-04-25 21:46 stephane Note Added: 0004372
2019-04-26 08:49 geoffclare Note Added: 0004373
2019-04-29 15:39 geoffclare Note Added: 0004375
2019-04-29 15:40 geoffclare Note Edited: 0004375
2019-04-29 15:41 geoffclare Interp Status => ---
2019-04-29 15:41 geoffclare Final Accepted Text => Note: 0004375
2019-04-29 15:41 geoffclare Status New => Resolved
2019-04-29 15:41 geoffclare Resolution Open => Accepted As Marked
2019-04-29 15:42 geoffclare Tag Attached: issue8
2019-04-29 20:11 stephane Note Added: 0004377
2019-04-29 20:19 stephane Note Added: 0004378
2019-04-30 08:22 geoffclare Note Added: 0004379
2019-04-30 14:34 stephane Note Added: 0004380
2019-04-30 14:37 stephane Note Edited: 0004380
2019-04-30 14:39 stephane Note Edited: 0004380

