ID | Category | Severity | Type | Date Submitted | Last Update
0001222 | [1003.1(2016/18)/Issue7+TC2] Shell and Utilities | Objection | Enhancement Request | 2018-12-27 23:49 | 2024-06-11 09:08
Reporter | stephane
View Status | public
Assigned To |
Priority | normal
Resolution | Accepted As Marked
Status | Closed
Name | Stephane Chazelas
Organization |
User Reference |
Section | echo utility
Page Number | echo
Line Number | echo
Interp Status | ---
Final Accepted Text | Note: 0004375
Summary | 0001222: "echo" specification doesn't reflect current implementations (missing -e, -E and - handling)
Description |
(Sorry for not adding the page/line numbers; it seems I'm unable to download C181.pdf.)

This is a follow-up on the discussion at http://article.gmane.org/gmane.comp.standards.posix.austin.general/12097 started by Robert Elz, where you'll find more information, as well as at https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo and https://www.in-ulm.de/~mascheck/various/echo+printf/

I know many will say "echo" is a lost cause and that POSIX already recommends using "printf" instead. But in practice, "echo" is still widely used, and the POSIX "echo" specification fails to describe the behaviour of current "echo" implementations, in particular of the Free, Libre and Open Source ones which are now dominant.

For a history recap:
- up until Research Unix V6, echo didn't accept any option and didn't expand any escape sequence, so it could be used to output arbitrary text followed by a newline character with echo "$text"
- in the late 70s, PWB Unix added some \c (causing echo to exit, which was a way to avoid the trailing newline), \0ooo, and \n escape sequences, expanded by default.
- that was a very poor design, fixed by Dennis Ritchie with the addition of -n (in Research Unix V7, 1979) to skip the newline and -e (Research Unix V8, 1981) for expanding escape sequences (also fixing the non-standard \0ooo to \ooo like in C)
- USG Unices kept on adding more escape sequences to their (broken) implementation in the 80s.
- portable echo implementations (starting with GNU echo and bash's echo builtin in 1992, soon followed by pdksh, zsh) added a -E option to disable escape sequence expansion, and let the user choose between Research Unix and USG behaviour as the default at build time or run time.
- neither of those could output arbitrary data as there was no end-of-option marker. That was fixed by the zsh echo implementation in 1990 (initially as -- then as -). echo -E - "$text" in zsh now works like in our V6 echo "$text" above.
- POSIX.2 in 1992 left the behaviour unspecified if the first argument was "-n" or any argument contained backslashes, but failed to account for those -e/-E/-
- XPG mandated the behaviour of USG/SysV (the least useful one IMO)
- SUS merged POSIX and XPG as the XSI option, but didn't fix those missing parts about -e/-E/-.
- at some point GNU echo added support for --version and --help (not when POSIXLY_CORRECT is in the environment, not their abbreviations)

Additionally, in those implementations that expand escape sequences, with the exception of the "echo" builtin of yash, they don't treat their argument as text. In those locales where the charset contains characters whose encoding contains the encoding of backslash (like in BIG5 where α is 0xA3 0x5C for instance and \ is 0x5C), those characters will be mangled by echo. For instance, in a locale where BIG5 is the charset, echo αc outputs a 0xA3 byte instead of αc.

That yash "echo" is the only implementation I know of that is fully compliant with POSIX+XSI (when ECHO_STYLE=XSI). All certified Unices are at least non-compliant because of that encoding issue above.

While the "echo" builtin of macOS's sh is mostly compliant (except for that encoding issue), its standalone utility understands -n. It understands \c only if \c is last, but no other escape sequence. I suppose the conformance tests test "echo" in shell scripts, but not the "echo" standalone utility like in "env echo".

Today, most people (from trends on stackoverflow.com/usenet/unix.stackexchange.com) expect "-e" to be needed to expand escape sequences. So much so that -E is quite rarely used. Nowadays, except in zsh (where the bsdecho option is not enabled by default), wherever -E is supported, it is also the default.

The only system that I know of that compiles bash with xpg_echo enabled by default (for -e to be the default) is macOS for its sh (I suppose it's the case of K-UX as well, but I've never come across those), but then as that means the posix option is also enabled, -E is no longer recognised.

echo -e is supported by at least GNU echo, bash, pdksh and derivatives, zsh, busybox, ksh93, most ash derivatives (dash being a notable exception) and yash (with some values of $ECHO_STYLE).

echo -E is supported by at least GNU, bash, pdksh, zsh, busybox and yash (with some values of $ECHO_STYLE).

On BSDs, -e is generally supported by the "echo" builtin of "sh" (also -E in OpenBSD/MirOS where sh is based on pdksh), but not by the standalone "echo" utility.
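The differences described above can be probed empirically. The following is a minimal sketch and not part of the original report: classify_echo is a hypothetical helper name, and the \0101 probe string is an assumption, chosen because (as far as I know) both the XSI style and the -e style of expansion accept \0ooo octal escapes.

```shell
# classify_echo CMD [ARG...]
# Runs the given echo command twice and reports how it handled the probes.
# (Hypothetical helper; variable names are illustrative and global.)
classify_echo() {
  probe='x\0101y'                      # \0101 is octal for "A"
  default_out=$("$@" "$probe")         # no option given
  with_e_out=$("$@" -e "$probe")       # -e as the first argument

  case $default_out in
    xAy) expands_by_default=yes ;;
    *)   expands_by_default=no ;;
  esac

  case $with_e_out in
    xAy)          dash_e='option' ;;
    '-e xAy')     dash_e='operand, escapes expanded' ;;
    '-e x\0101y') dash_e='operand, escapes not expanded' ;;
    *)            dash_e='other' ;;
  esac

  printf 'escapes by default: %s; -e: %s\n' "$expands_by_default" "$dash_e"
}
```

Running `classify_echo echo` probes the shell's builtin, while `classify_echo env echo` probes the standalone utility, which is the distinction the report suspects the conformance tests miss.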
Desired Action |
Change the text to:

If the first argument is "-" followed by zero or more e, E or n characters (or is --version or --help), or if any argument contains backslash characters (or the byte encoding of the backslash character), the behaviour is unspecified.

I find it a shame to force XSI systems to implement the broken (expand escape sequences by default, no way to disable it) behaviour. I would be of the opinion that since it now deviates from the de-facto standard, it should be removed to allow implementations to do the right thing and align with the more common ones (see how silly it is that macOS decided to use bash for their sh, but made it incompatible with most other bash deployments in that regard, just to be UNIX certified).

Maybe also add a future direction that would specify -n/-e/-E à la bash. (I'd expect zsh's "echo -" to mark the end-of-options would be contentious, so I suppose echo cannot be fully "fixed".)

I suppose the encoding issue can be ignored for now. Anyway, it also applies to many other utilities including printf, awk, sed... We can probably assume that those charsets that have characters whose encoding contains the encoding of other characters, including some in the portable charset, will die away, and they can't be used reliably anyway.
Tags | issue8
Attached Files |
Relationships |
Notes |
(0004272) stephane (reporter) 2019-03-01 10:04 |
> The only system that I know that compiles
> bash with xpg_echo enabled by default (for -e to be the default)
> is macOS for its sh (I suppose it's the case of K-UX as well,
> but I've never come across those)

As I recently found out, it seems that Solaris 11 (not earlier versions) also compiled bash (/usr/bin/bash, not even a "sh" build!) with xpg_echo enabled by default.
(0004370) eblake (manager) 2019-04-25 16:17 |
GNU Coreutils 8.31 recently changed its /bin/echo to be POSIX correct when POSIXLY_CORRECT=1: https://git.savannah.gnu.org/cgit/coreutils.git/commit/src/echo.c?id=8d328706c6e
(0004372) stephane (reporter) 2019-04-25 21:46 |
Re: Note: 0004370

So, does GNU's POSIXLY_CORRECT now mean POSIX+XSI? GNU echo was already compliant with the POSIX base (without XSI) before that change (except for the encoding issue).

That change goes in the opposite direction to what I'm arguing here: that echo not expanding escape sequences by default is now becoming the de-facto standard, and that POSIX should consider following.
(0004373) geoffclare (manager) 2019-04-26 08:49 |
Re: Note: 0004372 (the part about "direction"), the consensus in yesterday's teleconference was that we should treat -e and -E the same as -n (implementation-defined), but there is strong opposition to relaxing the XSI requirement as this would break conforming XSI applications. We also want to maintain the requirement that "echo -" writes a '-' as there is too high a risk of breaking existing POSIX applications if this is made implementation-defined, and support for a special meaning of '-' is not widespread. |
(0004375) geoffclare (manager) 2019-04-29 15:39 edited on: 2019-04-29 15:40 |
On page 2674 line 87124 section echo OPERANDS, change:

If the first operand is -n, or if any of the operands contain a <backslash> character, the results are implementation-defined.

to:

If the first operand consists of a '-' followed by one or more characters from the set { 'e', 'E', 'n' }, or if any of the operands contain a <backslash> character, the results are implementation-defined.

On page 2675 line 87179 section echo APPLICATION USAGE, change:

It is not possible to use echo portably across all POSIX systems unless both -n (as the first argument) and escape sequences are omitted.

to:

It is not possible to use echo portably across all POSIX systems unless escape sequences are omitted, and the first argument does not consist of a '-' followed by one or more characters from the set { 'e', 'E', 'n' }.

On page 2676 line 87198 section echo RATIONALE, change:

Conforming applications that wish to do prompting without <newline> characters or that could possibly be expecting to echo a −n, should use the printf utility derived from the Ninth Edition system.

to:

Conforming applications that wish to do prompting without <newline> characters or that could possibly be expecting to echo a string consisting of a '-' followed by one or more characters from the set { 'e', 'E', 'n' } should use the printf utility.
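As an illustration of the resolved wording, a script could test its operands before deciding whether echo is safe to use. This is only a sketch under stated assumptions: echo_is_implementation_defined and echo_raw are hypothetical helper names, not anything defined by the standard, and the check treats <backslash> as whatever the shell's pattern matching considers a backslash in the current locale.

```shell
# echo_is_implementation_defined ARG...
# Returns 0 when, per the resolved OPERANDS wording, echo's result would be
# implementation-defined for these operands; returns 1 otherwise.
echo_is_implementation_defined() {
  case ${1-} in
    -*[!eEn]*) ;;            # '-' plus some character outside {e,E,n}: plain operand
    -?*) return 0 ;;         # '-' plus one or more of e, E, n
  esac
  for operand do
    case $operand in
      *\\*) return 0 ;;      # operand contains a <backslash>
    esac
  done
  return 1
}

# echo_raw ARG...
# printf-based stand-in: operands joined by single spaces, then a newline,
# with no option parsing and no escape expansion.
echo_raw() (
  IFS=' '
  printf '%s\n' "$*"
)
```

For example, `echo_is_implementation_defined -n hello` succeeds, while `echo_is_implementation_defined - hello` fails, matching the decision to keep "echo -" well defined. When the check succeeds, `echo_raw "$@"` prints the operands literally.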
(0004377) stephane (reporter) 2019-04-29 20:11 |
Re: Note: 0004375

> At the time that the POSIX.2-1992 standard was being developed, the two
> different historical versions of echo that were considered for standardization
> varied in incompatible ways.
>
> The BSD echo checked the first argument for the string −n which caused it to
> suppress the <newline> that would otherwise follow the final argument in the
> output.

In 1992, the current version of BSD would have been BSD4.3 Net/2, which had 3 implementations of echo:
- the standalone echo utility (https://github.com/dspinellis/unix-history-repo/blob/BSD-4_3_Net_2/usr/src/bin/echo/echo.c), which accepted -n but didn't expand any escape sequence.
- the echo builtin of sh (then based on the Almquist shell, https://github.com/dspinellis/unix-history-repo/blob/BSD-4_3_Net_2/usr/src/bin/sh/bltin/echo.c), which accepted -n and expanded escape sequences with -e. There was actually a similar -e for "read"; with POSIX "read", like for "echo", -e is the default and we need "read -r" to cancel that generally unwanted behaviour ("echo -E" now achieves the same thing for "echo").
- the echo builtin of csh (https://github.com/dspinellis/unix-history-repo/blob/BSD-4_3_Net_2/usr/src/bin/csh/func.c#L805), which had -n, but differed from the standalone echo in that echo without arguments would output nothing instead of a newline character.

So that description of BSD echo is incomplete (especially considering that the POSIX conformance test suite seems to only care about the sh "echo" builtin; it didn't spot that macOS /bin/echo wasn't compliant).

And again, -e originated in Version 8 in 1981, over a decade before POSIX. -e didn't become widespread; it was already widespread, so much so that it's now a de-facto standard.
(0004378) stephane (reporter) 2019-04-29 20:19 |
Re: Note: 0004375

That resolution doesn't seem to address the encoding issue. BIG5 α (Greek alpha U+03B1) is encoded as 0xA3 0x5C and BIG5 \ (backslash U+005C) is 0x5C. In echo αc in a locale that uses BIG5 as its charset, do the "operands contain a <backslash> character" (encoding 0x5C), as virtually all echo implementations think they do, or do they not?

Do you want me to raise a separate bug for that? Could you please state the Austin Group position on that in this bug, even if it doesn't end up in the text of the specification?
(0004379) geoffclare (manager) 2019-04-30 08:22 |
Re: Note: 0004377 the new rationale refers to "the two different historical versions of echo that were considered for standardization". The BSD echo that was "considered for standardization" was the one that just checked for -n.

Re: Note: 0004378 The encoding problem is a separate issue. Personally I think it's a bug in the implementations that check for a '\' byte, instead of a '\' character as the standard requires. To find out the Austin Group position would require a bug to be submitted and for it to be discussed in a teleconference with at least two ORs present.
(0004380) stephane (reporter) 2019-04-30 14:34 edited on: 2019-04-30 14:39 |
Re: Note: 0004379

> the new rationale refers to "the two different historical versions of echo
> that were considered for standardization". The BSD echo that was "considered
> for standardization" was the one that just checked for -n.

But it's calling it "BSD echo". My point is that there's no such thing as *the* BSD echo (or if one had to choose one from the list to be "the BSD echo" from a portable sh script perspective, that would rather be the builtin echo of sh, which already supported both -n and -e at the time). To be accurate, that "BSD echo" should be replaced with something like "the standalone /bin/echo of BSD 4.3 Net/2", or possibly the "echo utility of Research Unix Version 7", which would also be more accurate from a history perspective.

> The encoding problem is a separate issue

Well, the text says "operands contain a <backslash> character". And my question remains: is there something in the spec that says that that BIG5 αc above doesn't contain backslash? Its encoding does contain the encoding of backslash (0x5c is both ASCII backslash and BIG5 backslash; BIG5 is a superset of ASCII). Is there something in the spec that describes how echo should decode its arguments into characters? It doesn't say echo is a "text utility"; does that mean it should be able to cope with arguments that are not text? The LC_CTYPE section in there is XSI-shaded, so non-XSI echos are not required to interpret the arguments as text in the current locale. So my reading, from an application writer's point of view, is that I don't know (from reading the standard) whether I can pass arguments that contain the encoding of backslash (in any charset) to echo.
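The byte-versus-character ambiguity raised above can be made concrete with a small sketch. The byte values are taken from the Description of this report (α as 0xA3 0x5C in BIG5); the helper names big5_alpha_c, hex_bytes and contains_backslash_byte are hypothetical, and the C-locale match is only an assumption about how a byte-oriented implementation behaves, not a statement about any particular echo.

```shell
# The three bytes of "αc" in BIG5, per the encoding given in the Description:
# octal 243 = 0xA3, octal 134 = 0x5C (also the backslash byte), then "c".
big5_alpha_c() {
  printf '\243\134c'
}

# hex_bytes: read stdin and print its bytes as contiguous lowercase hex.
hex_bytes() {
  od -An -tx1 | tr -d ' \n'
}

# contains_backslash_byte STRING
# Byte-wise test: the C locale is set inside a subshell so that pattern
# matching works on bytes rather than on decoded characters.
contains_backslash_byte() (
  LC_ALL=C
  case $1 in
    *\\*) exit 0 ;;
  esac
  exit 1
)
```

Here `big5_alpha_c | hex_bytes` prints a35c63, and `contains_backslash_byte "$(big5_alpha_c)"` succeeds even though, read as BIG5 text, the string holds only the two characters α and c. That is the gap between "contains a <backslash> character" and "contains a 0x5C byte" that the note asks the standard to settle.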
Issue History | |||
Date Modified | Username | Field | Change |
2018-12-27 23:49 | stephane | New Issue | |
2018-12-27 23:49 | stephane | Name | => Stephane Chazelas |
2018-12-27 23:49 | stephane | Section | => echo utility |
2018-12-27 23:49 | stephane | Page Number | => echo |
2018-12-27 23:49 | stephane | Line Number | => echo |
2019-03-01 10:04 | stephane | Note Added: 0004272 | |
2019-04-01 16:33 | nick | Relationship added | related to 0001206 |
2019-04-25 16:17 | eblake | Note Added: 0004370 | |
2019-04-25 21:46 | stephane | Note Added: 0004372 | |
2019-04-26 08:49 | geoffclare | Note Added: 0004373 | |
2019-04-29 15:39 | geoffclare | Note Added: 0004375 | |
2019-04-29 15:40 | geoffclare | Note Edited: 0004375 | |
2019-04-29 15:41 | geoffclare | Interp Status | => --- |
2019-04-29 15:41 | geoffclare | Final Accepted Text | => Note: 0004375 |
2019-04-29 15:41 | geoffclare | Status | New => Resolved |
2019-04-29 15:41 | geoffclare | Resolution | Open => Accepted As Marked |
2019-04-29 15:42 | geoffclare | Tag Attached: issue8 | |
2019-04-29 20:11 | stephane | Note Added: 0004377 | |
2019-04-29 20:19 | stephane | Note Added: 0004378 | |
2019-04-30 08:22 | geoffclare | Note Added: 0004379 | |
2019-04-30 14:34 | stephane | Note Added: 0004380 | |
2019-04-30 14:37 | stephane | Note Edited: 0004380 | |
2019-04-30 14:39 | stephane | Note Edited: 0004380 | |
2020-04-29 15:05 | geoffclare | Status | Resolved => Applied |
2024-06-11 09:08 | agadmin | Status | Applied => Closed |