Austin Group Defect Tracker

ID Category Severity Type Date Submitted Last Update
0001038 [1003.1(2013)/Issue7+TC1] Shell and Utilities Objection Error 2016-03-24 18:24 2019-10-23 10:14
Reporter kre View Status public  
Assigned To
Priority normal Resolution Accepted As Marked  
Status Applied  
Name Robert Elz
Organization
User Reference
Section 2.5.3, 2.6
Page Number 2352
Line Number 74952
Interp Status Approved
Final Accepted Text See Note: 0003616
Summary 0001038: Specification relies upon unspecified behaviour
Description In section 2.5.3 (the list of shell variables) for the variable "PS1" it is
stated (inter alia) ...

    Each time an interactive shell is ready to read a command, the value of this
    variable shall be subjected to parameter expansion and written to standard
    error. The default value shall be "$ ".

In section 2.6 (Word expansions, which includes the parameter expansion
required of PS1) it is said ...

    The '$' character is used to introduce parameter expansion, command
    substitution, or arithmetic evaluation. If an unquoted '$' is followed
    by a character that is not one of the following:

followed by a list of 5 bullet items, none of which is space.

    the result is unspecified.

Hence, in evaluating for parameter expansion, PS1 when set to the (normal
user) default value, the shell must encounter a situation which is explicitly
unspecified.
Desired Action Best would be to add to section 2.6 a sixth bullet item...

     white space or end of input, in which case the '$' is literal

An alternative would be to change the default value of PS1 in 2.5.3 to "\$ "

(I think I recall being told once that the list of valid characters after $
also appears elsewhere, if so, and if the first solution is adopted, it would
need to be repeated in any other place that the same list is given - I cannot
find where that would be at the minute.)
Tags tc3-2008
Attached Files

- Relationships
related to 0001006 [Applied]: PS1 should be subject to command substitution and arithmetic expansion

-  Notes
(0003589)
shware_systems (reporter)
2017-03-03 14:42

As an fyi, in the 2017-03-02 phone call this was examined and the consensus was XCU 2.3 was already explicit enough about how the Desired Action was required to be handled, in that for this circumstance there is no character following the '$' >in the current token< for the list in 2.6 to apply to, so it gets treated as a literal $ would anyways. It was agreed as stated this wasn't obvious enough, and pointed out additional potential ambiguities, so clarifying text for these is being drafted.
(0003590)
eblake (manager)
2017-03-03 15:27

In particular, the text was ambiguous as to whether the default PS1, PS2, and PS4 are applied at all places where the user unsets the variable, or only at shell startup (the latter matches existing practice - unsetting a PS? variable results in no further prompting, so the default is a startup-only operation).

Also, bash currently does NOT set PS1 to something that expands to a mere '$ ', but instead sets it to something that expands to the basename of $0, a dash, a version string, and then the '$ '. It is unclear whether Chet Ramey is willing to fix this as a bug in 'bash --posix' mode, or whether the standard should be relaxed to permit bash behavior (by changing the wording to require that the prompt merely end in a literal '$ ', not that it is '$ ' in entirety).

Another bash behavior brought up is that bash changes PS4 in subshells, by repeating the first character as an indication of shell depth, and the standard does not permit that. Again, Chet's opinion on whether it is something he is willing to fix for 'bash --posix', or whether the standard should be relaxed to allow it as existing behavior, would be worthwhile.
(0003591)
kre (reporter)
2017-03-03 15:50

First, wrt note 3590, that might all be correct, but really has nothing to
do with the issue raised.

The subject of this bug report was just to illustrate the problem (that PS1
is actually defined to be '$ ' and is then subject to expansions before use)
which means that a space following a '$' had better not be an undefined
sequence. Yet that is what 2.6 currently makes it.

wrt note 3589, I'm afraid I don't see how section 2.3 helps in general.
$ expansion is performed inside "" strings, what is the expected behaviour
of expressions like
    x="$ $ $"
and where is that defined? The quoted string is a single token, subject
to parameter expansion (no-one questions what happens with x="$#" for example.)

I am glad however that there is agreement that it needs clarifying, I don't
really believe that anyone doubts what should happen here - what all shells
have done forever.

kre
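
A minimal sketch of the two cases contrasted in this note, assuming a shell that treats a '$' followed by a space (or by the closing double-quote) as literal, which is the long-standing behaviour the note describes. The variable names x and count are chosen only for illustration.

```shell
# "$ $ $": no '$' here is followed by a character that starts an expansion,
# so traditional shells keep all three dollar signs literally.
x="$ $ $"

# "$#": '#' names a special parameter, so this one IS expanded.
set -- one two
count="$#"

printf '%s\n' "$x" "$count"
```

The first value keeps its dollar signs and spaces unchanged, while the second is replaced by the number of positional parameters.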
(0003594)
geoffclare (manager)
2017-03-03 16:10

In addition to x="$ $ $" that kre raises, there is also:

grep -E "$wordend([[:space:]]|$)"

to consider.
(0003595)
shware_systems (reporter)
2017-03-03 16:41

Re:
"$ expansion is performed inside "" strings, what is the expected behaviour
of expressions like
    x="$ $ $"
and where is that defined?"

is part of XCU 2.3 too. Rule 5 applies when the sequences match what is expected to initiate an expansion subcontext, and "$ $ $" does not match, so it is treated as literal text by applying Rule 4. If this were the default for PS1, it would be stored as if it were outside double quotes, due to quote removal, so evaluating it that way makes the individual '$' signs tokens of their own as literal text, which is how I described it.

The ambiguity remaining, in terms of tokenizing the value, is that <blank> consolidation of the PS1 value is expected to leave at least one leading or trailing <blank> if these are part of the value and not trimmed off entirely, to match the behavior of existing implementations, or <blank> consolidation is not supposed to occur at all. The tentative resolution to this is requiring the expansions be evaluated as if they are enclosed inside double quotes, so consolidation doesn't happen.
(0003597)
chet_ramey (reporter)
2017-03-03 16:46

The question in note 3590, whether bash is permitted to make the default $PS1 something other than "$ ", doesn't seem to have much, if anything, to do with the "undefined behavior" under discussion.

However, I think it would be worthwhile for Posix to tweak the wording to specify that a string ending in "$ " is permissible. That would still capture existing behavior of all shells.
(0003598)
Don Cragun (manager)
2017-03-03 17:15
edited on: 2017-03-03 17:16

I would prefer that the POSIX shell, for a user running with "normal" privileges, behave as if PS1 had been defined with PS1='\$ ' unless the startup environment or a shell startup script defined it to be something else. Just saying that the prompt ends with "$ " (as suggested in Note: 0003597) would allow a shell to set the default prompt to a 200-character string (and I definitely don't want that).

(0003599)
chet_ramey (reporter)
2017-03-03 18:58

re: note 3598

A 200-character string is highly unlikely, but what would it matter if a shell did that?
(0003600)
eblake (manager)
2017-03-03 19:03

I can make bash's prompt as long as I want (although the pain is self-inflicted, since the basename of /bin/sh is only length 2):

$ ln -s /bin/bash really_loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong_name
$ env -u PS1 ./really_loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong_name --posix --norc
really_loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong_name-4.3$
(0003601)
chet_ramey (reporter)
2017-03-03 19:54

re: 3600

OK. What does it matter? What interest does the standard have in keeping the primary prompt short?
(0003602)
shware_systems (reporter)
2017-03-03 19:57
edited on: 2017-03-03 20:00

With prompts that long, on terminals that do not line wrap, echoed user input on the same line would frequently go to /dev/null instead of at least the first part being visible to the operator. A similar possibility exists with bash's PS4 prompts, if some script uses 100s of trace levels for a recursive algorithm. This is the primary concern here. Scripts that format output to adapt to terminals like this need a fixed-width prompt to subtract from the terminal's line width setting to maintain alignments. The standard can't guard against any data ever being lost with terminals like this, but I agree with Don: implementations shouldn't make it more likely to occur as their default behavior. These variable-length prompts are fine as a bash extension, or as default behavior for a user with extended privileges who has rtfm and is aware bash does this, but are non-conforming to what non-privileged users of sh who may only read the sh man page are told to expect of script and interactive behavior.

(0003604)
kre (reporter)
2017-03-03 23:50
edited on: 2017-03-04 00:37

Can we all please move the discussion of what the default prompt (PS1) should be
to some other issue (with one exception I will note in a second) - that is *really*
not relevant here...

The one exception is Don's observation (note 3598) that it could be defined to
be '\$ '. Doing that would certainly solve the issue in the original report
here - but it would not solve the real underlying problem, and it prevents people
from explicitly doing
       PS1='$ '
which has been allowed, in all derived shells I have ever seen, since the
original Bourne shell (1977?)

It also does nothing to deal with strings that contain "$ " which has
also worked forever.

WRT note 3595: I believe you are mistaken, when performing parameter expansion
of a quoted string, the string is not tokenised first (section 2.3 is not
applied). If it were then

       "$ $   $     $" (with 1, then 3, then 5 embedded spaces)
would be identical to
       "$ $ $ $"
as runs of spaces are eliminated by the tokenising procedure, leaving just
token delimiters (which then get converted back to a space if required.)

I'm not sure why you want to go to so much trouble to attempt to get out
of simply defining that white space (including \n) or end of word, after a
$ should mean the $ is to be treated literally - which is what everyone does.

WRT Geoff's issue in note 3594, I am actually not sure I would attempt
to fix the case: "$wordend([[:space:]]|$)" - it might be better to leave
it unspecified (though one could also make it explicit that the two string
pairs '$)' and '$|' are to be treated literally, and not be undefined,
for the benefit of regular-expression entry - that is, one could add a
sixth bullet point to 2.6 listing the characters which, when encountered
after a '$', cause the '$' to be treated literally, rather than be unspecified).

I have not done the testing really required, but I suspect that in all shells,
the "unspecified behaviour" that section 2.6 states for $ not followed by a
character in one of the 5 defined groups, is always to treat the $ literally, and
just proceed with the next character.

However, I would be hesitant to alter that in general, despite it being
more or less universal, as it would then prohibit extensions like $'...'
as it would be required that "$'" expand to the two characters '$' and "'".

That is actually what the NetBSD sh (which does not, yet anyway, implement the
proposed $'...' syntax) does - if the results were not unspecified, then
users would be able to demand that $' remain a literal 2 character string.
As it is, however, anyone using that is entering the world of unspecified
behaviour, and anything is allowed - which permits shells to go ahead and
implement $'...' - which then allows this group to standardise that if it
seems to have become popular enough. To get a literal "$'" pair, an
application needs to write it as "\$'"

It could be considered that to avoid too many restrictions, Geoff's example
should really be written
     grep -E "$wordend([[:space:]]|\$)"
or perhaps
     grep -E "$wordend"'([[:space:]]|$)'
despite the inconvenience that would cause (and probably the number of
scripts it would break if required, as far as strict conformance goes.)

I don't think that is a reasonable requirement for "$ " though, precisely
because of the default setting of PS1 - both as it is defined in the standard
now, and how it has always been. Shells simply cannot define the sequence
'$ ' as being anything other than the 2 characters written because of that,
so I don't think simply avoiding the whole issue by changing PS1's default
to be '\$ ' is the right way to make this issue go away.

kre
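
The two rewrites of Geoff's grep example suggested in this note can be compared with a short sketch. The value of wordend and the sample input lines are assumptions made for illustration; only the two quoting forms come from the note.

```shell
wordend='end'   # hypothetical pattern prefix

# Form 1: escape the final '$' inside the double-quoted string.
escaped="$wordend([[:space:]]|\$)"

# Form 2: close the double quotes after the expansion and single-quote the rest.
split="$wordend"'([[:space:]]|$)'

# Both forms hand grep the same regular expression.
if printf 'the end\n' | grep -E -q "$escaped"; then
    matched=yes
else
    matched=no
fi
printf '%s\n%s\n%s\n' "$escaped" "$split" "$matched"
```

Neither form ever places an unescaped '$' directly before ')' in a context where the shell performs expansion, so neither relies on the unspecified case.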

(0003605)
shware_systems (reporter)
2017-03-04 05:19

You're right, inside "" strings $ is not tokenized first. What I meant is a $ needs to be recognized, not tokenized, as possibly starting an expansion, by Rule 5, and if it doesn't then Rule 4 continues to apply to find the closing double-quote, not blank compaction which is in another rule. That applies after the closing quote is found, not before, of the ASSIGNMENT_WORD token as a whole being recognized.

When that is then evaluated fully quote removal occurs, so the value stored will have the extra spaces, but not the double quotes.

If the stored value of PS1 is evaluated same as if it was a line of script, for output, no quotes are present so the extra blanks get deleted.
If it is evaluated as if it was surrounded by double quotes again, the extra blanks stay.
The phrasing "the value of this variable shall be subjected to parameter expansion" implies the first evaluation type more than the second, yet the behavior usually observed is more in line with the second, so we're making that explicit.
Other, more complex ways of describing the observed behavior are possible too, so this is still tentative.
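
The difference between the two evaluation types can be sketched with eval, assuming a shell that treats '$' followed by a blank as literal. The value in ps is hypothetical, and eval is used here only because the value is a fixed, trusted string.

```shell
ps='$   $'   # hypothetical stored value: two '$' separated by three blanks

# Evaluated like a line of script: the blanks only separate words,
# so echo receives two arguments and joins them with a single space.
as_script=$(eval "echo $ps")

# Evaluated as if enclosed in double quotes: the blanks are preserved.
as_quoted=$(eval "echo \"$ps\"")

printf '[%s]\n[%s]\n' "$as_script" "$as_quoted"
```

Only the second form keeps the three blanks, which matches the observed prompt behaviour described above.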
(0003606)
kre (reporter)
2017-03-04 07:08

Just forget about section 2.3 and the rules there, they're for token recognition
and have nothing whatever to do with parameter expansion.

Parameter expansion (and other expansions, but they're not relevant here) is
done in many contexts (I mean for many reasons, I'm not talking of execution
contexts or whatever). In each case you simply have a string of characters
(in the case of the default PS1 that string contains 2 characters, a dollar, and
a space). For parameter expansion, what you do is find the next unquoted $,
- which is the first of the two characters here - then look at the character
that comes next, to decide what kind of parameter expansion is required (as
specified in the section 2.6 preamble). Then follow the rules of 2.6.x
(whichever sub-section applies) to actually perform the expansion.

The problem here is that the next character is a space, which 2.6 says gives
unspecified behaviour - applications are not supposed to do that - but
here, it is not the application that did, it is the std, which is the problem.

All we need to do to fix it is specify it. It really is simple.
(It could also be fixed by quoting the $ in PS1, but that's an inferior
solution.)

kre
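
As a rough illustration of the two fixes mentioned in this note, the sketch below expands both candidate default values as if each were enclosed in double quotes. The helper expand_dq is hypothetical and relies on eval, so it is only suitable for trusted values that contain no double quotes or backquotes. It also assumes a shell that already treats '$ ' literally.

```shell
# expand_dq: hypothetical helper that expands its argument as though it
# appeared inside double quotes, and prints the result without a newline.
expand_dq() {
    eval "printf '%s' \"$1\""
}

default_ps1='$ '    # the default value given in 2.5.3
escaped_ps1='\$ '   # the alternative with the '$' quoted

from_default=$(expand_dq "$default_ps1")
from_escaped=$(expand_dq "$escaped_ps1")

printf '[%s] [%s]\n' "$from_default" "$from_escaped"
```

Both values produce the same two characters, a dollar sign and a space. The escaped form never depends on how an unquoted '$' followed by a space is handled, whereas the unescaped form does.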
(0003607)
kre (reporter)
2017-03-04 07:34
edited on: 2017-03-04 07:37

In addition if you still believe that the rules in 2.3 are somehow relevant
when performing word expansions (on text that has already been tokenised, or
never will be) consider the (not specified by the standard, but everyone does
it) case of PS1='# ' and imagine what would happen to that when rule 9 of
section 2.3 is applied...

I have no idea why section 2.6 when giving the order of the expansions, when
many are to apply, refers back to rule 5 of section 2.3 in case 1 - except
that I guess it says how to find the full set of chars that make up the
expansion (how to match the closing ')' in a '$(' command expansion) - but
there obviously cannot be any intent that section 2.3 be used in general
(that would simply be wrong.)

Section 2.6 as a whole really needs major work - it is written as if the
only time expansions get done is when processing the args of a command line.
There ought to be one section for that, and then separately, sections for
each of the types of expansions which specify how they are done, when they
apply (which includes PSn processing, here doc processing, and more I expect,
which have nothing whatever to do with command lines.) But all that work is
more than is needed right now.

kre

(0003613)
kre (reporter)
2017-03-08 22:49

One more thought on this issue ... it might also be worth specifying what
happens when a '$' is followed by a defined character (one which would cause
an expansion normally) when evaluated in a context where that expansion is
not performed.

So, for example, if a shell does not do command substitution when expanding
PS1 (which is acceptable with the proposed resolution of issue 1006 I believe)
then how is the setting
      PS1='$(date) ... '
to be interpreted? Note that the new text from 1006 says that whether or
not the command expansion happens is unspecified, it does not say anything
about the effect on the prompt if the command expansion does not happen.

If there are other contexts in which only some of the word expansions take
place, the same issue would arise.

kre
(0003616)
Don Cragun (manager)
2017-03-16 15:57
edited on: 2018-10-11 15:11

This note was updated during the 2018-10-11 conference call to accept the modification suggested in Note: 0003629.

Interpretation response
------------------------
The standard states that "$ " is the default PS1 prompt value and
states that the behavior with this default prompt is unspecified.
However, concerns have been raised about this which are being
referred to the sponsor.

Rationale:
-------------
This is not the way traditional and current shells behave and does
not accurately describe the way shells should behave.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
On page 2353 lines 75012-75019 (2016 edition page and line numbers)
XCU section 2.6 (Word Expansions) change:
If an unquoted '$' is followed by a character that is not one of the
following:

[...]

the result is unspecified.

to:
If a '$' that is neither within single-quotes nor escaped by a <backslash>
is immediately followed by a character that is not a <space>, not a <tab>,
not a <newline>, and is not one of the following:

[...]

the result is unspecified. If a '$' that is neither within single-quotes nor
escaped by a <backslash> is immediately followed by a <space>, <tab>,
or a <newline>, or is not followed by any character, the '$' shall be treated as a literal character.
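
The decision described by the new wording can be sketched as a small classifier. The function classify_after_dollar is hypothetical and is not part of any shell. The five expansion categories in it are an assumption taken from a reading of XCU 2.6 (a numeric character, a special parameter name, a valid first character of a variable name, '{', and '('), since the text above elides that list as "[...]".

```shell
tab=$(printf '\t')
nl='
'

# classify_after_dollar: hypothetical helper. Its argument is the single
# character that follows a '$' which is neither single-quoted nor
# backslash-escaped; an empty argument means nothing follows the '$'.
classify_after_dollar() {
    case $1 in
        '' | ' ' | "$tab" | "$nl")
            echo literal ;;               # the case added by this resolution
        [0-9])
            echo numeric ;;
        '@' | '*' | '#' | '?' | '-' | '$' | '!')
            echo special-parameter ;;
        [A-Za-z_])
            echo variable-name ;;
        '{' | '(')
            echo brace-or-parenthesis ;;
        *)
            echo unspecified ;;           # for example $' or $) or $|
    esac
}

classify_after_dollar ' '
classify_after_dollar "'"
```

Under these assumptions the default PS1 value falls into the literal case, while sequences such as $' remain unspecified and so stay available for extensions.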


(0003629)
wpollock (reporter)
2017-03-17 11:46
edited on: 2017-03-17 11:47

The proposed wording in note 3616 doesn't address what happens if a
dollar-sign is the last character (and thus not followed by anything).
I think that should cause the dollar-sign to be taken as a literal.
For example:

   $ sh -c 'echo $'

(I think this could also occur with "here strings", if that is adopted
in the future.)

Robert ("kre") suggested this modification to the proposed wording of
note 3616 on the mailing list:

<quote>
   If a '$' that is neither within single-quotes nor escaped by a <backslash>
   is immediately followed by a character that is not a <space>, not a <tab>,
   not a <newline>, and is not one of the following:

   [...]

   the result is unspecified. If a '$' that is neither within single-quotes nor
   escaped by a <backslash> is immediately followed by a <space>, <tab>,
   or a <newline>, or is not followed by any character, the '$' shall be
   treated as a literal character.
</quote>
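
The end-of-input case raised in this note can be checked with a short sketch, assuming an sh on PATH that follows the traditional behaviour (a trailing '$' is kept literally). The variable names are illustrative only.

```shell
# '$' as the last character of the whole command string.
bare=$(sh -c 'echo $')

# '$' as the last character before a closing double-quote.
quoted=$(sh -c 'echo "a$"')

printf '%s\n%s\n' "$bare" "$quoted"
```

In both cases the '$' is not followed by a character that could start an expansion, which is the situation the added phrase "or is not followed by any character" is meant to cover.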

(0004146)
ajosey (manager)
2018-10-11 15:18

Interpretation Proposed: 11 Oct 2018
(0004166)
ajosey (manager)
2018-11-12 19:48

Interpretation approved: 12 November 2018

- Issue History
Date Modified Username Field Change
2016-03-24 18:24 kre New Issue
2016-03-24 18:24 kre Name => Robert Elz
2016-03-24 18:24 kre Section => 2.5.3, 2.6
2016-03-24 18:24 kre Page Number => unknown
2016-03-24 18:24 kre Line Number => unknown
2016-08-11 15:29 Don Cragun Relationship added related to 0001006
2017-03-02 16:22 nick Page Number unknown => 2352
2017-03-02 16:22 nick Line Number unknown => 74952
2017-03-02 16:22 nick Interp Status => ---
2017-03-03 14:42 shware_systems Note Added: 0003589
2017-03-03 15:27 eblake Note Added: 0003590
2017-03-03 15:50 kre Note Added: 0003591
2017-03-03 15:59 joerg Note Added: 0003593
2017-03-03 16:10 geoffclare Note Added: 0003594
2017-03-03 16:41 shware_systems Note Added: 0003595
2017-03-03 16:44 joerg Note Deleted: 0003593
2017-03-03 16:46 chet_ramey Note Added: 0003597
2017-03-03 17:15 Don Cragun Note Added: 0003598
2017-03-03 17:16 Don Cragun Note Edited: 0003598
2017-03-03 18:58 chet_ramey Note Added: 0003599
2017-03-03 19:03 eblake Note Added: 0003600
2017-03-03 19:54 chet_ramey Note Added: 0003601
2017-03-03 19:57 shware_systems Note Added: 0003602
2017-03-03 20:00 shware_systems Note Edited: 0003602
2017-03-03 23:50 kre Note Added: 0003604
2017-03-03 23:52 kre Note Edited: 0003604
2017-03-03 23:55 kre Note Edited: 0003604
2017-03-04 00:37 kre Note Edited: 0003604
2017-03-04 05:19 shware_systems Note Added: 0003605
2017-03-04 07:08 kre Note Added: 0003606
2017-03-04 07:34 kre Note Added: 0003607
2017-03-04 07:36 kre Note Edited: 0003607
2017-03-04 07:37 kre Note Edited: 0003607
2017-03-08 22:49 kre Note Added: 0003613
2017-03-16 15:57 Don Cragun Note Added: 0003616
2017-03-16 15:58 Don Cragun Interp Status --- => Pending
2017-03-16 15:58 Don Cragun Final Accepted Text => See Note: 0003616
2017-03-16 15:58 Don Cragun Status New => Interpretation Required
2017-03-16 15:58 Don Cragun Resolution Open => Accepted As Marked
2017-03-16 15:58 Don Cragun Tag Attached: tc3-2008
2017-03-17 11:46 wpollock Note Added: 0003629
2017-03-17 11:47 wpollock Note Edited: 0003629
2018-10-11 15:11 Don Cragun Note Edited: 0003616
2018-10-11 15:18 ajosey Interp Status Pending => Proposed
2018-10-11 15:18 ajosey Note Added: 0004146
2018-11-12 19:48 ajosey Interp Status Proposed => Approved
2018-11-12 19:48 ajosey Note Added: 0004166
2019-10-23 10:14 geoffclare Status Interpretation Required => Applied

