0001038: Specification relies upon unspecified behaviour

Notes
(0003589) shware_systems (reporter) 2017-03-03 14:42	As an fyi, in the 2017-03-02 phone call this was examined and the consensus was XCU 2.3 was already explicit enough about how the Desired Action was required to be handled, in that for this circumstance there is no character following the '$' >in the current token< for the list in 2.6 to apply to, so it gets treated as a literal $ would anyways. It was agreed as stated this wasn't obvious enough, and pointed out additional potential ambiguities, so clarifying text for these is being drafted.

(0003590) eblake (manager) 2017-03-03 15:27	In particular, the text was ambiguous as to whether the default PS1, PS2, and PS4 are applied at all places where the user unsets the variable, or only at shell startup (the latter matches existing practice - unsetting a PS? variable results in no further prompting, so the default is a startup-only operation). Also, bash currently does NOT set PS1 to something that expands to a mere '$ ', but instead sets it to something that expands to the basename of $0, a dash, a version string, and then the '$ '. It is unclear whether Chet Ramey is willing to fix this as a bug in 'bash --posix' mode, or whether the standard should be relaxed to permit bash behavior (by changing the wording to require that the prompt merely end in a literal '$ ', not that it is '$ ' in entirety). Another bash behavior brought up is that bash changes PS4 in subshells, by repeating the first character as an indication of shell depth, and the standard does not permit that. Again, Chet's opinion on whether it is something he is willing to fix for 'bash --posix', or whether the standard should be relaxed to allow it as existing behavior, would be worthwhile.

(0003591) kre (reporter) 2017-03-03 15:50	First, wrt note 3590, that might all be correct, but really has nothing to do with the issue raised. The subject of this bug report was just to illustrate the problem (that PS1 is actually defined to be '$ ' and is then subject to expansions before use) which means that a space following a '$' had better not be an undefined sequence. Yet that is what 2.6 currently makes it. wrt note 3589, I'm afraid I don't see how section 2.3 helps in general. $ expansion is performed inside "" strings, what is the expected behaviour of expressions like x="$ $ $" and where is that defined? The quoted string is a single token, subject to parameter expansion (no-one questions what happens with x="$#" for example.) I am glad however that there is agreement that it needs clarifying, I don't really believe that anyone doubts what should happen here - what all shells have done forever. kre

(0003594) geoffclare (manager) 2017-03-03 16:10	In addition to x="$ $ $" that kre raises, there is also: grep -E "$wordend([[:space:]]\|$)" to consider.

(0003595) shware_systems (reporter) 2017-03-03 16:41	Re: "$ expansion is performed inside "" strings, what is the expected behaviour of expressions like x="$ $ $" and where is that defined?" is part of XCU 2.3 too. Section 5. applies when the sequences match what is expected to initiate an expansion subcontext, and "$ $ $" does not match so is treated as literal text by applying Section 4. If this was the default for PS1, it gets stored as if it was outside double quotes, due to quote removal, so evaluating it as that makes the individual '$' signs tokens of their own as literal text, which is how I described it. The ambiguity remaining, in terms of tokenizing the value, is that <blank> consolidation of the PS1 value is expected to leave at least one leading or trailing <blank> if these are part of the value and not trimmed off entirely, to match the behavior of existing implementations, or <blank> consolidation is not supposed to occur at all. The tentative resolution to this is requiring the expansions be evaluated as if they are enclosed inside double quotes, so consolidation doesn't happen.

(0003597) chet_ramey (reporter) 2017-03-03 16:46	The question in note 3590, whether bash is permitted to make the default $PS1 something other than "$ ", doesn't seem to have much, if anything, to do with the "undefined behavior" under discussion. However, I think it would be worthwhile for Posix to tweak the wording to specify that a string ending in "$ " is permissible. That would still capture existing behavior of all shells.

(0003598) Don Cragun (manager) 2017-03-03 17:15 edited on: 2017-03-03 17:16	I would prefer to have the POSIX shell for a user running with "normal" privileges to behave as if PS1 had been defined with PS1='\$ ' unless the startup environment defined it to be something else or a shell startup script defined it to be something else. Just saying that the prompt ends with "$ " (as suggested in Note: 0003597) would allow a shell to set the default prompt to a 200 character string (and I definitely don't want that).

(0003599) chet_ramey (reporter) 2017-03-03 18:58	re: note 3598 A 200-character string is highly unlikely, but what would it matter if a shell did that?

(0003600) eblake (manager) 2017-03-03 19:03	I can make bash's prompt as long as I want (although the pain is self-inflicted, since the basename of /bin/sh is only length 2):
$ ln -s /bin/bash really_loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong_name
$ env -u PS1 ./really_loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong_name --posix --norc
really_loooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooong_name-4.3$

(0003601) chet_ramey (reporter) 2017-03-03 19:54	re: 3600 OK. What does it matter? What interest does the standard have in keeping the primary prompt short?

(0003602) shware_systems (reporter) 2017-03-03 19:57 edited on: 2017-03-03 20:00	On terminals that do not line wrap, echoing of user input on the same line would frequently go to /dev/null instead of at least the first part being visible to the operator, with lines that long. A similar possibility exists with bash's PS4 prompts, if some script uses 100s of trace levels for a recursive algorithm. This is the primary concern here. Scripts that format output to adapt to terminals like this need a fixed width prompt to subtract from the terminal's line width setting to maintain alignments. The standard can't guard against any data ever being lost with terminals like this, but I agree with Don, the implementations shouldn't make it more likely to occur as their default behaviors. These variable length prompts are fine as a bash extension, or as default behavior for a user with extended privileges who has rtfm and is aware bash does this, but are non-conforming to what non-privileged users of sh who may only read the sh man page are told to expect of script and interactive behavior.

(0003604) kre (reporter) 2017-03-03 23:50 edited on: 2017-03-04 00:37	Can we all please move the discussion of what the default prompt (PS1) should be to some other issue (with one exception I will note in a second) - that is really not relevant here... The one exception is Don's observation (note 3598) that it could be defined to be '\$ '. Doing that would certainly solve the issue in the original report here - but it would not solve the real underlying problem, and it prevents people from explicitly doing PS1='$ ' which has been allowed, in all derived shells I have ever seen, since the original Bourne shell (1977?) It also does nothing to deal with strings that contain "$ " which has also worked forever. WRT note 3595: I believe you are mistaken, when performing parameter expansion of a quoted string, the string is not tokenised first (section 2.3 is not applied). If it were then "$ $ $ $" (that should appear with 1, then 3, then 5 embedded spaces, but does not...) would be identical to "$ $ $ $" as runs of spaces are eliminated by the tokenising procedure, leaving just token delimiters (which then get converted back to a space if required.) I'm not sure why you want to go to so much trouble to attempt to get out of simply defining that white space (including \n) or end of word, after a $ should mean the $ is to be treated literally - which is what everyone does. WRT Geoff's issue in note 3594, I am actually not sure I would attempt to fix the case: "$wordend([[:space:]]\|$)" - it might be better to leave it unspecified (though one could also make it explicit that the two string pairs '$)' and '$\|' are to be treated literally, and not be undefined, for the benefit of regular-expression entry - that is, one could add a sixth bullet point to 2.6 listing the characters which, when encountered after a '$', cause the '$' to be treated literally, rather than be unspecified. 
I have not done the testing really required, but I suspect that in all shells, the "unspecified behaviour" that section 2.6 states for $ not followed by a character from one of the 5 defined groups, is always to treat the $ literally, and just proceed with the next character. However, I would be hesitant to alter that in general, despite it being more or less universal, as it would then prohibit extensions like $'...' as it would be required that "$'" expand to the two characters '$' and "'". That is actually what the NetBSD sh (which does not, yet anyway, implement the proposed $'...' syntax) does - if the results were not unspecified, then users would be able to demand that $' remain a literal 2 character string. As it is, however, anyone using that is entering the world of unspecified behaviour, and anything is allowed - which permits shells to go ahead and implement $'...' - which then allows this group to standardise that if it seems to have become popular enough. To get a literal "$'" pair, an application needs to write it as "\$'". It could be considered that to avoid too many restrictions, Geoff's example should really be written grep -E "$wordend([[:space:]]\|\$)" or perhaps grep -E "$wordend"'([[:space:]]\|$)' despite the inconvenience that would cause (and probably the number of scripts it would break if required, as far as strict conformance goes.) I don't think that is a reasonable requirement for "$ " though, precisely because of the default setting of PS1 - both as it is defined in the standard now, and how it has always been. Shells simply cannot define the sequence '$ ' as being anything other than the 2 characters written because of that, so I don't think simply avoiding the whole issue by changing PS1's default to be '\$ ' is the right way to make this issue go away. kre
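For illustration only, the two alternative spellings of Geoff's pattern mentioned in note 3604 can be checked to produce the same string. This is a minimal sketch: the value assigned to wordend is a hypothetical placeholder, and p1 and p2 are names introduced here, not anything from the thread.

```shell
# Hypothetical value, chosen only so the pattern has something to expand.
wordend='end'

# Form 1: escape the trailing '$' inside the double quotes.
# Inside double quotes, \$ yields a literal '$', while \| keeps its backslash.
p1="$wordend([[:space:]]\|\$)"

# Form 2: close the double quotes after the expansion and put the rest
# of the pattern in single quotes, where '$' is never special.
p2="$wordend"'([[:space:]]\|$)'

printf '%s\n' "$p1" "$p2"
```

Both assignments avoid an unescaped '$' followed by ')' inside double quotes, which is the sequence the note suggests might be left unspecified.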

(0003605) shware_systems (reporter) 2017-03-04 05:19	You're right, inside "" strings $ is not tokenized first. What I meant is a $ needs to be recognized, not tokenized, as possibly starting an expansion, by Rule 5, and if it doesn't then Rule 4 continues to apply to find the closing double-quote, not blank compaction which is in another rule. That applies after the closing quote is found, not before, of the ASSIGNMENT_WORD token as a whole being recognized. When that is then evaluated fully quote removal occurs, so the value stored will have the extra spaces, but not the double quotes. If the stored value of PS1 is evaluated the same as if it was a line of script, for output, no quotes are present so the extra blanks get deleted. If it is evaluated as if it was surrounded by double quotes again, the extra blanks stay. Phrased as "the value of this variable shall be subjected to parameter expansion" more implies the first evaluation type, yet the usual behavior observed is more in line with the second so we're making that explicit. Other ways of describing the observed behavior are possible too, so this is still tentative, but are more complex.
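As a loose analogy for the two evaluation styles described in note 3605, ordinary variable expansion shows the same contrast: an unquoted expansion is field-split and loses runs of blanks, while a double-quoted one keeps them. This sketch does not exercise PS1 itself; the variable names v, unquoted, and quoted are assumptions made for the example.

```shell
# Single quotes store the value verbatim, including the three spaces.
v='$   prompt'

# Unquoted expansion: field splitting breaks the value at blanks, and
# "$*" rejoins the fields with one space (default IFS assumed).
set -- $v
unquoted="$*"

# Double-quoted expansion: no field splitting, so the blanks survive.
quoted="$v"

printf '[%s]\n' "$unquoted" "$quoted"
```

Treating the prompt value as if it were double-quoted corresponds to the second case, which is the behaviour the note says is being made explicit.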

(0003606) kre (reporter) 2017-03-04 07:08	Just forget about section 2.3 and the rules there, they're for token recognition and have nothing whatever to do with parameter expansion. Parameter expansion (and other expansions, but they're not relevant here) is done in many contexts (I mean for many reasons, I'm not talking of execution contexts or whatever). In each case you simply have a string of characters (in the case of the default PS1 that string contains 2 characters, a dollar, and a space). For parameter expansion, what you do is find the next unquoted $, - which is the first of the two characters here - then look at the character that comes next, to decide what kind of parameter expansion is required (as specified in the section 2.6 preamble). Then follow the rules of 2.6.x (whichever sub-section applies) to actually perform the expansion. The problem here is that the next character is a space, which 2.6 says gives unspecified behaviour - applications are not supposed to do that - but here, it is not the application that did, it is the std, which is the problem. All we need to do to fix it is specify it. It really is simple. (It could also be fixed by quoting the $ in PS1, but that's an inferior solution.) kre

(0003607) kre (reporter) 2017-03-04 07:34 edited on: 2017-03-04 07:37	In addition if you still believe that the rules in 2.3 are somehow relevant when performing word expansions (on text that has already been tokenised, or never will be) consider the (not specified by the standard, but everyone does it) case of PS1='# ' and imagine what would happen to that when rule 9 of section 2.3 is applied... I have no idea why section 2.6 when giving the order of the expansions, when many are to apply, refers back to rule 5 of section 2.3 in case 1 - except that I guess it says how to find the full set of chars that make up the expansion (how to match the closing ')' in a '$(' command expansion) - but there obviously cannot be any intent that section 2.3 be used in general (that would simply be wrong.) Section 2.6 as a whole really needs major work - it is written as if the only time expansions get done is when processing the args of a command line. There ought to be one section for that, and then separately, sections for each of the types of expansions which specify how they are done, when they apply (which includes PSn processing, here doc processing, and more I expect, which have nothing whatever to do with command lines.) But all that work is more than is needed right now. kre
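A small sketch of the point in note 3607 that token-recognition rules are not reapplied to expanded text: a '#' that arrives through an expansion does not start a comment. The variable names are assumptions introduced for this example, and the value is deliberately not the PS1 setting itself.

```shell
# The value starts with '#', which would begin a comment if it were
# recognised as a token in script text.
v='# ready'

# Expanding it unquoted yields two fields; nothing is discarded as a comment.
set -- $v
count=$#
first=$1

printf '%s fields, first is %s\n' "$count" "$first"
```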

(0003613) kre (reporter) 2017-03-08 22:49	One more thought on this issue ... it might also be worth specifying what happens when a '$' is followed by a defined character (one which would cause an expansion normally) when evaluated in a context where that expansion is not performed. So, for example, if a shell does not do command substitution when expanding PS1 (which is acceptable with the proposed resolution of issue 1006 I believe) then how is the setting PS1='$(date) ... ' to be interpreted? Note that the new text from 1006 says that whether or not the command expansion happens is unspecified, it does not say anything about the effect on the prompt if the command expansion does not happen. If there are other contexts in which only some of the word expansions take place, the same issue would arise. kre

(0003616) Don Cragun (manager) 2017-03-16 15:57 edited on: 2018-10-11 15:11	This note was updated during the 2018-10-11 conference call to accept the modification suggested in Note: 0003629.
Interpretation response
------------------------
The standard states that "$ " is the default PS1 prompt value and states that the behavior with this default prompt is unspecified. However, concerns have been raised about this which are being referred to the sponsor.
Rationale:
-------------
This is not the way traditional and current shells behave and does not accurately describe the way shells should behave.
Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
On page 2353 line 75012-75019 (2016 edition page and line numbers) XCU section 2.6 (Word Expansions) change:
    If an unquoted '$' is followed by a character that is not one of the following: [...] the result is unspecified.
to:
    If a '$' that is neither within single-quotes nor escaped by a <backslash> is immediately followed by a character that is not a <space>, not a <tab>, not a <newline>, and is not one of the following: [...] the result is unspecified.
    If a '$' that is neither within single-quotes nor escaped by a <backslash> is immediately followed by a <space>, <tab>, or a <newline>, or is not followed by any character, the '$' shall be treated as a literal character.
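The cases raised earlier in the thread can be tried directly. This is only a sketch of the behaviour commonly observed in existing shells, which the accepted wording is meant to describe; the variable names a, b, and c are assumptions introduced for the example.

```shell
# kre's example from note 3591: each '$' is followed by a <space>,
# or is the last character inside the double quotes.
a="$ $ $"

# wpollock's example from note 3629: '$' is the last character of the
# command string, so no character follows it.
b=$(sh -c 'echo $')

# A '$' followed by a <space> in the middle of ordinary text.
c="price $ 5"

printf '%s\n' "$a" "$b" "$c"
```

In each case the '$' is expected to come through unchanged, as it does with the default PS1 value of '$ '.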

(0003629) wpollock (reporter) 2017-03-17 11:46 edited on: 2017-03-17 11:47	The proposed wording in note 3616 doesn't address what happens if a dollar-sign is the last character (and thus not followed by anything). I think that should cause the dollar-sign to be taken as a literal. For example: $ sh -c 'echo $' (I think this could also occur with "here strings", if that is adopted in the future.) Robert ("kre") suggested this modification to the proposed wording of note 3616 on the mailing list: <quote> If a '$' that is neither within single-quotes nor escaped by a <backslash> is immediately followed by a character that is not a <space>, not a <tab>, not a <newline>, and is not one of the following: [...] the result is unspecified. If a '$' that is neither within single-quotes nor escaped by a <backslash> is immediately followed by a <space>, <tab>, or a <newline>, or is not followed by any character, the '$' shall be treated as a literal character. </quote>

(0004146) ajosey (manager) 2018-10-11 15:18	Interpretation Proposed: 11 Oct 2018

(0004166) ajosey (manager) 2018-11-12 19:48	Interpretation approved: 12 November 2018

Issue History
Date Modified	Username	Field	Change
2016-03-24 18:24	kre	New Issue
2016-03-24 18:24	kre	Name	=> Robert Elz
2016-03-24 18:24	kre	Section	=> 2.5.3, 2.6
2016-03-24 18:24	kre	Page Number	=> unknown
2016-03-24 18:24	kre	Line Number	=> unknown
2016-08-11 15:29	Don Cragun	Relationship added	related to 0001006
2017-03-02 16:22	nick	Page Number	unknown => 2352
2017-03-02 16:22	nick	Line Number	unknown => 74952
2017-03-02 16:22	nick	Interp Status	=> ---
2017-03-03 14:42	shware_systems	Note Added: 0003589
2017-03-03 15:27	eblake	Note Added: 0003590
2017-03-03 15:50	kre	Note Added: 0003591
2017-03-03 15:59	joerg	Note Added: 0003593
2017-03-03 16:10	geoffclare	Note Added: 0003594
2017-03-03 16:41	shware_systems	Note Added: 0003595
2017-03-03 16:44	joerg	Note Deleted: 0003593
2017-03-03 16:46	chet_ramey	Note Added: 0003597
2017-03-03 17:15	Don Cragun	Note Added: 0003598
2017-03-03 17:16	Don Cragun	Note Edited: 0003598
2017-03-03 18:58	chet_ramey	Note Added: 0003599
2017-03-03 19:03	eblake	Note Added: 0003600
2017-03-03 19:54	chet_ramey	Note Added: 0003601
2017-03-03 19:57	shware_systems	Note Added: 0003602
2017-03-03 20:00	shware_systems	Note Edited: 0003602
2017-03-03 23:50	kre	Note Added: 0003604
2017-03-03 23:52	kre	Note Edited: 0003604
2017-03-03 23:55	kre	Note Edited: 0003604
2017-03-04 00:37	kre	Note Edited: 0003604
2017-03-04 05:19	shware_systems	Note Added: 0003605
2017-03-04 07:08	kre	Note Added: 0003606
2017-03-04 07:34	kre	Note Added: 0003607
2017-03-04 07:36	kre	Note Edited: 0003607
2017-03-04 07:37	kre	Note Edited: 0003607
2017-03-08 22:49	kre	Note Added: 0003613
2017-03-16 15:57	Don Cragun	Note Added: 0003616
2017-03-16 15:58	Don Cragun	Interp Status	--- => Pending
2017-03-16 15:58	Don Cragun	Final Accepted Text	=> See Note: 0003616
2017-03-16 15:58	Don Cragun	Status	New => Interpretation Required
2017-03-16 15:58	Don Cragun	Resolution	Open => Accepted As Marked
2017-03-16 15:58	Don Cragun	Tag Attached: tc3-2008
2017-03-17 11:46	wpollock	Note Added: 0003629
2017-03-17 11:47	wpollock	Note Edited: 0003629
2018-10-11 15:11	Don Cragun	Note Edited: 0003616
2018-10-11 15:18	ajosey	Interp Status	Pending => Proposed
2018-10-11 15:18	ajosey	Note Added: 0004146
2018-11-12 19:48	ajosey	Interp Status	Proposed => Approved
2018-11-12 19:48	ajosey	Note Added: 0004166
2019-10-23 10:14	geoffclare	Status	Interpretation Required => Applied
2024-06-11 08:57	agadmin	Status	Applied => Closed
