0001043: Which newline starts collection of here document data?

Notes
(0003141) joerg (reporter) 2016-04-08 13:33	Some of your questions are easy to answer once you understand that a command substitution with $(..) or `..` always is a word or part of a word. Given that the shell needs to first collect all characters that form the word, it is obvious that "the next NEWLINE" must be seen locally first, in case of a here document that appears to be inside a command substitution.

(0003144) kre (reporter) 2016-04-09 11:42	Sorry, I have no idea what "must be seen locally first" means. The point here is that shells interpret these things in different ways. Perhaps there is something in the spec which makes it clear which is correct, but personally, I cannot see it. Perhaps it is obvious which should be correct - and maybe this is a case where what should be correct (rather than what is actually implemented) might be specified (since it is rather an outlier in the syntax) but if that is the case, I cannot come to a conclusion about what should be correct, and what should not. I know what the NetBSD shell does in these cases, and I have done some testing of other shells, but none of that has blessed me with magic enlightenment of correctness. PS: I do understand that command substitution is part of a word, but I cannot fathom how that helps - the actual here document, and the here document operator that creates it, are separated lexically in the input. What matters is just how that is to be resolved in some kind of consistent manner that is more or less in accordance with what works today.

(0003145) jilles (reporter) 2016-04-09 15:33	That a command substitution is always part of a WORD or similar token implies that any newlines part of the command substitution are not NEWLINE tokens on that level and do not start here-document contents. For example: cat - <<EOF $(find . ) message EOF is a valid command. A different situation is where the << redirection is within the command substitution and the here-document contents are outside of it. Historically, ash variants have used their implementation technique that fully parses command substitutions when encountered to allow things like: v=$(cat <<EOF) & EOF in addition to the standard v=$(cat <<EOF & EOF ) The ash-specific form violates the statement in XCU 2.6.3 Command Substitution that "all characters following the open parenthesis to the matching closing parenthesis constitute the command", since the here-document contents are outside the parentheses. More practically, the ash-specific form is hard to parse for implementations that only parse command substitutions to the minimal level necessary to find their end while parsing the outer command and only fully parse them just before execution. I think both implementation techniques (ash-style immediate full parse and bash/ksh93-style minimal immediate parse) should be valid. Changing from the latter to the former technique is likely to break existing scripts that contain invalid command substitutions that are not executed. The same special form with `...` command substitution: v=`cat <<EOF` & EOF seems to have no historical basis.
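
As an illustration of the form that note 0003145 calls standard, a minimal sketch follows in which the here-document operator and its data both sit inside the parentheses. The variable name v comes from the note; the data line "message" and the printf call are placeholders chosen here, not part of the original report.

```shell
# Standard form: the << operator, the here-document data and the
# delimiter line are all between "$(" and the matching ")".
v=$(cat <<EOF
message
EOF
)

# Show what the command substitution captured.
printf '%s\n' "$v"
```

Because everything the substitution needs lies between the parentheses, this form should not depend on whether a shell parses the substitution fully when it is first read or only just before execution.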

(0003147) kre (reporter) 2016-04-10 02:27	Perhaps I erred by concentrating so much on command substitution in the original filing of this issue, it is just that that is where it first really came to my attention, so ... But this, from note 3145: "That a command substitution is always part of a WORD or similar token implies that any newlines part of the command substitution are not NEWLINE tokens on that level" gets right to the crux of the issue, and is what led to the title of this bug report, "which newline ...". That is, from where, in the standard, do you get the qualification "on that level"? I do not see that anywhere. If we take that same example, and re-cast it slightly to: cat - <<EOF ; if find . fi message EOF then (ignoring the command args, and whether this is a sane way to write the command) is that a legal command sequence, or not (this time using "if" and "fi" as the bracketing operators rather than $( ) )? If this is correct, upon what basis is the newline after "." being ignored here? What if we made it a simple subshell instead: cat - <<EOF ; ( find . ) message EOF Is that one correct? And if so, the same question. There is no "one word" or even "same level" argument to use here. And if those forms are not valid, how exactly do you explain to script writers how those (particularly the sub-shell version) are different from the command substitution example in a way they can comprehend? And while doing this also explain how (cat << EOF) | cmd data EOF works in a consistent way (which I am assuming we agree is how it should work). Or is it required to be written (cat << EOF data EOF ) | cmd ? And if that is required, where is that written? The spec just says that here doc data comes after the next newline (token) - and we are back to the topic of the bug report - "which newline (token)" ?
And wrt: Historically, ash variants have used their implementation technique that fully parses command substitutions when encountered to allow things like: v=$(cat <<EOF) & EOF in addition to the standard v=$(cat <<EOF & EOF ) I have no problem with considering the second of those "standard", but I am by no means convinced that the first is not just as standard. I see nothing written currently that makes it so - maybe the ash technique is how those things should be parsed? Or maybe the doc is just deficient and needs fixing? Note: I have no particular axe to grind here, I am not advocating one result over another (which the wording I proposed adding, as poor and sloppy as it was, should, I hope, make clear.) What I would like to see happen is for some resolution to be reached so that this same discussion doesn't have to happen again sometime in the future, when perhaps there is actually something important riding on the outcome. Lastly, I agree that the form: v=`cat <<EOF` & EOF seems to have never been implemented (previously) anywhere. However I saw it used in an actual script (one I did not write - rather, one I got bug reports about when I made NetBSD's sh start to object to, rather than simply ignore, missing here document data - previously the script had been parsed without error, after my earlier change, it no longer was, and that was brought to my attention as a problem caused by my first change.) Now the script in question had other errors, it could never have actually worked as intended, so it is not really a good example to use, but when I thought about it, I could find nothing in the standard to forbid this (that the command actually embedded in the `...` did not do what its author intended was not material), if anything the "next newline (token)" wording seems to explicitly allow it. 
It turned out to be easy to "fix" (and looked to be something of an oversight caused by the way that `...` type command substitutions are parsed, that it had not worked all along) so I did. That handled the "bug report" ... the script in question still doesn't work, but that doesn't matter, it has no syntax error any more, so it parses "correctly" (even if differently than before) and the actual command sequence is, in practice, never used anyway. So, everyone was happy...
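
The two subshell spellings asked about in note 0003147 can be compared with a small sketch. The function names and the use of tr in place of "cmd" are assumptions made for illustration. That both spellings behave alike reflects behaviour commonly seen in shells such as bash and dash, not a reading of the standard.

```shell
# Spelling 1: the here-document data follows the newline that ends the
# whole pipeline, outside the subshell parentheses.
heredoc_after_pipeline() {
    (cat <<EOF) | tr 'a-z' 'A-Z'
data
EOF
}

# Spelling 2: the here-document data sits inside the subshell, before
# the closing parenthesis.
heredoc_inside_subshell() {
    (cat <<EOF
data
EOF
    ) | tr 'a-z' 'A-Z'
}

heredoc_after_pipeline
heredoc_inside_subshell
```

In the first spelling the newline that starts the data is not inside the parentheses at all, which is the case the "which newline" question turns on.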

(0005563) geoffclare (manager) 2021-12-17 15:05	See bug 1036 Note: 0005561 for a proposed resolution.

Issue History
Date Modified	Username	Field	Change
2016-04-07 13:16	kre	New Issue
2016-04-07 13:16	kre	Name	=> Robert Elz
2016-04-07 13:16	kre	Section	=> 2.7.4
2016-04-07 13:16	kre	Page Number	=> 2335-2336
2016-04-07 13:16	kre	Line Number	=> 74235-74256
2016-04-08 13:33	joerg	Note Added: 0003141
2016-04-09 11:42	kre	Note Added: 0003144
2016-04-09 15:33	jilles	Note Added: 0003145
2016-04-10 02:27	kre	Note Added: 0003147
2017-03-23 16:11	nick	Relationship added	related to 0001036
2017-03-23 16:11	nick	Relationship added	related to 0001037
2021-09-10 08:47	geoffclare	Relationship added	related to 0001521
2021-12-17 15:05	geoffclare	Note Added: 0005563
2022-01-06 17:22	Don Cragun	Interp Status	=> ---
2022-01-06 17:22	Don Cragun	Status	New => Closed
2022-01-06 17:22	Don Cragun	Resolution	Open => Duplicate
2022-01-06 17:24	Don Cragun	Relationship replaced	duplicate of 0001036
