0001250: sh input file restrictions are too strict

Notes
(0004396) kre (reporter) 2019-05-16 22:46	I fully agree with deleting the trailing newline requirement (and so not requiring a file that is not a "text file" from that point of view). Shells already deal with "scripts' without newlines as input to "sh -c" and eval all the time - no-one would ever propose requiring a trailing newline on those, and if the shell can handle reaching EOF as correctly terminating the script in those cases, it can for files as well. Or alternatively, if "sh file" works, then (provided the file is not so big as to exceed the arg lenght limit) so should sh -c "$(cat file)" And if the latter works, so should the former. For the latter it is clearly immaterial whether the file ends in a \n as the $() would remove it. For \0 chars (or bytes) my impression is that most shell simply ignore them (the way the byte was always meant to be used - it appears on paper tape input where the tape has been scroller forward to check its contents, and then repositioned back - but not to the exact next position, leaving a few \0 bytes between the last valid char and this one). However, I also agree that we should not require that behaviour, and that scripts should not contain nul chars if they expect to be processed correctly. The "the initial portion of the file intended to be parsed according to..." is a bit vague however. Some shells have been known to test the first 512 bytes of the script for anything that looks "binary" (whatever their interpretation of that is). That is, a script that is (entirely) exec gzip -d <gzip'd data starts here> is not necessarily going to work. Further you cannot really expect the shell will stop parsing at the exact point where it will stop executing, even where that stop is not conditional. That is, exit ; <<>> foo (any binary gibberish) is not going to work, as shells parse (at least) the complete line before executing any of it. There is (or perhaps should be) a difference between what is expected to work when run as "sh file" (one might argue there that the shell should not test the input at all, but simply execute it) and what is expected to be run for "./file" from inside the shell (where file has execute permission, but the exec fails with an ENOEXEC error). That latter case is probably the only one where there shell should be doing any analysis of the content of the file, so that attempts to run random commands that are intended for some other system architecture (ARM binaries in an x86 system for example) won't be attempted to run as a sh script (unless the user actually says it should be done). So for the "INPUT FILES" section I'd simply remove all restrictions on what the file can contain ... if it doesn't survive tokenisation and the grammar (whic, for example, a file with \0's in it would not, if the shell does not simply ignore them) then there will be errors and the scipt will fail. (Not part of the standard, but this also allows anything to be in a #!/bin/sh file, as those are run as "sh file" effectively - whatever their content). Whatever restrictions on input format we want to be able to test for would be better placed in XCU 2.9.1.1.1.e.i.b (page 2367, around line 75586-9). That's where it makes sense to verify the file format before arbitrarily deciding to run it as "sh file" when the user said just "file" and it was found via a PATH search (or the user gave a path name). [Aside: that section number assumes that the resolution of issue 1227 does not fix that nonsense numbering...] There is also a relationship here with the changes being make (not yet finished) for issue whatever it is (I have temporarily lost its number...) which is specifying just how much data the shell should parse before it starts executing any of it. In particular, for current purposes, whether the file argument to the "." (dot) special built-in is an "input file" in the sense being considered here. At the very least there probably needs to be some wording somewhere saying that it is not true that any file that can br run as "sh file" is suitable to be run as ". file" (assuming the same file is located in both cases - eg: for a fully qualified path name). (We know the reverse is not true, as the file for ". file" can contain a "return" command, which would be undefined if the script was run as "sh file"

(0004397) eblake (manager) 2019-05-17 00:00	"Further you cannot really expect the shell will stop parsing at the exact point where it will stop executing, even where that stop is not conditional." Actually, you can. Line 108346 states "When the shell is using standard input and it invokes a command that also uses standard input, the shell shall ensure that the standard input file pointer points directly after the command it has read when the command begins execution. It shall not read ahead in such a manner that any characters intended to be read by the invoked command are consumed by the shell (whether interpreted by the shell or not) or that characters that are not read by the invoked command are not seen by the shell."

(0004398) kre (reporter) 2019-05-17 02:05	Yes, I know about the requirement mentioned in Note: 0004397 but that does not really apply here, as the shell is not reading standard input, it is reading the file from where its commands originate. That requirement applies when one does "sh < file" or more commonly when the standard input is the terminal. But more than that, even there, shells read entire lines (or to EOF when there is no \n) before executing anything, always. That is, if you type (in an interactive shell, so the shell is reading standard input) read foo ; echo "$foo" and you can find a single shell which does not wait for the next line of input and assign that to "foo", but rather assigns the string 'echo "$foo"' to foo, I would be amazed. I have certainly never seen a shell which acts like that, and I doubt anyone else has either, even though that "shell is using standard input" and "it invokes a command that also uses standard input", the command after which it ensures "the standard input file pointer points" is to the start of the next line, as it has already read the 'echo "$foo"' part (tty input, for a normal shell, not doing something like command line editing, is delivered to the process a line at a time - shells included - and there is no way to seek backwards and get the input again.) It has always been this way, and shells reading files do things the same way - they "parse ahead" of the actual command being executed. Oh yes, now I remember, this is in the bug related to alias processing 0001055 which is specifying exactly how much the shell should or must parse ahead before it executes a command, as aliases affect parsing of commands after the changes made become visible, so we need to know just when a change made by an alias command becomes available to be used.

Issue History
Date Modified	Username	Field	Change
2019-05-16 19:22	eblake	New Issue
2019-05-16 19:22	eblake	Name	=> Eric Blake
2019-05-16 19:22	eblake	Organization	=> Red Hat
2019-05-16 19:22	eblake	User Reference	=> ebb.sh.input
2019-05-16 19:22	eblake	Section	=> XSH sh
2019-05-16 19:22	eblake	Page Number	=> 3228
2019-05-16 19:22	eblake	Line Number	=> 108358
2019-05-16 19:22	eblake	Interp Status	=> ---
2019-05-16 19:23	eblake	Relationship added	related to 0001226
2019-05-16 22:46	kre	Note Added: 0004396
2019-05-17 00:00	eblake	Note Added: 0004397
2019-05-17 02:05	kre	Note Added: 0004398
2019-06-27 16:23	Don Cragun	Status	New => Resolved
2019-06-27 16:23	Don Cragun	Resolution	Open => Accepted
2019-06-27 16:23	Don Cragun	Tag Attached: tc3-2008
2019-11-19 16:21	geoffclare	Status	Resolved => Applied

Aardvark Mark IV