Austin Group Defect Tracker

Aardvark Mark III


ID Category Severity Type Date Submitted Last Update
0001226 [1003.1(2016)/Issue7+TC2] Shell and Utilities Objection Error 2019-01-24 19:00 2019-10-07 15:17
Reporter shware_systems View Status public  
Assigned To
Priority normal Resolution Accepted As Marked  
Status Interpretation Required  
Name Mark Ziegast
Organization SHware Systems Dev.
User Reference
Section XCU 2.9.1
Page Number 2368
Line Number 75592
Interp Status Proposed
Final Accepted Text See Note: 0004394
Summary 0001226: shell can not test if a file is text
Description The sentence "If the executable file is not a text file, the shell may bypass this command execution." is problematic: POSIX makes no distinction between binary-formatted files and files formatted as text according to some locale, so there is no standard way to determine what the contents of an arbitrary file represent. As such, after exec() fails the shell only knows that the file had an appropriate execute permission bit set but is not in the format exec() expected. It is up to the invoked shell to determine that the file is not text, by eventually encountering a syntax or grammar error while reading it. For example, that platform's exec() may recognize only ELF binaries, while the file is a COFF or OMF binary made available to other platforms over a network rather than a script.

What can be tested is the type of a file. It is more precise to say that command execution can always be bypassed if the file's type precludes it from being a source of scripts, such as a directory or symbolic link, and that a platform may elect to treat file types other than regular files as binary-only and skip trying to process them as well. Regular files are the only file type the standard requires to be able to persist text data, and the sentence should reflect this.
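As a rough illustration of the proposed file-type test (a minimal sketch, not code from any existing shell), the check can be done with stat() and S_ISREG(). The helper name may_be_script_by_type() is hypothetical. Because stat() follows symbolic links, a link is judged by the type of the file it resolves to, which is also the file exec() acts on; a shell wanting to reject the link itself would use lstat() instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper for the proposed rule: only a regular file is a
 * candidate for execution as a shell script after exec() fails.
 * stat() follows symbolic links, so a link is judged by its target. */
bool may_be_script_by_type(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;              /* cannot stat it: nothing to interpret */
    return S_ISREG(st.st_mode);    /* directories, FIFOs, devices: bypass */
}
```

Under this rule a directory such as "/" or a device such as /dev/null is bypassed, while any regular file, whatever its contents, is still handed to the shell.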
Desired Action Change:
  If the executable file is not a text file,

to:
  If the executable file is not a regular file,
Tags tc3-2008
Attached Files

Relationships
related to 0001161 (Resolved): command -v must find something executable
related to 0001250 (Resolved): sh input file restrictions are too strict

Notes
(0004221)
eblake (manager)
2019-01-24 20:15
edited on: 2019-01-24 20:17

A shell may not be able to prove whether a file is a text file, but can easily filter out a number of non-text files with a short read() or two (any file that contains a NUL byte, any file that does not end in a newline, any file that does not contain a newline within LINE_MAX bytes) regardless of locale, as well as the converse (any file that starts with #! might be treated as a text file rather than a binary to pass to exec(), even though POSIX itself states that the use of #! is non-portable). You are also correct that a file can be a text file in one locale but a non-text file in another locale, based on whether byte sequences cause encoding errors in one locale but not the other.
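The quick, locale-independent filters described above can be sketched in C as follows. This is an illustration under stated assumptions, not code from any existing shell: the helper names are hypothetical, the whole file is assumed to be in memory (a real shell would read only a prefix plus the final byte), and LINE_MAX falls back to the POSIX minimum of 2048 if <limits.h> does not define it.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef LINE_MAX
#define LINE_MAX 2048            /* POSIX minimum, {_POSIX2_LINE_MAX} */
#endif

/* Hypothetical helper: true if buf[0..len) is certainly not a text file
 * by the locale-independent checks from the note. An empty file passes. */
bool clearly_not_text(const char *buf, size_t len)
{
    if (len == 0)
        return false;
    if (memchr(buf, '\0', len) != NULL)
        return true;             /* text files never contain NUL bytes */
    if (buf[len - 1] != '\n')
        return true;             /* last line is missing its <newline> */
    size_t head = len < LINE_MAX ? len : LINE_MAX;
    if (memchr(buf, '\n', head) == NULL)
        return true;             /* first line is longer than LINE_MAX */
    return false;
}

/* The converse hint: a #! prefix suggests the file is meant as text,
 * even though #! itself is non-portable. */
bool starts_with_shebang(const char *buf, size_t len)
{
    return len >= 2 && buf[0] == '#' && buf[1] == '!';
}
```

A false result from clearly_not_text() does not prove the file is text; byte sequences that are invalid in the current locale are not examined here.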

Existing shells also have various other heuristics (such as any iscntrl() hits in the first 512 bytes) for deciding whether a regular file is unlikely to be executable as a shell script, although such a hit does not necessarily make the file a non-text file. Given that, I'm not sure the desired action is sufficient to permit what existing shells do.
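A sketch of the iscntrl() style heuristic mentioned above, assuming a 512-byte prefix as in the note. Because every script contains <newline> characters, which iscntrl() also reports, this version ignores control characters for which isspace() is true; that exclusion is an assumption about how such heuristics are usually written, not something stated in the note. The function name is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

#define SNIFF_LEN 512            /* prefix size from the note (assumption) */

/* Hypothetical helper: true if the first SNIFF_LEN bytes contain a control
 * character that is not white space. NUL counts as such a character.
 * A hit suggests, but does not prove, that the file is not a script. */
bool has_suspicious_control(const char *buf, size_t len)
{
    size_t n = len < SNIFF_LEN ? len : SNIFF_LEN;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (iscntrl(c) && !isspace(c))
            return true;
    }
    return false;
}
```

As the note points out, such a hit is only a hint: a script that embeds an ESC byte in a quoted string, for example, is flagged even though it may be a perfectly good text file.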

(0004222)
shware_systems (reporter)
2019-01-24 21:39

Yes, but as things are written, that easy filtering is still the province of the invoked shell, which may then return 126 as the exit status. This also applies when the file exists but does not have the x bit set. Both cases cause exec() to fail, but the shell is still expected to try to process the file as a script. As far as the logical model goes, as I read it, whatever heuristics are implemented are an adjunct to, not a replacement for, the line-by-line evaluation of XCU 2.3 once the invoked shell finishes initializing.

What this change does is make regular files always included for consideration, while the "may bypass" wording allows other file types to be included too. While it would usually be silly for a platform to process the data representing a directory as a script, it shouldn't be impossible to attempt it if the internal format can also be considered text, similar to tar headers and the expectations for ar archives. If the underlying file system does say 'no, the inode fields are always binary encoded', big- or little-endian, then I'd expect the shell to bypass processing it, because that is defined as non-text for that file type.
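For context on the exec() failure path discussed in this note, the sketch below shows one way a shell might classify the errno from a failed execve(): XCU 2.9.1 has the shell run the file as a script only for an error equivalent to [ENOEXEC], and XCU 2.8.2 uses exit status 127 for a command that is not found and 126 for one that is found but cannot be executed. The enum and function names are hypothetical, and the mapping is a simplification; for example, ENOENT can also mean that a #! interpreter is missing rather than the file itself.

```c
#include <errno.h>

/* Hypothetical, simplified classification of a failed execve(). */
enum exec_outcome {
    RUN_AS_SCRIPT,   /* XCU 2.9.1: invoke sh with the file as operand */
    STATUS_126,      /* found but cannot be executed */
    STATUS_127       /* not found */
};

enum exec_outcome classify_exec_failure(int err)
{
    switch (err) {
    case ENOEXEC:
        return RUN_AS_SCRIPT;
    case ENOENT:
        return STATUS_127;   /* may also mean a missing #! interpreter */
    default:
        return STATUS_126;   /* e.g. EACCES: no execute permission */
    }
}
```

Any text-file or file-type heuristic debated in this issue would sit on the RUN_AS_SCRIPT branch, deciding whether to actually run the shell or to report an error instead.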
(0004227)
kre (reporter)
2019-01-25 01:35

I am not sure of the point of this discussion.

This relates to the proposed resolution of 1161 I believe?

In that, the shell is already required to find the file it
would exec (somehow) - whatever rules it follows in making
that decision are appropriate here too, and the description
of the -v option to the command command does not need to list
lots of special cases to make this be OK.

The problem with the resolution to 1161 is recorded in
note 4223 attached to that bug report.
(0004228)
kre (reporter)
2019-01-25 01:57

If this issue really is simply about what 2.9.1 (really
2.9.1.1.1.e.i.b) says, then I agree with Ed: nothing more
is needed than what is there. "not a text file" covers all
that is needed - it certainly includes devices, directories,
etc - and also allows the shell to avoid attempting to
interpret whatever files it considers to not possibly be
script files.

That's a good thing - we do not need to be specific about
just which files it should attempt to run, and which it
should not (especially these days as the universal use of
#! means that any script can make itself guaranteed
executable, and so the shell can be much more conservative
about what files it attempts to run as a script when the
exec fails.)
(0004384)
geoffclare (manager)
2019-05-02 14:41

A problem with the current wording that doesn't seem to have been mentioned yet is that it allows the shell not to execute the file as a shell script if it contains lines longer than LINE_MAX characters. However, the sh page says under INPUT FILES: "The input file shall be a text file, except that line lengths shall be unlimited."

Therefore as a minimum we should change "is not a text file" to "contains any NUL characters or any byte sequences that do not form valid characters", or maybe to "does not meet the requirements for a sh input file stated in the INPUT FILES section in [xref to sh]" in order to avoid duplication.
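The first criterion suggested above (no NUL characters and no byte sequences that fail to form valid characters, with no limit on line length) can be checked with the standard mbrtowc() function in the current LC_CTYPE locale. This is a minimal sketch with a hypothetical function name; it treats a multibyte sequence truncated at the end of the buffer as invalid, and its result depends on the locale, which is the locale sensitivity raised earlier in this issue.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: true if buf[0..len) has no NUL characters and every
 * byte sequence forms a valid character in the current LC_CTYPE locale.
 * Line length is deliberately not checked (sh input may have long lines). */
bool meets_sh_input_criterion(const char *buf, size_t len)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);       /* initial conversion state */

    size_t i = 0;
    while (i < len) {
        size_t n = mbrtowc(NULL, buf + i, len - i, &st);
        if (n == 0)
            return false;            /* a NUL character */
        if (n == (size_t)-1 || n == (size_t)-2)
            return false;            /* invalid or truncated sequence */
        i += n;
    }
    return true;
}
```

A program that never calls setlocale() runs in the POSIX locale, so the same file can pass in one process and fail in another that has selected, say, a UTF-8 locale.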
(0004393)
eblake (manager)
2019-05-13 16:00

There are "scripts" in the wild that are self-extracting: the first half of the file is usable in isolation as a text file and ends with 'exit' or similar to prevent the shell from even parsing the second half, while the second half contains a binary payload that is processed by the first half. Do we want the standard to permit such files as valid shell scripts, or are they non-portable because of the non-text nature of the second half of the file, even though the shell never reaches that part of the file to parse it?
(0004394)
nick (manager)
2019-05-16 16:40
edited on: 2019-05-16 16:41

Interpretation response
------------------------
The standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
-------------
The shell Input File definition states "The input file shall be a text file, except that line lengths shall be unlimited." This conflicts with the requirements stated here. Some current implementations do not make any check here, others use a simple heuristic to determine if the file may be a script.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------

At page 2368 line 75615, replace:
        
If the executable file is not a text file, the shell may bypass this command execution.

with
        
The shell may apply a heuristic check to determine if the file to be executed could be a script and may bypass this command execution if it determines that the file cannot be a script. In this case, it shall write an error message, and shall return an exit status of 126.
        Note: A common heuristic for rejecting files that cannot be a script is locating a NUL byte prior to a <newline> byte within a fixed-length prefix of the file. Since sh is required to accept input files with unlimited line lengths, the heuristic check cannot be based on line length.
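The heuristic named in the note above (a NUL byte before any <newline> within a fixed-length prefix) can be sketched as follows. The prefix length of 256 bytes is an arbitrary assumption, since the text does not fix one, and the function name is hypothetical. Because only the first line has to be free of NUL bytes, a self-extracting script whose text part comes first (as described in note 0004393) passes the check, while a typical ELF executable, which has NUL bytes in its header before any newline, is rejected.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PREFIX_LEN 256           /* hypothetical; the text leaves it open */

/* Hypothetical helper: true if the heuristic rejects the file, i.e. a NUL
 * byte appears within the first PREFIX_LEN bytes before any <newline>. */
bool cannot_be_script(const char *buf, size_t len)
{
    size_t n = len < PREFIX_LEN ? len : PREFIX_LEN;
    const char *nul = memchr(buf, '\0', n);
    if (nul == NULL)
        return false;            /* no NUL in the prefix */
    const char *nl = memchr(buf, '\n', (size_t)(nul - buf));
    return nl == NULL;           /* NUL seen before any newline */
}
```

A shell that rejects a file this way would, per the replacement text, write an error message and return exit status 126. Note that the check never looks at line length.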


(0004395)
eblake (manager)
2019-05-16 19:28

The issue of whether the shell should permit non-text files as input has been split into the separate issue 0001250.
(0004608)
agadmin (administrator)
2019-10-07 15:17

Interpretation proposed: 7 October 2019

Issue History
Date Modified Username Field Change
2019-01-24 19:00 shware_systems New Issue
2019-01-24 19:00 shware_systems Name => Mark Ziegast
2019-01-24 19:00 shware_systems Organization => SHware Systems Dev.
2019-01-24 19:00 shware_systems Section => XCU 2.9.1
2019-01-24 19:00 shware_systems Page Number => 2368
2019-01-24 19:00 shware_systems Line Number => 75592
2019-01-24 20:15 eblake Note Added: 0004221
2019-01-24 20:17 eblake Note Edited: 0004221
2019-01-24 21:39 shware_systems Note Added: 0004222
2019-01-25 01:35 kre Note Added: 0004227
2019-01-25 01:57 kre Note Added: 0004228
2019-01-25 02:46 eblake Relationship added related to 0001161
2019-05-02 14:41 geoffclare Note Added: 0004384
2019-05-13 16:00 eblake Note Added: 0004393
2019-05-16 16:40 nick Note Added: 0004394
2019-05-16 16:40 nick Note Edited: 0004394
2019-05-16 16:41 nick Note Edited: 0004394
2019-05-16 16:42 nick Interp Status => Pending
2019-05-16 16:42 nick Final Accepted Text => See Note: 0004394
2019-05-16 16:42 nick Status New => Interpretation Required
2019-05-16 16:42 nick Resolution Open => Accepted As Marked
2019-05-16 16:42 nick Tag Attached: tc3-2008
2019-05-16 19:23 eblake Relationship added related to 0001250
2019-05-16 19:28 eblake Note Added: 0004395
2019-10-07 15:17 agadmin Interp Status Pending => Proposed
2019-10-07 15:17 agadmin Note Added: 0004608


Mantis 1.1.6
Copyright © 2000 - 2008 Mantis Group
Powered by Mantis Bugtracker