Austin Group Defect Tracker

Aardvark Mark IV


ID 0001055
Category [1003.1(2013)/Issue7+TC1] Shell and Utilities
Severity Objection
Type Omission
Date Submitted 2016-06-02 16:49
Last Update 2024-06-11 08:56
Reporter rhansen
View Status public
Assigned To
Priority normal
Resolution Accepted As Marked
Status Closed
Name Richard Hansen
Organization
User Reference
Section 2.3 Token Recognition
Page Number 2321-2322
Line Number 73636-73689
Interp Status ---
Final Accepted Text Note: 0004247
Summary 0001055: unspecified how much is parsed before execution begins
Description POSIX does not say how much of the input (or eval text, command substitution body, dot script, or ENV script) is parsed before execution begins. This matters because
  • it affects how much code is executed before a syntax error causes the shell to exit (note that it is common to create a self-extracting archive by prepending some shell code to a tarball; the expectation is that the shell executes the extraction code before it would try and inevitably fail to parse the tarball), and
  • there is an intimate relationship between parsing and alias substitution.
Desired Action Specific wording to be provided later, but a summary of the desired changes (assuming implementations behave this way):
  • Input and ENV scripts shall be parsed using program as the start symbol.
  • eval bodies, command substitution bodies, and dot scripts shall be parsed using compound_list as the start symbol.
  • When code is parsed as a program symbol: Once a complete_command has been parsed, the shell shall execute the complete_command before it starts parsing the next complete_command.
  • When code is parsed as a compound_list symbol: The compound_list shall be fully parsed before any execution of that compound_list begins.

Tags tc3-2008
Attached Files

- Relationships
related to 0000953 Closed ajosey Alias expansion is under-specified
related to 0001048 Closed deprecate alias and unalias

-  Notes
(0003270)
rhansen (manager)
2016-06-23 16:26
edited on: 2016-06-23 16:44

On page 2322 after line 73689 (just before XCU 2.3.1 Alias Substitution), insert a new paragraph:
Once a complete_command symbol has been recognized by the grammar (see [xref to 2.10 Shell Grammar]), the complete_command shall be subjected to alias substitution (see [xref to 2.3.1 Alias Substitution]) then executed before the next complete_command is tokenized and parsed.
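
The interaction between this rule and aliases can be sketched as follows. This is a minimal illustration, not part of the proposed wording. It assumes that sh on PATH is a POSIX-style shell that expands aliases in scripts and that mktemp is available; the alias name greet_1055 is hypothetical. An alias defined in one complete_command is expected to be visible to the next one, but not to a later command inside the same complete_command, because that whole line has already been parsed by the time alias runs.

```shell
# Sketch: an alias definition affects later complete_commands, not the current one.
# Assumptions: `sh` is a POSIX-style shell, `mktemp` exists, greet_1055 is a made-up name.
script=$(mktemp)

# Case 1: the alias is defined on one line and used on the next line.
printf '%s\n' "alias greet_1055='echo hello'" 'greet_1055' > "$script"
separate=$(sh "$script" 2>/dev/null)

# Case 2: definition and use share one line, so they form one complete_command.
printf '%s\n' "alias greet_1055='echo hello'; greet_1055" > "$script"
same_line=$(sh "$script" 2>/dev/null)
same_status=$?

rm -f "$script"
printf 'separate lines: [%s]\n' "$separate"
printf 'same line:      [%s] (exit status %s)\n' "$same_line" "$same_status"
```

In shells such as dash and bash (invoked as sh), the first case typically prints hello and the second fails with a "not found" error.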

On page 2322 lines 73691-73693 (XCU 2.3.1 Alias Substitution), change:
After a token has been delimited, but before applying the grammatical rules in Section 2.10, a resulting word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name.
to:
After a sequence of tokens has been parsed and recognized as a command or compound list by the grammar (see [xref to 2.10 Shell Grammar]), but before the command or compound list is executed, each word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name.

On page 2331 lines 74074-74076 (XCU 2.6.3 Command Substitution), change:
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command. Any valid shell script can be used for command, except a script consisting solely of redirections which produces unspecified results.
to:
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command.

With both the backquoted and $(command) forms, command shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed. Any valid compound_list can be used for command, except a compound_list consisting solely of redirections which produces unspecified results.

On page 2325 lines 73782-73785 (XCU 2.5.3 Shell Variables, ENV) change:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file containing shell commands to execute in the current environment.
to:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file. Before any interactive commands are read, the contents of the file shall be tokenized, parsed, subjected to alias expansion, and executed as described in [xref to 2.3 Token Recognition]. The contents shall be executed in the current environment.

On page 2364 line 75304 (XCU 2.14 dot DESCRIPTION), change:
The shell shall execute commands from the file in the current environment.
to:
The contents of file shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed in the current environment.

On page 2366 line 75371 (XCU 2.14 eval), change:
The constructed command shall be read and executed by the shell.
to:
The constructed command shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed in the current environment.

On page 2350 after line 74800 (XCU 2.10.2 Shell Grammar Rules), insert the following comment above %start:
/* The start symbol is compound_list when parsing dot scripts, command
   substitution bodies, and the arguments passed to eval */

On page 3678 after line 125707 (just before XRAT C.2.3.1), insert a new paragraph:
Because a complete_command is executed before the next complete_command is tokenized and parsed, syntax errors are not discovered by the shell until just before the code would be executed. While in some cases it might be desirable to detect and react to syntax errors before anything is executed, deferring the discovery of syntax errors has several benefits:
  • It makes it possible for script authors to test for the availability of a nonstandard extension and react appropriately before the use of the extension would trigger a syntax error.
  • It makes it possible to create self-extracting tarballs (a shell script concatenated with a payload archive that extracts the archive when executed).
  • The shell does not have to read and parse the complete script before execution, which reduces memory usage when executing extremely long scripts.
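
The first benefit in the list above can be sketched as follows. This is an illustration under stated assumptions, not text from the proposal: array assignment syntax stands in for "a nonstandard extension", sh on PATH is a POSIX-style shell, mktemp is available, and the variable name probe is hypothetical. The probe runs in a subshell via eval so that a syntax error there cannot abort the script, and the lines that use the extension are only parsed if the script gets that far.

```shell
# Sketch: probe for a nonstandard extension before any line that uses it is parsed.
# Assumptions: array syntax is the example extension; `sh` and `mktemp` are available.
script=$(mktemp)
cat > "$script" <<'EOF'
# Probe in a subshell so that a syntax error there cannot abort this script.
if ! (eval 'probe=(a b)') 2>/dev/null; then
    echo "no array syntax; using fallback"
    exit 0
fi
# Only reached (and only parsed) when the probe succeeded.
probe=(a b)
echo "array syntax available: ${probe[0]}"
EOF
result=$(sh "$script" 2>/dev/null)
probe_status=$?
rm -f "$script"
printf '%s (exit status %s)\n' "$result" "$probe_status"
```

Which of the two messages appears depends on the shell installed as sh; in either case the script is expected to exit with status 0 rather than fail on a syntax error.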

On page 3681 lines 125831-125832 (XRAT C.2.5.3 ENV) change:
However, unlike dot scripts, no PATH searching is performed. This is used as a guard against Trojan Horse security breaches.
to:
However, unlike dot scripts, ENV scripts are parsed as a program, not a compound_list. This distinction matters because it influences when aliases take effect and whether syntax errors in the script are discovered before any part of the script is executed.

For security reasons, PATH is not searched when locating the ENV script.


(0003276)
geoffclare (manager)
2016-06-29 16:06

I'm not too keen on the proposed change to 2.6.3 Command Substitution.

The existing text saying "any valid shell script" highlights the main difference between $(...) and `...`. That difference is lexical in nature; i.e. you can copy and paste a shell script into $(...) and it works. You can't do that with `...` because of quoting. By changing to "any valid compound_list" and treating $(...) and `...` the same, this lexical difference is ignored.
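
A small sketch of the lexical difference described above (the variable names are illustrative only): the same text is placed inside both forms, but the backquoted form processes the backslash pair before the command is parsed.

```shell
# The same text, '\\' in single quotes, is placed in each form of command substitution.
dollar=$(printf '%s' '\\')
backquote=`printf '%s' '\\'`

# Inside $(...), the two backslashes reach printf unchanged.
printf '$(...) yields %s character(s): %s\n' "${#dollar}" "$dollar"
# Inside backquotes, \\ is reduced to \ before the command is parsed.
printf 'backquotes yield %s character(s): %s\n' "${#backquote}" "$backquote"
```
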
(0003291)
rhansen (manager)
2016-07-07 15:50

Good catch, Geoff! I'll post a new revision.
(0003292)
rhansen (manager)
2016-07-07 15:52
edited on: 2016-07-07 15:53

On page 2322 after line 73689 (just before XCU 2.3.1 Alias Substitution), insert a new paragraph:
Once a complete_command symbol has been recognized by the grammar (see [xref to 2.10 Shell Grammar]), the complete_command shall be subjected to alias substitution (see [xref to 2.3.1 Alias Substitution]) then executed before the next complete_command is tokenized and parsed.

On page 2322 lines 73691-73693 (XCU 2.3.1 Alias Substitution), change:
After a token has been delimited, but before applying the grammatical rules in Section 2.10, a resulting word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name.
to:
After a sequence of tokens has been parsed and recognized as a command or compound list by the grammar (see [xref to 2.10 Shell Grammar]), but before the command or compound list is executed, each word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name.

On page 2331 line 74073 (XCU 2.6.3 Command Substitution), add a new sentence at the end of the paragraph at lines 74067-74073:
After backslashes have been processed, the characters in command shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed.

On page 2331 line 74076 (XCU 2.6.3 Command Substitution), add a new sentence at the end of the paragraph at lines 74074-74076:
The characters in command shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed.

On page 2325 lines 73782-73785 (XCU 2.5.3 Shell Variables, ENV) change:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file containing shell commands to execute in the current environment.
to:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file. Before any interactive commands are read, the contents of the file shall be tokenized, parsed, subjected to alias expansion, and executed as described in [xref to 2.3 Token Recognition]. The contents shall be executed in the current environment.

On page 2364 line 75304 (XCU 2.14 dot DESCRIPTION), change:
The shell shall execute commands from the file in the current environment.
to:
The contents of file shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed in the current environment.

On page 2366 line 75371 (XCU 2.14 eval), change:
The constructed command shall be read and executed by the shell.
to:
The constructed command shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed in the current environment.

On page 2350 after line 74800 (XCU 2.10.2 Shell Grammar Rules), insert the following comment above %start:
/* The start symbol is compound_list when parsing dot scripts, command
   substitution bodies, and the arguments passed to eval */

On page 3678 after line 125707 (just before XRAT C.2.3.1), insert a new paragraph:
Because a complete_command is executed before the next complete_command is tokenized and parsed, syntax errors are not discovered by the shell until just before the code would be executed. While in some cases it might be desirable to detect and react to syntax errors before anything is executed, deferring the discovery of syntax errors has several benefits:
  • It makes it possible for script authors to test for the availability of a nonstandard extension and react appropriately before the use of the extension would trigger a syntax error.
  • It makes it possible to create self-extracting tarballs (a shell script concatenated with a payload archive that extracts the archive when executed).
  • The shell does not have to read and parse the complete script before execution, which reduces memory usage when executing extremely long scripts.

On page 3681 lines 125831-125832 (XRAT C.2.5.3 ENV) change:
However, unlike dot scripts, no PATH searching is performed. This is used as a guard against Trojan Horse security breaches.
to:
However, unlike dot scripts, ENV scripts are parsed as a program, not a compound_list. This distinction matters because it influences when aliases take effect and whether syntax errors in the script are discovered before any part of the script is executed.

For security reasons, PATH is not searched when locating the ENV script.


(0004194)
geoffclare (manager)
2019-01-02 15:30

Reopening because the resolution includes a change to XCU 2.3.1 that overlaps with the one proposed in Note: 0003113 of bug 0000953 (which was reopened after that proposal).
(0004247)
geoffclare (manager)
2019-02-11 16:58

On (2016 edition) page 2348 after line 74792 (just before XCU 2.3.1 Alias Substitution), insert a new paragraph:
In situations where the shell parses its input as a program, once a complete_command has been recognized by the grammar (see [xref to 2.10 Shell Grammar]), the complete_command shall be executed before the next complete_command is tokenized and parsed.
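
The observable effect of this paragraph can be sketched with a two-line script. This is a minimal illustration, assuming sh on PATH is a POSIX-style shell and mktemp is available; the file contents are made up for the example. The first line is a valid complete_command and the second is a syntax error.

```shell
# Sketch: a command that precedes a syntax error still runs.
# Assumptions: `sh` and `mktemp` are available.
script=$(mktemp)

# Line 1 is a valid complete_command; line 2 is a syntax error (a stray "fi").
printf '%s\n' 'echo "first command ran"' 'fi' > "$script"

output=$(sh "$script" 2>/dev/null)
exit_status=$?
rm -f "$script"

printf 'output: %s\nexit status: %s\n' "$output" "$exit_status"
```

A shell that executes each complete_command before tokenizing the next one is expected to print the message from line 1 and then exit with a nonzero status when it reaches line 2.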

After (2016 edition) page 2412 line 77241 (set Application Usage), add a new paragraph:
Use of <tt>set -n</tt> causes the shell to parse the rest of the script without executing any commands, meaning that <tt>set +n</tt> cannot be used to undo the effect. Syntax checking is more commonly done via <tt>sh -n script_name</tt>.
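
A minimal sketch of why set +n cannot undo set -n, assuming sh on PATH is a POSIX-style shell and mktemp is available (the script contents are illustrative): once -n is in effect, the set +n line is itself only parsed, never executed.

```shell
# Sketch: after "set -n", later commands (including "set +n") are parsed but not run.
# Assumptions: `sh` and `mktemp` are available.
script=$(mktemp)
cat > "$script" <<'EOF'
echo before
set -n
echo skipped
set +n
echo "still skipped"
EOF
noexec_output=$(sh "$script" 2>/dev/null)
rm -f "$script"
printf 'output: %s\n' "$noexec_output"
```

Only the first echo is expected to produce output.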

After (2016 edition) page 3239 line 108855 (sh utility Application Usage), add a new paragraph:
<tt>sh -n</tt> can be used to check for many syntax errors without waiting for complete_commands to be executed, but may be fooled into declaring false positives or missing actual errors that would occur when the shell actually evaluates eval commands present in the script, or if there are alias (or unalias) commands in the script that would alter the syntax of commands that use the affected aliases.
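
One of the limitations described above can be sketched as follows, assuming sh on PATH is a POSIX-style shell and mktemp is available (the script contents are illustrative). The same syntax error is checked twice: once written directly in the script, and once hidden inside the argument to eval, where sh -n sees only a quoted string.

```shell
# Sketch: "sh -n" reports a plain syntax error but misses one hidden in an eval argument.
# Assumptions: `sh` and `mktemp` are available.
script=$(mktemp)

# Control: a plain syntax error, which -n is expected to report.
printf '%s\n' 'if then' > "$script"
sh -n "$script" 2>/dev/null
plain_check=$?

# The same error inside an eval argument: to -n this is just a quoted word.
printf '%s\n' "eval 'if then'" > "$script"
sh -n "$script" 2>/dev/null
eval_check=$?

# Actually running the script makes eval parse the string and hit the error.
sh "$script" >/dev/null 2>&1
eval_run=$?

rm -f "$script"
printf 'plain error, sh -n: %s\n' "$plain_check"
printf 'eval error,  sh -n: %s\n' "$eval_check"
printf 'eval error,  run:   %s\n' "$eval_run"
```
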

On (2016 edition) page 3720 after line 127520 (just before XRAT C.2.3.1), insert a new paragraph:
Because a complete_command encountered during a program is executed before the next complete_command is tokenized and parsed, syntax errors are not discovered by the shell until just before the code would be executed. While in some cases it might be desirable to detect and react to syntax errors before anything is executed (possible with <tt>sh -n</tt>), deferring the discovery of syntax errors has several benefits:
  • It makes it possible for script authors to test for the availability of a nonstandard extension and react appropriately before the use of the extension would trigger a syntax error.

  • It makes it possible to create self-extracting tarballs (a shell script concatenated with a payload archive that extracts the archive when executed).

  • The shell does not have to read and parse the complete script before execution, which reduces memory usage when executing extremely long scripts.
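
The self-extracting case in the list above can be sketched in miniature. This is an illustration under stated assumptions, not a real archive format: the payload is a single line of text rather than a tarball, the marker __PAYLOAD__ is hypothetical, and sh, sed and mktemp are assumed to be available. A real self-extracting archive would typically pipe the payload to an extraction tool instead of printing it.

```shell
# Sketch: a stub script followed by a payload that is not valid shell syntax.
# Assumptions: `sh`, `sed` and `mktemp` are available; __PAYLOAD__ is a made-up marker.
archive=$(mktemp)
cat > "$archive" <<'EOF'
# Extraction stub: print everything after the marker line, then stop.
sed '1,/^__PAYLOAD__$/d' "$0"
exit 0
__PAYLOAD__
)) this line is payload, not shell syntax ((
EOF
extracted=$(sh "$archive" 2>/dev/null)
extract_status=$?
rm -f "$archive"
printf 'extracted: %s\nexit status: %s\n' "$extracted" "$extract_status"
```

Because exit runs before the shell tokenizes anything past it, the payload is never parsed as shell code.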

- Issue History
Date Modified Username Field Change
2016-06-02 16:49 rhansen New Issue
2016-06-02 16:49 rhansen Name => Richard Hansen
2016-06-02 16:49 rhansen Section => 2.3 Token Recognition
2016-06-02 16:49 rhansen Page Number => 2321-2322
2016-06-02 16:49 rhansen Line Number => 73636-73689
2016-06-02 16:49 rhansen Interp Status => ---
2016-06-02 16:53 rhansen Relationship added related to 0000953
2016-06-23 16:26 rhansen Note Added: 0003270
2016-06-23 16:28 rhansen Note Edited: 0003270
2016-06-23 16:32 rhansen Note Edited: 0003270
2016-06-23 16:44 rhansen Note Edited: 0003270
2016-06-29 16:06 geoffclare Note Added: 0003276
2016-07-07 15:50 rhansen Note Added: 0003291
2016-07-07 15:52 rhansen Note Added: 0003292
2016-07-07 15:53 rhansen Note Edited: 0003292
2017-07-06 16:00 Don Cragun Tag Attached: issue8
2017-07-06 16:00 geoffclare Final Accepted Text => Note: 0003292
2017-07-06 16:00 geoffclare Status New => Resolved
2017-07-06 16:00 geoffclare Resolution Open => Accepted As Marked
2019-01-02 15:30 geoffclare Note Added: 0004194
2019-01-02 15:30 geoffclare Status Resolved => Under Review
2019-01-02 15:30 geoffclare Resolution Accepted As Marked => Reopened
2019-02-11 16:58 geoffclare Note Added: 0004247
2019-02-11 16:59 geoffclare Final Accepted Text Note: 0003292 => Note: 0004247
2019-02-11 16:59 geoffclare Status Under Review => Resolved
2019-02-11 16:59 geoffclare Resolution Reopened => Accepted As Marked
2019-02-11 17:00 geoffclare Tag Detached: issue8
2019-02-11 17:00 geoffclare Tag Attached: tc3-2008
2019-02-14 16:21 eblake Relationship added related to 0001048
2019-10-23 14:36 geoffclare Status Resolved => Applied
2024-06-11 08:56 agadmin Status Applied => Closed

