Austin Group Defect Tracker

Aardvark Mark IV


ID 0001100
Category [1003.1(2016/18)/Issue7+TC2] Shell and Utilities
Severity Editorial
Type Clarification Requested
Date Submitted 2016-10-27 12:40
Last Update 2018-05-17 16:03
Reporter Mark_Galeck
View Status public
Assigned To
Priority normal
Resolution Rejected
Status Closed
Name Mark Galeck
Organization
User Reference
Section 2.10 Shell Grammar
Page Number 2375-2381
Line Number 75873-76150
Interp Status ---
Final Accepted Text
Summary 0001100: Rewrite of Section 2.10 Shell Grammar, of the Shell Standard, to fix previous reports, fix new issues, and improve presentation.
Description I recently made several reports concerning Sections 2.10.1 and 2.10.2, and then I found at least one more problem of a similar kind. If I continue making incremental reports, then even if the changes are approved, they will result in an ever bigger mess.

Therefore I decided to cancel some previous reports, add new issues, and make one summary report: a comprehensive rewrite of the whole Shell Grammar section that fixes the issues I found and makes the presentation more straightforward and less convoluted.


Here is the list of all the specific bugs this report addresses, including some previous reports. I am not listing changes here that merely improve the presentation; to see all the changes, you should probably use some "diff" program.



1. Previous reports 1096, 1094, 1097, 1099, 1095, 1092 are included here and can be cancelled.

2. Previous reports 1098, 1093, 1091, 1088 can be cancelled. Let's say we classify them as bogus; those changes are not included here.


3. (new issue) In the current standard, cmd_word cannot be a reserved word. It is very convoluted, but if you carefully trace how the various rules apply to each other, you will find that cmd_name and cmd_word currently follow exactly the same semantics: neither allows reserved words. Only cmd_name should disallow reserved words.


4. (new issue)

In multiple places in the current standard, rule 1 applies to WORD, so reserved words are not allowed where all words should be allowed. Some of the reports above cover this. Additionally, we have:

WORD in the case_clause production: currently it cannot be a reserved word, but it should be allowed to be one.

Same for WORD in cmd_suffix production.

------------------------

This rewrite is intended only to include the changes mentioned above, and should otherwise be equivalent to the current standard.

I will be happy to answer any questions, provide clarifications, or fix any bugs you find.

I do not have the time to discuss the merits of the changes. The maintainer of this standard is free to reject any part or all of this report, or to continue rewriting my Section 2.10 in any way that suits them. I do not mind at all.

Yes, the text I provide for the new Section 2.10 is raw text only; it has no hyperlinks or font changes. Somebody else would have to add those.

Thank you!

Desired Action 2.10. Shell Grammar

The following grammar defines the Shell Command Language. This formal syntax shall take precedence over the preceding text syntax description.

The rules in Token Recognition delimit operator and word tokens.

In order to appear in the grammar as token identifiers, the tokens shall be classified according to the following rules, applied in the following order of precedence:


1. The token identifier for an operator occurs when the token is that operator.


2. The token is IO_NUMBER if the string consists solely of digits and the delimiter character is one of '<' or '>'.
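The following is a minimal Python sketch of rule 2. It assumes the tokenizer reports the character that ended the token. `is_io_number` is a hypothetical helper and is not part of the standard.

```python
def is_io_number(token: str, delimiter: str) -> bool:
    """Rule 2 sketch: IO_NUMBER if the token is all digits and the
    character that delimited it is '<' or '>'."""
    # str.isdigit() also accepts non-ASCII digits, so test against 0-9 only.
    all_digits = token != "" and all(c in "0123456789" for c in token)
    return all_digits and delimiter in ("<", ">")


print(is_io_number("2", ">"))   # "2>out": the 2 is IO_NUMBER
print(is_io_number("2", " "))   # "2 >out": the 2 is an ordinary word
print(is_io_number("x2", ">"))  # not all digits
```

In this sketch the delimiter check is what separates `2>out` from `2 >out`. In the second form the 2 is an ordinary argument word.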


3. This rule only applies in the function_body production; see below in the grammar.

While this production is being parsed, word expansion and assignment shall never occur, even when required by the rules below. Each token that might either be expanded or have assignment applied to it is instead WORD, consisting only of the characters of the token exactly as described in Token Recognition.
 

4. The token identifier for a reserved word occurs when the token is exactly that reserved word.

Note:
Because at this point <quotation-mark> characters are retained in the token, quoted strings cannot be recognized as reserved words. Also note that line joining is done before tokenization, as described in Escape Character (Backslash), so escaped <newline> characters are already removed at this point.
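Here is a sketch of rule 4 as an exact table lookup. The identifiers come from the %token declarations in the grammar below. The function name is hypothetical. Because the quote characters are still in the token, a quoted spelling simply fails to match.

```python
from typing import Optional

# Token identifiers taken from the %token declarations in the grammar.
RESERVED_WORDS = {
    "if": "If", "then": "Then", "else": "Else", "elif": "Elif", "fi": "Fi",
    "do": "Do", "done": "Done", "case": "Case", "esac": "Esac",
    "while": "While", "until": "Until", "for": "For", "in": "In",
    "{": "Lbrace", "}": "Rbrace", "!": "Bang",
}


def reserved_word_token(token: str) -> Optional[str]:
    """Rule 4 sketch: only an exact match yields a reserved-word identifier.

    Quote characters are still part of the token at this stage, so a
    quoted spelling such as '"if"' or "\\if" never matches.
    """
    return RESERVED_WORDS.get(token)


print(reserved_word_token("if"))      # If
print(reserved_word_token('"if"'))    # None: the quotes are still present
print(reserved_word_token("If"))      # None: matching is case-sensitive
```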


5. This rule only applies in the simple_command and cmd_prefix productions; see below in the grammar.

For this rule, an <equal-sign> character in a token is "important" if it is unquoted (as determined while applying rule 4 from Token Recognition), is not part of an embedded parameter expansion, command substitution, or arithmetic expansion construct (as determined while applying rule 5 from Token Recognition), and does not begin the token.

For the definition of a valid "name", see XBD Name.

5a.
If the token does not contain an important '=' and is not a reserved word, it is WORD.
If the token contains an important '=' and the characters preceding the first such '=' do not form a valid name, it is unspecified whether it is WORD.

5b.
If the token does not contain an important '=', it is WORD.
If the token contains an important '=' and the characters preceding the first such '=' do not form a valid name, it is unspecified whether it is WORD.

5c.
If the token contains an important '=' and the characters preceding the first such '=' form a valid name, it is ASSIGNMENT_WORD.
If they do not form a valid name, it is unspecified whether it is ASSIGNMENT_WORD.

Assignment to the name within ASSIGNMENT_WORD token shall occur as specified in Simple Commands.
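The sketch below shows one way to read rule 5. It is not a reference implementation, and it rests on several simplifications. The scanner finds the first "important" '=' but does not handle quotes nested inside $(...) bodies. The name check covers only the portable character set from XBD Name. The "unspecified" result string and both function names are hypothetical.

```python
import re
from typing import Optional

NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")  # XBD Name, portable charset
RESERVED = {"if", "then", "else", "elif", "fi", "do", "done", "case", "esac",
            "while", "until", "for", "in", "{", "}", "!"}


def first_important_equals(token: str) -> Optional[int]:
    """Index of the first "important" '=' in token, or None.

    Simplified scanner: skips backslash-escaped characters, quoted text,
    and the bodies of $(...), $((...)) and ${...}; index 0 never counts.
    """
    depth = 0      # nesting depth inside $( / ${ constructs
    quote = None   # "'", '"' or '`' while inside that quoting
    i = 0
    while i < len(token):
        c = token[i]
        if quote == "'":
            if c == "'":
                quote = None
        elif c == "\\":
            i += 1                      # the next character is escaped
        elif quote is not None:
            if c == quote:
                quote = None
        elif c in "'\"`":
            quote = c
        elif c == "$" and token[i + 1:i + 2] in ("(", "{"):
            depth += 1
            i += 1                      # consume the '(' or '{' as well
        elif depth and c in "({":
            depth += 1
        elif depth and c in ")}":
            depth -= 1
        elif c == "=" and depth == 0 and i > 0:
            return i
        i += 1
    return None


def classify_rule5(token: str, variant: str) -> Optional[str]:
    """Apply rule 5a, 5b or 5c; None means the rule yields no token type."""
    eq = first_important_equals(token)
    if eq is None:
        if variant == "5a":
            return None if token in RESERVED else "WORD"
        return "WORD" if variant == "5b" else None
    if NAME_RE.match(token[:eq]):
        return "ASSIGNMENT_WORD" if variant == "5c" else None
    return "unspecified"   # invalid name before '=': each variant leaves it open


print(classify_rule5("x=1", "5c"))       # ASSIGNMENT_WORD
print(classify_rule5('"x=1"', "5b"))     # WORD: the '=' is quoted
print(classify_rule5("${x=1}", "5a"))    # WORD: the '=' is inside ${...}
print(classify_rule5("1x=2", "5c"))      # unspecified
```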


6. This rule only applies in the function_definition production; see below in the grammar.

NAME is any word that is not a reserved word and is a valid name.


7. This rule only applies in the for_clause production; see below in the grammar.

NAME is any valid name.
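Rules 6 and 7 differ only in how they treat reserved words. The sketch below puts them side by side. It uses the same portable-character-set reading of XBD Name as above, and the function names are hypothetical.

```python
import re
from typing import Optional

NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")
RESERVED = {"if", "then", "else", "elif", "fi", "do", "done", "case", "esac",
            "while", "until", "for", "in", "{", "}", "!"}


def rule6_function_name(token: str) -> Optional[str]:
    """Rule 6 (function_definition): a valid name that is not reserved."""
    if NAME_RE.match(token) and token not in RESERVED:
        return "NAME"
    return None


def rule7_for_name(token: str) -> Optional[str]:
    """Rule 7 (for_clause): any valid name, reserved or not."""
    return "NAME" if NAME_RE.match(token) else None


print(rule6_function_name("while"))  # None: rejected as a function name
print(rule7_for_name("while"))       # NAME: allowed as a for loop variable
```

Under these two rules, `for while in a b; do ...; done` would be accepted, while `while() { :; }` would not be a function definition.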


8. This rule only applies in the pattern_not_esac production; see below in the grammar.

WORD is any word except 'esac'.
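Rule 8 matters only where a case item or the closing Esac may start. The sketch below, a hypothetical helper over pre-split tokens, shows the three-way decision in the proposed case_item and case_item_ns productions. A parenthesized pattern uses plain `pattern`, where rule 8 does not apply, so something like `case $x in (esac) echo match;; esac` would take the first branch.

```python
def next_case_part(tokens: list, i: int) -> str:
    """Classify what begins at tokens[i] where a case item or Esac may start.

    Hypothetical helper; tokens are already split, e.g. ["(", "esac", ")"].
    """
    if tokens[i] == "(":
        return "pattern"           # '(' pattern ')': rule 8 does not apply
    if tokens[i] == "esac":
        return "Esac"              # rule 8 excludes it, so it closes the case
    return "pattern_not_esac"      # rule 8: any other word starts a pattern


print(next_case_part(["(", "esac", ")"], 0))   # pattern
print(next_case_part(["esac"], 0))             # Esac
print(next_case_part(["done", ")"], 0))        # pattern_not_esac
```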


9. This rule only applies in the here_end production; see below in the grammar.

Quote removal shall be applied to the word to determine the delimiter that is used to find the end of the here-document that begins after the next <newline>.
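Here is a sketch of quote removal on a here_end word. It also reports whether any part of the word was quoted, because Here-Document uses that to decide whether the here-document lines are expanded. Backslash-newline is not handled, on the assumption that line joining has already happened (see the note under rule 4). The function name is hypothetical.

```python
from typing import Tuple


def here_doc_delimiter(word: str) -> Tuple[str, bool]:
    """Quote removal on a here_end word (rule 9 sketch).

    Returns the delimiter and whether any part of the word was quoted.
    """
    out = []
    quoted = False
    quote = None          # "'" or '"' while inside that quoting
    i = 0
    while i < len(word):
        c = word[i]
        if quote == "'":
            if c == "'":
                quote = None
            else:
                out.append(c)
        elif quote == '"':
            if c == '"':
                quote = None
            elif c == "\\" and word[i + 1:i + 2] in ("$", "`", '"', "\\"):
                i += 1
                out.append(word[i])   # backslash escapes only these in "..."
            else:
                out.append(c)
        elif c in "'\"":
            quote = c
            quoted = True
        elif c == "\\" and i + 1 < len(word):
            quoted = True
            i += 1
            out.append(word[i])
        else:
            out.append(c)
        i += 1
    return "".join(out), quoted


print(here_doc_delimiter("EOF"))      # ('EOF', False)
print(here_doc_delimiter("'EOF'"))    # ('EOF', True)
print(here_doc_delimiter('E"O"F'))    # ('EOF', True)
```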


10. This rule only applies in the filename production; see below in the grammar.

The expansions specified in Redirection shall occur. The token is WORD if, as specified there, exactly one field results (otherwise the result is unspecified); Redirection also places additional requirements on pathname expansion.
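Redirection adds one requirement on pathname expansion: a non-interactive shell does not perform it on the redirection word, and an interactive shell may perform it only if the result is a single word. The sketch below models that restriction. It assumes the other expansions and quote removal have already run, and that the word contains no quoted pattern characters. `names` is a hypothetical stand-in for a directory listing. `fnmatchcase` only approximates shell pattern matching, for example in its handling of leading dots and '/'.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Optional


def redirect_target(word: str, interactive: bool,
                    names: Iterable[str]) -> Optional[str]:
    """Pathname-expansion step for a redirection word (hypothetical helper).

    Returns the single resulting word, or None when expansion would yield
    more than one word and the sketch leaves the outcome to the caller.
    """
    if not interactive:
        return word                # non-interactive: no pathname expansion
    matches = [n for n in names if fnmatchcase(n, word)]
    if not matches:
        return word                # unmatched pattern stays as one word
    if len(matches) == 1:
        return matches[0]
    return None                    # expansion would produce several words


print(redirect_target("*.log", False, ["a.log", "b.log"]))  # *.log
print(redirect_target("*.log", True, ["a.log"]))            # a.log
```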


11. WORD is any word.
------------------------------

The WORD tokens shall have the word expansion rules applied to them immediately before the associated command is executed, not at the time the command is parsed.
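A small sketch of what deferring expansion means: the parse result keeps each WORD unexpanded, and expansion runs against whatever the variables hold when the command executes. The shell analogue is `x=1; f() { echo "$x"; }; x=2; f`, which prints 2. The `ParsedCommand` class is hypothetical. Python's `string.Template` handles only `$name` and `${name}`, so it stands in for just a small part of word expansion.

```python
from string import Template


class ParsedCommand:
    """Hypothetical parse result that stores WORD tokens unexpanded."""

    def __init__(self, words):
        self.words = list(words)          # exactly as the parser saw them

    def expand(self, env):
        # Expansion runs at execution time against the current variables.
        # string.Template handles only $name / ${name}, a small stand-in
        # for the shell's word expansions.
        return [Template(w).safe_substitute(env) for w in self.words]


env = {"x": "1"}
cmd = ParsedCommand(["echo", "$x"])   # parsed while x is 1
env["x"] = "2"                         # assignment runs before execution
print(cmd.expand(env))                 # ['echo', '2']
```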


/* -------------------------------------------------------
   The grammar symbols
   ------------------------------------------------------- */
%token WORD
%token ASSIGNMENT_WORD
%token NAME
%token NEWLINE
%token IO_NUMBER


/* The following are the operators (see XBD Operator)
   containing more than one character. */



%token AND_IF OR_IF DSEMI
/* '&&' '||' ';;' */


%token DLESS DGREAT LESSAND GREATAND LESSGREAT DLESSDASH
/* '<<' '>>' '<&' '>&' '<>' '<<-' */


%token CLOBBER
/* '>|' */


/* The following are the reserved words. */


%token If Then Else Elif Fi Do Done
/* 'if' 'then' 'else' 'elif' 'fi' 'do' 'done' */


%token Case Esac While Until For
/* 'case' 'esac' 'while' 'until' 'for' */


/* These are reserved words, not operator tokens, and are
   recognized when reserved words are recognized. */


%token Lbrace Rbrace Bang
/* '{' '}' '!' */


%token In
/* 'in' */


/* -------------------------------------------------------
   The Grammar
   ------------------------------------------------------- */
%start program
%%
program : linebreak complete_commands linebreak
                 | linebreak
                 ;
complete_commands: complete_commands newline_list complete_command
                 | complete_command
                 ;
complete_command : list separator_op
                 | list
                 ;
list : list separator_op and_or
                 | and_or
                 ;
and_or : pipeline
                 | and_or AND_IF linebreak pipeline
                 | and_or OR_IF linebreak pipeline
                 ;
pipeline : pipe_sequence
                 | Bang pipe_sequence
                 ;
pipe_sequence : command
                 | pipe_sequence '|' linebreak command
                 ;
command : simple_command
                 | compound_command
                 | compound_command redirect_list
                 | function_definition
                 ;
compound_command : brace_group
                 | subshell
                 | for_clause
                 | case_clause
                 | if_clause
                 | while_clause
                 | until_clause
                 ;
subshell : '(' compound_list ')'
                 ;
compound_list : linebreak term
                 | linebreak term separator
                 ;
term : term separator and_or
                 | and_or
                 ;
/* Apply rule 7:*/
for_clause : For NAME do_group
                 | For NAME sequential_sep do_group
                 | For NAME linebreak In sequential_sep do_group
                 | For NAME linebreak In wordlist sequential_sep do_group
                 ;
wordlist : wordlist WORD
                 | WORD
                 ;
case_clause : Case WORD linebreak In linebreak case_list Esac
                 | Case WORD linebreak In linebreak case_list_ns Esac
                 | Case WORD linebreak In linebreak Esac
                 ;
case_list_ns : case_list case_item_ns
                 | case_item_ns
                 ;
case_list : case_list case_item
                 | case_item
                 ;
case_item_ns : pattern_not_esac ')' linebreak
                 | pattern_not_esac ')' compound_list
                 | '(' pattern ')' linebreak
                 | '(' pattern ')' compound_list
                 ;
case_item : pattern_not_esac ')' linebreak DSEMI linebreak
                 | pattern_not_esac ')' compound_list DSEMI linebreak
                 | '(' pattern ')' linebreak DSEMI linebreak
                 | '(' pattern ')' compound_list DSEMI linebreak
                 ;
/* Apply rule 8:*/
pattern_not_esac : WORD
                 | WORD '|' pattern
                 ;
pattern : WORD
                 | pattern '|' WORD
                 ;
if_clause : If compound_list Then compound_list else_part Fi
                 | If compound_list Then compound_list Fi
                 ;
else_part : Elif compound_list Then compound_list
                 | Elif compound_list Then compound_list else_part
                 | Else compound_list
                 ;
while_clause : While compound_list do_group
                 ;
until_clause : Until compound_list do_group
                 ;
/* Apply rule 6:*/
function_definition : NAME '(' ')' linebreak function_body
                 ;
/* Apply rule 3:*/
function_body : compound_command
                 | compound_command redirect_list
                 ;
brace_group : Lbrace compound_list Rbrace
                 ;
do_group : Do compound_list Done
                 ;
simple_command : cmd_prefix WORD cmd_suffix /* Apply rule 5b */
                 | cmd_prefix WORD /* Apply rule 5b */
                 | cmd_prefix
                 | WORD cmd_suffix /* Apply rule 5a */
                 | WORD /* Apply rule 5a */
                 ;
/* Apply rule 5c:*/
cmd_prefix : io_redirect
                 | cmd_prefix io_redirect
                 | ASSIGNMENT_WORD
                 | cmd_prefix ASSIGNMENT_WORD
                 ;
cmd_suffix : io_redirect
                 | cmd_suffix io_redirect
                 | WORD
                 | cmd_suffix WORD
                 ;
redirect_list : io_redirect
                 | redirect_list io_redirect
                 ;
io_redirect : io_file
                 | IO_NUMBER io_file
                 | io_here
                 | IO_NUMBER io_here
                 ;
io_file : '<' filename
                 | LESSAND filename
                 | '>' filename
                 | GREATAND filename
                 | DGREAT filename
                 | LESSGREAT filename
                 | CLOBBER filename
                 ;
filename : WORD /* Apply rule 10*/
                 ;
io_here : DLESS here_end
                 | DLESSDASH here_end
                 ;
here_end : WORD /* Apply rule 9 */
                 ;
newline_list : NEWLINE
                 | newline_list NEWLINE
                 ;
linebreak : newline_list
                 | /* empty */
                 ;
separator_op : '&'
                 | ';'
                 ;
separator : separator_op linebreak
                 | newline_list
                 ;
sequential_sep : ';' linebreak
                 | newline_list
                 ;
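The for_clause productions can be walked by hand to see how the words after `for` are classified. The sketch below tags the tokens up to and including Do. It applies rule 7 to the loop variable and treats every word in the wordlist as WORD, so `for x in do ; do` yields In, WORD, ';', Do. For brevity it assumes ';' is the only separator (no newlines) and that tokens are already split. The function name is hypothetical.

```python
import re

NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def classify_for_header(tokens):
    """Tag the tokens of a for_clause up to and including Do."""
    if len(tokens) < 3 or tokens[0] != "for" or not NAME_RE.match(tokens[1]):
        raise SyntaxError("expected: for NAME ...")
    out = [(tokens[0], "For"), (tokens[1], "NAME")]   # rule 7 for the name
    i = 2
    if tokens[i] == "in":
        out.append((tokens[i], "In"))
        i += 1
        while i < len(tokens) and tokens[i] != ";":
            out.append((tokens[i], "WORD"))   # wordlist: even 'do' is WORD
            i += 1
        if i == len(tokens):
            raise SyntaxError("expected ';' after wordlist")
    if tokens[i] == ";":
        out.append((tokens[i], "';'"))        # sequential_sep
        i += 1
    if i == len(tokens) or tokens[i] != "do":
        raise SyntaxError("expected: do")
    out.append((tokens[i], "Do"))
    return out


print([t for _, t in classify_for_header("for x in do ; do".split())])
print([t for _, t in classify_for_header("for x do echo done ; done".split())])
```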
Tags No tags attached.
Attached Files

Relationships
has duplicate 0001098 (Closed, 1003.1(2016/18)/Issue7+TC2): do_group symbol cannot be accepted as written, because rule 6 cannot yield Done token
has duplicate 0001088 (Closed, 1003.1(2016/18)/Issue7+TC2): "When more than one rule applies, the highest numbered rule shall apply" is pointless
has duplicate 0001091 (Closed, 1003.1(2016/18)/Issue7+TC2): Some "WORD tokens" do not have "the associated command"
has duplicate 0001093 (Closed, 1003.1(2016/18)/Issue7+TC2): "or applies globally" is pointless
related to 0001082 (Closed, 1003.1(2016/18)/Issue7+TC2): "delimited" is incorrect
related to 0001083 (Applied, 1003.1(2016/18)/Issue7+TC2): "next" character is misleading
related to 0001084 (Resolved, 1003.1(2016/18)/Issue7+TC2): rule 3, 4, 5 do not say that a token is started, if needed
related to 0001085 (Applied, 1003.1(2016/18)/Issue7+TC2): "token shall be from the current position in the input" is incorrect
related to 0001086 (Closed, 1003.1(2016/18)/Issue7+TC2): Token "Recognition" is misleading and the usage of "word" in that section should be clarified.
related to 0001276 (Applied, 1003.1(2013)/Issue7+TC1): incorrect resolution in 0000839

Notes
(0003470)
Mark_Galeck (reporter)
2016-10-27 12:57

In report 1098, shware_systems wrote note 3457, and I don't understand parts of that note. I asked several questions, but they have not responded yet.

Report 1098 is now intended to be included here.

Once shware_systems responds, I will see if there is anything I need to fix, and then I will fix those things here.
(0003944)
kre (reporter)
2018-03-28 03:59

I have not yet (after all this time) been able to find the time to see
if the reworded section is correct or not.... (which also means I have
not discovered any cases where it is incorrect.)

But I do have 2 comments - both of which relate to other issues I
believe.

First, in the proposed rule 9, "next <newline>" is not nearly specific (or
correct) enough to be useful. See issue 1043 (still unresolved...)

Second, in proposed rule 6, and the grammar production for function_definition
there is absolutely no reason for the function name to be a NAME rather than
a word; in fact it should not be. Aside from (perhaps) disallowing '/' in
function names (as such a function can never be executed because of the
command search and execution rules) anything that can be a filesystem command
name should be able to be a function name (including characters that need to be
quoted to be entered without meaning something different, like white space and
the operator and quoting characters). Most shells implement this already.
(I kind of remember a bug report for this, but cannot find it now.)
(0004030)
shware_systems (reporter)
2018-05-11 20:10

Re: Note 3457/3470
>This isn't obvious if one is thinking the grammar matches yacc or another standard's production style.
As Geoff pointed out, the Introduction to Shell & Utilities says

"The grammar is based on the syntax used by the yacc utility."

QUESTION. Are you contradicting that? Please explain.

No, it is not a contradiction, as "based on" is not "matches" or "equivalent to", as is explained after that quote in XCU.

My examples follow from the paragraph at XCU 2.9.1, Line 75534: that there is a defined behavior when no command name is present... The grammar or note 7 may not reflect this properly, but I don't see that any shell should be reporting it as a syntax error; since no I/O redirects are specified that might affect the sub-shell being set up, it's simply a no-op that uses up some time, as far as I can see.
(0004031)
kre (reporter)
2018-05-11 21:39

Mark (Galeck) - forget note 3457 (issue 1098) - it is 90% gibberish, and
has essentially no relation to anything.

The point about the notes that you and Geoff were disagreeing about is
due to a misunderstanding about how they work - which should be clarified.

When a rule says "apply rule N" it is not intended to mean that rule N
applies here, and nothing else does, what it means is to look and see
if rule N can apply here, and if it can, apply it. Otherwise rule 1 applies.

Rule 6 applies only when a for or case statement is being parsed (which must
be a for statement for the do_group rule - case does not use do_group - rule 6
gets used there via the production for "in") and it only applies to the third
word of that statement. Its purpose is so that

for word *in*
for word *do*

can recognise in or do as a reserved word (In or Do) and not just a word
(the asterisks are just for emphasis here).

That (and the "case word in" via the in: In production) are the only
times rule 6 ever applies, it is never used anywhere else, that's what
the:

     6. [Third word of for and case]

is all about. And then either 6a or 6b applies depending upon whether
the first word was "case" or "for" resp.

Done is recognised by rule 1, only when it appears in a command name
position (certainly not anywhere in a do_group) so, for example in

    for x in a b c; do echo done; done
similarly
    for x in do ; do echo $x; done
the first "do" is WORD, as it is not the 3rd word, so rule 6 does not
apply to it (and it is not the command word, so rule 1 does not apply either).
but
    for x do echo done; done
the "do" is "Do" as that is the 3rd word of a for, so rule 6 does apply
(and once it does, rule 1 is irrelevant to that token.)

the first "done" parses as WORD as it is not in the command name position
("echo" is there) but after the ';' we have a new command name next, rule
1 applies, and "done" produces Done

The grammar you are suggesting seems to do things a different way, with
a rule applying for a whole production. I suspect a change like that is
more than we would want to make - better to just clarify better exactly
what "apply rule N" means, and the conditions upon that.
(0004032)
shware_systems (reporter)
2018-05-12 06:59

I'd forget Note 4031: he considers anything he isn't smart enough to understand gibberish, apparently, and likes being rude in the process.
(0004037)
eblake (manager)
2018-05-17 15:33

Here is a diff between the original formal grammar and the proposed new one:

--- /tmp/grammar.1 2018-05-10 09:11:35.894306140 -0700
+++ /tmp/grammar.2 2018-05-10 09:12:23.347012514 -0700
@@ -96,21 +96,18 @@
 term : term separator and_or
                  | and_or
                  ;
-for_clause : For name do_group
- | For name sequential_sep do_group
- | For name linebreak in sequential_sep do_group
- | For name linebreak in wordlist sequential_sep do_group
- ;
-name : NAME /* Apply rule 5 */
- ;
-in : In /* Apply rule 6 */
+/* Apply rule 7:*/
+for_clause : For NAME do_group
+ | For NAME sequential_sep do_group
+ | For NAME linebreak In sequential_sep do_group
+ | For NAME linebreak In wordlist sequential_sep do_group
                  ;
 wordlist : wordlist WORD
                  | WORD
                  ;
-case_clause : Case WORD linebreak in linebreak case_list Esac
- | Case WORD linebreak in linebreak case_list_ns Esac
- | Case WORD linebreak in linebreak Esac
+case_clause : Case WORD linebreak In linebreak case_list Esac
+ | Case WORD linebreak In linebreak case_list_ns Esac
+ | Case WORD linebreak In linebreak Esac
                  ;
 case_list_ns : case_list case_item_ns
                  | case_item_ns
@@ -118,18 +115,22 @@
 case_list : case_list case_item
                  | case_item
                  ;
-case_item_ns : pattern ')' linebreak
- | pattern ')' compound_list
+case_item_ns : pattern_not_esac ')' linebreak
+ | pattern_not_esac ')' compound_list
                  | '(' pattern ')' linebreak
                  | '(' pattern ')' compound_list
                  ;
-case_item : pattern ')' linebreak DSEMI linebreak
- | pattern ')' compound_list DSEMI linebreak
+case_item : pattern_not_esac ')' linebreak DSEMI linebreak
+ | pattern_not_esac ')' compound_list DSEMI linebreak
                  | '(' pattern ')' linebreak DSEMI linebreak
                  | '(' pattern ')' compound_list DSEMI linebreak
                  ;
-pattern : WORD /* Apply rule 4 */
- | pattern '|' WORD /* Do not apply rule 4 */
+/* Apply rule 8:*/
+pattern_not_esac: WORD
+ | WORD '|' pattern
+ ;
+pattern : WORD
+ | pattern '|' WORD
                  ;
 if_clause : If compound_list Then compound_list else_part Fi
                  | If compound_list Then compound_list Fi
@@ -142,27 +143,24 @@
                  ;
 until_clause : Until compound_list do_group
                  ;
-function_definition : fname '(' ')' linebreak function_body
- ;
-function_body : compound_command /* Apply rule 9 */
- | compound_command redirect_list /* Apply rule 9 */
+/* Apply rule 6:*/
+function_definition : NAME '(' ')' linebreak function_body
                  ;
-fname : NAME /* Apply rule 8 */
+/* Apply rule 3:*/
+function_body : compound_command
+ | compound_command redirect_list
                  ;
 brace_group : Lbrace compound_list Rbrace
                  ;
-do_group : Do compound_list Done /* Apply rule 6 */
+do_group : Do compound_list Done
                  ;
-simple_command : cmd_prefix cmd_word cmd_suffix
- | cmd_prefix cmd_word
+simple_command : cmd_prefix WORD cmd_suffix /* Apply rule 5b */
+ | cmd_prefix WORD /* Apply rule 5b */
                  | cmd_prefix
- | cmd_name cmd_suffix
- | cmd_name
- ;
-cmd_name : WORD /* Apply rule 7a */
- ;
-cmd_word : WORD /* Apply rule 7b */
+ | WORD cmd_suffix /* Apply rule 5a */
+ | WORD /* Apply rule 5a */
                  ;
+/* Apply rule 5c:*/
 cmd_prefix : io_redirect
                  | cmd_prefix io_redirect
                  | ASSIGNMENT_WORD
@@ -189,12 +187,12 @@
                  | LESSGREAT filename
                  | CLOBBER filename
                  ;
-filename : WORD /* Apply rule 2 */
+filename : WORD /* Apply rule 10*/
                  ;
 io_here : DLESS here_end
                  | DLESSDASH here_end
                  ;
-here_end : WORD /* Apply rule 3 */
+here_end : WORD /* Apply rule 9 */
                  ;
 newline_list : NEWLINE
                  | newline_list NEWLINE
@@ -211,4 +209,3 @@
 sequential_sep : ';' linebreak
                  | newline_list
                  ;
-
(0004038)
Don Cragun (manager)
2018-05-17 15:58

We believe that some of the changes suggested in this bug report reflect a misunderstanding of the grammar as it is presented in the standard rather than problems in the grammar itself. With no rationale for the changes that are being made, no indication of what is intended to be fixed by the changes that have been made, and no definitions for new terms that have been added to the grammar and the description of the grammar, we are unable to determine which, if any, of the suggested changes should be made.

We believe that there may be discrepancies between the grammar as it currently appears in the standard and the shell language described by the standard, but are unable to determine which, if any, of the changes suggested in this bug report address those problems. We are going to reject this bug report, but would be happy to have the submitter provide another bug report with a list of defects that need to be addressed and a set of changes to meet those defects (with each change identifying the defect it addresses). We would also like to see additions to the definitions section for newly defined terms (e.g., "important" <equal-sign> characters) and changes to the rationale in XRAT C.2.10 explaining how the grammar is being changed to reflect differences between what the standard has intended to require and what the grammar currently does require.

When describing problems in the grammar, giving an example of a shell construct that is not accepted by the grammar when it should be or that is accepted by the grammar when it should not be would be a big help in understanding the issues that are being addressed by proposed changes.

Note that existing shells are allowed to support extensions to constructs required by the POSIX shell grammar. Therefore, there is no requirement that all existing shell constructs need to be recognized by the grammar.

Issue History
Date Modified Username Field Change
2016-10-27 12:40 Mark_Galeck New Issue
2016-10-27 12:40 Mark_Galeck Name => Mark Galeck
2016-10-27 12:40 Mark_Galeck Section => 2.10 Shell Grammar
2016-10-27 12:40 Mark_Galeck Page Number => 2375-2381
2016-10-27 12:40 Mark_Galeck Line Number => 75873-76150
2016-10-27 12:57 Mark_Galeck Note Added: 0003470
2016-10-28 08:19 geoffclare Relationship added related to 0001082
2016-10-28 08:20 geoffclare Relationship added related to 0001083
2016-10-28 08:20 geoffclare Relationship added related to 0001084
2016-10-28 08:21 geoffclare Relationship added related to 0001085
2016-10-28 08:21 geoffclare Relationship added related to 0001086
2016-10-28 08:22 geoffclare Relationship added related to 0001098
2018-03-28 03:59 kre Note Added: 0003944
2018-04-12 15:38 eblake Relationship added has duplicate 0001088
2018-04-12 15:39 eblake Relationship added has duplicate 0001091
2018-04-12 15:39 eblake Relationship added has duplicate 0001093
2018-04-12 15:40 eblake Relationship replaced has duplicate 0001098
2018-05-11 20:10 shware_systems Note Added: 0004030
2018-05-11 21:39 kre Note Added: 0004031
2018-05-12 06:59 shware_systems Note Added: 0004032
2018-05-17 15:33 eblake Note Added: 0004037
2018-05-17 15:58 Don Cragun Note Added: 0004038
2018-05-17 16:03 Don Cragun Interp Status => ---
2018-05-17 16:03 Don Cragun Status New => Closed
2018-05-17 16:03 Don Cragun Resolution Open => Rejected
2019-07-30 14:27 eblake Relationship added related to 0001276

