0001100: Rewrite of Section 2.10 Shell Grammar, of the Shell Standard, to fix previous reports, fix new issues, and improve presentation.

(0003470)
Mark_Galeck (reporter)
2016-10-27 12:57

In report 1098, shware_systems wrote note 3457, and I don't understand parts of that note. I asked several questions, but they have not responded yet.

Report 1098 is now intended to be included here.

Once shware_systems responds, I will see if there is anything I need to fix, and then I will fix those things here.

(0003944)
kre (reporter)
2018-03-28 03:59

I have not yet (after all this time) been able to find the time to see
if the reworded section is correct or not.... (which also means I have
not discovered any cases where it is incorrect.)

But I do have 2 comments - both of which relate to other issues I
believe.

First the proposed rule 9, "next <newline>" is not nearly specific (or
correct) enough to be useful. See issue 1043 (still unresolved...)

Second, in proposed rule 6, and the grammar production for function_definition,
there is absolutely no reason for the function name to be a NAME rather than
a word - in fact it should not be. Aside from (perhaps) disallowing '/' in
function names (as such a function can never be executed because of the
command search and execution rules) anything that can be a filesystem command
name should be able to be a function name (including characters that need to be
quoted to be entered without meaning something different, like white space and
the operator and quoting characters). Most shells implement this already.
(I kind of remember a bug report for this, but cannot find it now.)
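
Whether a given shell already accepts function names that are not a NAME can be probed with a small sketch like the one below. The helper accepts_function_name is hypothetical (it is not part of any shell or of the standard), and it only handles names that need no quoting, since the name is passed through eval unquoted. Only the first result is predictable; the others depend on the shell being tested.

```shell
# Report whether the running shell accepts "$1" as a function name.
# The definition is attempted in a subshell so that a syntax error
# cannot abort the calling script. (Hypothetical helper, for illustration.)
accepts_function_name() {
    if (eval "$1() { :; }") 2>/dev/null; then
        echo "accepted: $1"
    else
        echo "rejected: $1"
    fi
}

accepts_function_name plain_name    # a NAME, so it is accepted
accepts_function_name with-hyphen   # not a NAME; result depends on the shell
accepts_function_name a/b           # contains '/'; result depends on the shell
```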

(0004030)
shware_systems (reporter)
2018-05-11 20:10

Re: Note 3457/3470
>This isn't obvious if one is thinking the grammar matches yacc or another standard's production style.
>As Geoff pointed out, the Introduction to Shell & Utilities says
>
>"The grammar is based on the syntax used by the yacc utility."
>
>QUESTION. Are you contradicting that? Please explain.

No, it is not a contradiction, as "based on" is not "matches" or "equivalent to", as is explained after that quote in XCU.

My examples follow from the paragraph at XCU 2.9.1, Line 75534 - that there is a defined behavior when no command name is present... The grammar or note 7 may not reflect this properly, but I don't see that any shell should be reporting it as a syntax error. Since no I/O redirects are specified that might affect the sub-shell being set up, it's simply a no-op that uses up some time, as far as I can see.
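
For reference, the behavior that XCU 2.9.1 defines for a simple command with no command name can be observed directly. The sketch below only illustrates that paragraph; it is not a reconstruction of the examples in note 3457, and the function names are made up for this illustration.

```shell
# A simple command consisting only of an assignment has no command name.
# With no command substitution it completes with a zero exit status.
assignment_only_status() {
    x=1
    echo "$?"
}

# When such a command contains a command substitution, its exit status is
# that of the last command substitution performed (here "false", i.e. 1).
# The "||" keeps the function usable even if "set -e" is in effect.
cmdsub_only_status() {
    x=$(false) || { echo "$?"; return 0; }
    echo 0
}

assignment_only_status
cmdsub_only_status
```

The first call prints 0 and the second prints 1; neither is a syntax error.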

(0004031)
kre (reporter)
2018-05-11 21:39

Mark (Galeck) - forget note 3457 (issue 1098) - it is 90% gibberish, and
has essentially no relation to anything.

The point about the notes that you and Geoff were disagreeing about is
due to a misunderstanding about how they work - which should be clarified.

When a rule says "apply rule N" it is not intended to mean that rule N
applies here, and nothing else does; what it means is to look and see
if rule N can apply here, and if it can, apply it. Otherwise rule 1 applies.

Rule 6 applies only when a for or case statement is being parsed (which must
be a for statement for the do_group rule - case does not use do_group - rule 6
gets used there via the production for "in") and it only applies to the third
word of that statement. Its purpose is so that

for WORD *in*
for word *do*

can recognise in or do as a reserved word (In or Do) and not just a word
(the asterisks are just for emphasis here).

That (and the "case word in" via the in: In production) are the only
times rule 6 ever applies, it is never used anywhere else, that's what
the:

     6. [Third word of for and case]

is all about. And then either 6a or 6b applies depending upon whether
the first word was "case" or "for" resp.

Done is recognised by rule 1, only when it appears in a command name
position (certainly not anywhere in a do_group), so, for example, in

    for x in a b c; do echo done; done

the first "done" parses as WORD as it is not in the command name position
("echo" is there), but after the ';' we have a new command name next, rule
1 applies, and "done" produces Done.

Similarly, in

    for x in do ; do echo $x; done

the first "do" is WORD, as it is not the 3rd word, so rule 6 does not
apply to it (and it is not the command word, so rule 1 does not apply either).
But in

    for x do echo done; done

the "do" is "Do" as that is the 3rd word of a for, so rule 6 does apply
(and once it does, rule 1 is irrelevant to that token).

The grammar you are suggesting seems to do things a different way, with
a rule applying for a whole production. I suspect a change like that is
more than we would want to make - better to just clarify exactly
what "apply rule N" means, and the conditions upon that.
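
The for statements discussed in this note can be run as written. In the sketch below the wrapper function names are hypothetical and exist only so that each loop can be called separately; the comments restate the token classification described above.

```shell
# "in" is the third word, so rule 6 makes it the reserved word In.
# The "done" after echo is an argument (WORD); the "done" after ';' is in
# command name position, so rule 1 makes it Done.
loop_over_list() {
    for x in a b c; do echo done; done
}

# The "do" after "in" is the fourth word: a plain WORD in the word list.
loop_over_word_do() {
    for x in do ; do echo "$x"; done
}

# "do" is the third word, so rule 6 makes it Do. With no "in", the loop
# iterates over the positional parameters of the function.
loop_over_args() {
    for x do echo done; done
}

loop_over_list        # prints "done" three times
loop_over_word_do     # prints "do"
loop_over_args p q    # prints "done" twice
```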

(0004032)
shware_systems (reporter)
2018-05-12 06:59

I'd forget Note 4031: he considers anything he isn't smart enough to understand gibberish, apparently, and likes being rude in the process.

(0004037)
eblake (manager)
2018-05-17 15:33

Here is a diff between the original formal grammar and the proposed new one:

--- /tmp/grammar.1 2018-05-10 09:11:35.894306140 -0700
+++ /tmp/grammar.2 2018-05-10 09:12:23.347012514 -0700
@@ -96,21 +96,18 @@
term : term separator and_or
                  | and_or
                  ;
-for_clause : For name do_group
- | For name sequential_sep do_group
- | For name linebreak in sequential_sep do_group
- | For name linebreak in wordlist sequential_sep do_group
- ;
-name : NAME /* Apply rule 5 */
- ;
-in : In /* Apply rule 6 */
+/* Apply rule 7:*/
+for_clause : For NAME do_group
+ | For NAME sequential_sep do_group
+ | For NAME linebreak In sequential_sep do_group
+ | For NAME linebreak In wordlist sequential_sep do_group
                  ;
wordlist : wordlist WORD
                  | WORD
                  ;
-case_clause : Case WORD linebreak in linebreak case_list Esac
- | Case WORD linebreak in linebreak case_list_ns Esac
- | Case WORD linebreak in linebreak Esac
+case_clause : Case WORD linebreak In linebreak case_list Esac
+ | Case WORD linebreak In linebreak case_list_ns Esac
+ | Case WORD linebreak In linebreak Esac
                  ;
case_list_ns : case_list case_item_ns
                  | case_item_ns
@@ -118,18 +115,22 @@
case_list : case_list case_item
                  | case_item
                  ;
-case_item_ns : pattern ')' linebreak
- | pattern ')' compound_list
+case_item_ns : pattern_not_esac ')' linebreak
+ | pattern_not_esac ')' compound_list
                  | '(' pattern ')' linebreak
                  | '(' pattern ')' compound_list
                  ;
-case_item : pattern ')' linebreak DSEMI linebreak
- | pattern ')' compound_list DSEMI linebreak
+case_item : pattern_not_esac ')' linebreak DSEMI linebreak
+ | pattern_not_esac ')' compound_list DSEMI linebreak
                  | '(' pattern ')' linebreak DSEMI linebreak
                  | '(' pattern ')' compound_list DSEMI linebreak
                  ;
-pattern : WORD /* Apply rule 4 */
- | pattern '|' WORD /* Do not apply rule 4 */
+/* Apply rule 8:*/
+pattern_not_esac: WORD
+ | WORD '|' pattern
+ ;
+pattern : WORD
+ | pattern '|' WORD
                  ;
if_clause : If compound_list Then compound_list else_part Fi
                  | If compound_list Then compound_list Fi
@@ -142,27 +143,24 @@
                  ;
until_clause : Until compound_list do_group
                  ;
-function_definition : fname '(' ')' linebreak function_body
- ;
-function_body : compound_command /* Apply rule 9 */
- | compound_command redirect_list /* Apply rule 9 */
+/* Apply rule 6:*/
+function_definition : NAME '(' ')' linebreak function_body
                  ;
-fname : NAME /* Apply rule 8 */
+/* Apply rule 3:*/
+function_body : compound_command
+ | compound_command redirect_list
                  ;
brace_group : Lbrace compound_list Rbrace
                  ;
-do_group : Do compound_list Done /* Apply rule 6 */
+do_group : Do compound_list Done
                  ;
-simple_command : cmd_prefix cmd_word cmd_suffix
- | cmd_prefix cmd_word
+simple_command : cmd_prefix WORD cmd_suffix /* Apply rule 5b */
+ | cmd_prefix WORD /* Apply rule 5b */
                  | cmd_prefix
- | cmd_name cmd_suffix
- | cmd_name
- ;
-cmd_name : WORD /* Apply rule 7a */
- ;
-cmd_word : WORD /* Apply rule 7b */
+ | WORD cmd_suffix /* Apply rule 5a */
+ | WORD /* Apply rule 5a */
                  ;
+/* Apply rule 5c:*/
cmd_prefix : io_redirect
                  | cmd_prefix io_redirect
                  | ASSIGNMENT_WORD
@@ -189,12 +187,12 @@
                  | LESSGREAT filename
                  | CLOBBER filename
                  ;
-filename : WORD /* Apply rule 2 */
+filename : WORD /* Apply rule 10*/
                  ;
io_here : DLESS here_end
                  | DLESSDASH here_end
                  ;
-here_end : WORD /* Apply rule 3 */
+here_end : WORD /* Apply rule 9 */
                  ;
newline_list : NEWLINE
                  | newline_list NEWLINE
@@ -211,4 +209,3 @@
sequential_sep : ';' linebreak
                  | newline_list
                  ;
-

(0004038)
Don Cragun (manager)
2018-05-17 15:58

We believe that some of the changes suggested in this bug report reflect a misunderstanding of the grammar as it is presented in the standard rather than problems in the grammar itself. With no rationale for the changes that are being made, no indication of what is intended to be fixed by the changes that have been made, and no definitions for new terms that have been added to the grammar and the description of the grammar, we are unable to determine which, if any, of the suggested changes should be made.

We believe that there may be discrepancies between the grammar as it currently appears in the standard and the shell language described by the standard, but are unable to determine which, if any, of the changes suggested in this bug report address those problems. We are going to reject this bug report, but would be happy to have the submitter provide another bug report with a list of defects that need to be addressed and a set of changes to meet those defects (with each change identifying the defect it addresses). We would also like to see additions to the definitions section for newly defined terms (e.g., "important" <equal-sign> characters) and changes to the rationale in XRAT C.2.10 explaining how the grammar is being changed to reflect differences between what the standard has intended to require and what the grammar currently does require.

When describing problems in the grammar, giving an example of a shell construct that is not accepted by the grammar when it should be or that is accepted by the grammar when it should not be would be a big help in understanding the issues that are being addressed by proposed changes.

Note that existing shells are allowed to support extensions to constructs required by the POSIX shell grammar. Therefore, there is no requirement that all existing shell constructs need to be recognized by the grammar.

Issue History
Date Modified	Username	Field	Change
2016-10-27 12:40	Mark_Galeck	New Issue
2016-10-27 12:40	Mark_Galeck	Name	=> Mark Galeck
2016-10-27 12:40	Mark_Galeck	Section	=> 2.10 Shell Grammar
2016-10-27 12:40	Mark_Galeck	Page Number	=> 2375-2381
2016-10-27 12:40	Mark_Galeck	Line Number	=> 75873-76150
2016-10-27 12:57	Mark_Galeck	Note Added: 0003470
2016-10-28 08:19	geoffclare	Relationship added	related to 0001082
2016-10-28 08:20	geoffclare	Relationship added	related to 0001083
2016-10-28 08:20	geoffclare	Relationship added	related to 0001084
2016-10-28 08:21	geoffclare	Relationship added	related to 0001085
2016-10-28 08:21	geoffclare	Relationship added	related to 0001086
2016-10-28 08:22	geoffclare	Relationship added	related to 0001098
2018-03-28 03:59	kre	Note Added: 0003944
2018-04-12 15:38	eblake	Relationship added	has duplicate 0001088
2018-04-12 15:39	eblake	Relationship added	has duplicate 0001091
2018-04-12 15:39	eblake	Relationship added	has duplicate 0001093
2018-04-12 15:40	eblake	Relationship replaced	has duplicate 0001098
2018-05-11 20:10	shware_systems	Note Added: 0004030
2018-05-11 21:39	kre	Note Added: 0004031
2018-05-12 06:59	shware_systems	Note Added: 0004032
2018-05-17 15:33	eblake	Note Added: 0004037
2018-05-17 15:58	Don Cragun	Note Added: 0004038
2018-05-17 16:03	Don Cragun	Interp Status	=> ---
2018-05-17 16:03	Don Cragun	Status	New => Closed
2018-05-17 16:03	Don Cragun	Resolution	Open => Rejected
2019-07-30 14:27	eblake	Relationship added	related to 0001276

