0001454: Conflict between "case" description and grammar

(0005241)
kre (reporter)
2021-02-16 20:44

I agree that shells (and my testing also says all but ksh93) work this
way, and have always done so, and consequently as the grammar appears to
allow, should continue working this way .. but I suspect the ksh93 variation
might just be deliberate, rather than simply a bug, in that they might be
attempting to follow the POSIX standard's rules (words), rather than the
actual standard (what everyone (else) implements).

The issue is that I see nothing in the standard currently which allows that
"esac" to be parsed as the Esac (reserved word) token, or not when it is
not following a \n or ; (etc), or a pattern. That is, reserved words
(with the specific exceptions called out by references to the
XCU 2.10.2 rules in the grammar) are generally recognised only when in
the command word position (if the thing parsed were to be a simple command).

In "case foo in x) stuff ..." "stuff" is in the command word position, so
a reserved word lookup happens.

On the other hand, in "case foo in stuff..." "stuff" is not in the command
word position (it will be taken as the pattern, and should be followed by
')' or '|') and so is not subject to reserved word lookup. This is what
makes "case word in for) command;; esac" legal, "for" there is not the
reserved word, just a string. All shells accept that (ksh93 included).

But if "for" there is not a reserved word, how could it be if spelled "esac"
instead? Shells do reject "case word in esac) command;; esac", that is,
except for ksh93, which permits it.
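
As a rough illustration of the two cases above (a sketch, not part of the original note): the first part runs in the current shell, while the "esac)" form is handed to a child sh, since a syntax error would otherwise abort the script. The variable names result and esac_pattern are made up for the example, and which branch the second check takes depends on which shell sh happens to be.

```shell
# "for" in pattern position is only a string, so this parses and matches.
word=for
case $word in
    for) result="matched for" ;;
    *)   result="no match" ;;
esac
echo "$result"

# An unparenthesised "esac" as the first pattern: probe a child shell,
# because a parse error here would abort the current script.
if sh -c 'case word in esac) echo hit ;; esac' >/dev/null 2>&1; then
    esac_pattern=accepted
else
    esac_pattern=rejected
fi
echo "esac) as first pattern: $esac_pattern"
```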

So, while I believe that we should ask ksh93 to follow the real standard,
rather than what is currently in the POSIX text, I also believe that we need
to add some more magic to the grammar, and tokeniser rules, in order to
make this all legitimate.

It isn't as simple as just updating the description in 2.9.4.3.

(0005242)
kre (reporter)
2021-02-16 20:55
edited on: 2021-02-16 21:02

Ignore most of that (Note: 0005241) - I missed rule 4.

However, rule 4 only applies when looking at a pattern. That is
in a case_list (or case_list_ns) as a case_item (or case_item_ns).

The grammar rule in question:
Case WORD linebreak in linebreak Esac
contains no case_list, hence no patterns, hence rule 4 would seem
not to apply.

The other rules always require a case_list[_ns] which always requires
at least one case_item[_ns] and those things always require a ')'
(every single possibility).

So, I still believe that the grammar needs work, even if slightly different
work than I expected in Note: 0005241 . It may be as simple as adding
/* Apply rule 4 */ to the grammar rule line quoted earlier in this note,
but I haven't considered all the ramifications of that yet.
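
The production quoted in this note corresponds to a case command with no patterns at all. A minimal sketch of probing that form, assuming a POSIX-like sh is on the PATH (the variable name empty_case is hypothetical, and the result depends on the shell being tested):

```shell
# Probe a child shell so that a parse error, if any, does not abort this script.
if sh -c 'case foo in esac' >/dev/null 2>&1; then
    empty_case=accepted
else
    empty_case=rejected
fi
echo "case with no patterns: $empty_case"
```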

(0005243)
geoffclare (manager)
2021-02-17 10:07

I agree that we should add /* Apply rule 4 */ to that line in the grammar.
Or something technically equivalent - there would be an editorial problem with simply adding it (and removing redundant spaces) because:

Case WORD linebreak in linebreak Esac /* Apply rule 4 */

is two characters too long to fit on the line.

Perhaps we could change all occurrences of "Apply rule X" to just "Rule X"?
The comments only need to make clear which rule it is that applies; it is the wording in 2.10.1 about those comments that specifies how the indicated rules are applied. (The comment that says "Do not apply rule 4" would stay as-is.)

(0005244)
kre (reporter)
2021-02-17 13:18

Unfortunately, upon reflection, it is not quite that simple (ignoring
temporarily the editorial issue, for which just using "Rule N" instead
of "Apply..." would be fine) as rule 4 says that if the word is "esac"
then the Esac token is returned. If that applies to the grammar production
in question, then
case esac in esac
isn't going to parse correctly, as the first "esac" which should be WORD
would instead become Esac and so not match.

Perhaps instead, change all occurrences of "Esac" (in all the productions)
to "case_end", and add a new production:

case_end: Esac ; /* Apply Rule 4 */

(formatted however is appropriate) which also conveniently side-steps the
editorial issue.

But please consider this carefully, it is a spur of the moment suggestion,
I'm not sure if it might cause other issues.

(0005245)
geoffclare (manager)
2021-02-17 14:26

Re Note: 0005244 I wondered about "case esac in ..." as well when I was writing my previous note, but decided it's not a problem because of this text in 2.10.1:

Some of the productions in the grammar below are annotated with a rule number from the following list. When a TOKEN is seen where one of those annotated productions could be used to reduce the symbol, the applicable rule shall be applied to convert the token identifier type of the TOKEN to a token identifier acceptable at that point in the grammar.

So rule 4 only causes "esac" to be recognised as the token Esac when it appears in the position of an Esac in the grammar, and it therefore doesn't apply to "case esac in ..." because the "esac" there is in the position of WORD in the grammar, not Esac.

(0005246)
kre (reporter)
2021-02-18 10:32

Re Note: 0005245

That raises an additional problem, as (not concerning the case productions
here) that isn't the way things actually work.

   "convert the token identifier type of the TOKEN to a token identifier
    acceptable at that point in the grammar."

isn't what (normally) happens. If it were the command

    do my work

would run the "do" command, as "do" (the reserved word) is not acceptable
at that point in the grammar, so the TOKEN "do" should have been (according
to that text) turned into a WORD rather than the keyword "do".

There isn't a shell around that behaves like that, it is in the command line
position, "do" matches the spelling of the reserved word, Rule 1 applies, the
reserved word is generated, despite not being "acceptable at that point in
the grammar".

Once we have an explanation for why this analysis isn't right, or a fix for
that text in 2.10.1 we can revisit how to handle the recognition of "esac"
in that peculiar case statement with no patterns.
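
The "do my work" example can be probed in a child shell. This is only a sketch, assuming a POSIX-like sh on the PATH; do_status is a name made up here. An exit status of 127 would conventionally mean the shell searched for a command named "do" and did not find it, while any other non-zero status suggests the line was rejected earlier, as a syntax error.

```shell
# Run the line in a child shell and record its exit status.
do_status=0
sh -c 'do my work' >/dev/null 2>&1 || do_status=$?

if [ "$do_status" -eq 127 ]; then
    echo '"do" was looked up as a command name'
elif [ "$do_status" -ne 0 ]; then
    echo "\"do\" was rejected with status $do_status, consistent with a reserved word"
else
    echo '"do" ran as a command'
fi
```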

(0005252)
geoffclare (manager)
2021-02-25 09:41

New proposed changes based on comments made here and on the mailing list ...

On page 2372 line 74408 section 2.9.4.3, change:

the first one of several patterns

to:

the first one of zero or more patterns

On page 2372 line 75780 section 2.9.4.3, delete the line:

[(] pattern1 ) compound-list ;;

Note to the editor: in Issue 8 draft 1.1 the line to delete has "terminator" instead of ";;" because of the change to allow ";&".

On page 2375 line 75888 section 2.10.1, change:

... convert the token identifier type of the TOKEN to a token identifier acceptable at that point in the grammar.

to:

... convert the token identifier type of the TOKEN to:
The token identifier of the recognized reserved word, for rule 1.

A token identifier acceptable at that point in the grammar, for all other rules.

On page 2379,2380 line 76041,76043,76084-76134 section 2.10.2, change:

/* Apply rule ...

to:

/* Rule ...

On page 2379 line 76058 section 2.10.2, change:

case_item_ns      :     pattern ')' linebreak
                  |     pattern ')' compound_list
                  | '(' pattern ')' linebreak
                  | '(' pattern ')' compound_list
                  ;
case_item         :     pattern ')' linebreak     DSEMI linebreak
                  |     pattern ')' compound_list DSEMI linebreak
                  | '(' pattern ')' linebreak     DSEMI linebreak
                  | '(' pattern ')' compound_list DSEMI linebreak
                  ;
pattern           :             WORD      /* Apply rule 4 */
                  | pattern '|' WORD      /* Do not apply rule 4 */

to:

case_item_ns      : pattern_list ')' linebreak
                  | pattern_list ')' compound_list
                  ;
case_item         : pattern_list ')' linebreak     DSEMI linebreak
                  | pattern_list ')' compound_list DSEMI linebreak
                  ;
pattern_list      :                  WORD /* Rule 4 */
                  |              '(' WORD /* Do not apply rule 4 */
                  | pattern_list '|' WORD /* Do not apply rule 4 */
                  ;

(0005523)
geoffclare (manager)
2021-11-11 16:47

On page 2372 line 75769 section 2.9.4.3, change:

the first one of several patterns (see Section 2.13) that is matched by the string resulting

to:

the first pattern (see Section 2.13), if any are present, that is matched by the string resulting

On page 2372 line 75780 section 2.9.4.3, delete the line:

[(] pattern1 ) compound-list ;;

Note to the editor: in Issue 8 draft 1.1 the line to delete has "terminator" instead of ";;" because of the change to allow ";&".

On page 2375 line 75888 section 2.10.1, change:

... convert the token identifier type of the TOKEN to a token identifier acceptable at that point in the grammar.

to:

... convert the token identifier type of the TOKEN to:
The token identifier of the recognized reserved word, for rule 1.

A token identifier acceptable at that point in the grammar, for all other rules.

On page 2379 line 76058 section 2.10.2, change:

case_item_ns      :     pattern ')' linebreak
                  |     pattern ')' compound_list
                  | '(' pattern ')' linebreak
                  | '(' pattern ')' compound_list
                  ;
case_item         :     pattern ')' linebreak     DSEMI linebreak
                  |     pattern ')' compound_list DSEMI linebreak
                  | '(' pattern ')' linebreak     DSEMI linebreak
                  | '(' pattern ')' compound_list DSEMI linebreak
                  ;
pattern           :             WORD      /* Apply rule 4 */
                  | pattern '|' WORD      /* Do not apply rule 4 */

to:

case_item_ns      : pattern_list ')' linebreak
                  | pattern_list ')' compound_list
                  ;
case_item         : pattern_list ')' linebreak     DSEMI linebreak
                  | pattern_list ')' compound_list DSEMI linebreak
                  ;
pattern_list      :                  WORD /* Apply rule 4 */
                  |              '(' WORD /* Do not apply rule 4 */
                  | pattern_list '|' WORD /* Do not apply rule 4 */
                  ;
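
Under the replacement grammar above, only an unparenthesised first WORD is subject to rule 4, so a pattern spelled "esac" would have to be written after '(' or '|'. A sketch of checking this against whatever sh is installed (the variable names out and paren_esac are hypothetical, and the outcome depends on the shell):

```shell
# "(esac)" puts the word after '(', where rule 4 is not applied.
# Capture the child's output; treat any failure as "no output".
out=$(sh -c 'case esac in (esac) echo hit ;; esac' 2>/dev/null) || out=

if [ "$out" = hit ]; then
    paren_esac=matched
else
    paren_esac="not matched"
fi
echo "(esac) as a pattern: $paren_esac"
```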

Issue History
Date Modified	Username	Field	Change
2021-02-16 16:29	geoffclare	New Issue
2021-02-16 16:29	geoffclare	Name	=> Geoff Clare
2021-02-16 16:29	geoffclare	Organization	=> The Open Group
2021-02-16 16:29	geoffclare	Section	=> 2.9.4.3
2021-02-16 16:29	geoffclare	Page Number	=> 2372
2021-02-16 16:29	geoffclare	Line Number	=> 75780
2021-02-16 16:29	geoffclare	Interp Status	=> ---
2021-02-16 20:44	kre	Note Added: 0005241
2021-02-16 20:55	kre	Note Added: 0005242
2021-02-16 21:02	kre	Note Edited: 0005242
2021-02-17 10:07	geoffclare	Note Added: 0005243
2021-02-17 13:18	kre	Note Added: 0005244
2021-02-17 14:26	geoffclare	Note Added: 0005245
2021-02-18 10:32	kre	Note Added: 0005246
2021-02-25 09:41	geoffclare	Note Added: 0005252
2021-11-11 16:47	geoffclare	Note Added: 0005523
2021-11-11 16:47	Don Cragun	Final Accepted Text	=> See Note: 0005523.
2021-11-11 16:47	Don Cragun	Status	New => Resolved
2021-11-11 16:47	Don Cragun	Resolution	Open => Accepted As Marked
2021-11-11 16:50	Don Cragun	Tag Attached: issue8
2021-11-11 16:50	Don Cragun	Tag Attached: tc3-2008
2021-11-11 16:50	Don Cragun	Tag Detached: tc3-2008
2021-11-11 16:51	Don Cragun	Tag Attached: tc3-2008
2021-11-11 16:51	Don Cragun	Tag Detached: issue8
2021-11-26 15:10	geoffclare	Status	Resolved => Applied
