Austin Group Defect Tracker

ID: 0001279
Category: [1003.1(2016/18)/Issue7+TC2] Shell and Utilities
Severity: Objection
Type: Error
Date Submitted: 2019-08-03 22:20
Last Update: 2022-08-05 09:24
Reporter: stephane
View Status: public
Priority: normal
Resolution: Accepted As Marked
Status: Applied
Name: Stephane Chazelas
Section: Shell grammar
Interp Status: ---
Final Accepted Text: Note: 0005881
Summary: 0001279: non-name=value should not be an ASSIGNMENT_WORD

Description:
The sh grammar in the spec tells us that

   var=value

is to be parsed as:

program
 -> complete_commands
 -> complete_command
 -> list
 -> and_or
 -> pipeline
 -> pipe_sequence
 -> command
 -> simple_command
 -> cmd_prefix as an ASSIGNMENT_WORD

(assuming rule 7a is applied; the annotation for it is missing
from the spec, as already noted in 0001094)

And for:

var+=value
stéphane=foo
var[1]=value
a[0].b[c=++e].f=g
"a=b"=c
$(echo x)=d

Either:

 ...
 -> simple_command
 -> cmd_prefix as an ASSIGNMENT_WORD

Or:

 ...
 -> simple_command
 -> cmd_name as a WORD


In other words, the grammar describes all of the examples above
as "simple commands" in the sh language, leaving no scope for
implementations to interpret them otherwise.

In all those cases, when the token is an ASSIGNMENT_WORD, rule 7b
in 2.10.2 defers to 2.9.1 for how the assignment is to be
performed based on that ASSIGNMENT_WORD.

Except that 2.9.1 doesn't really say that.

For a var=value ASSIGNMENT_WORD, nothing says that "var" is the
name of the variable to be assigned and "value" is the value to
assign to it. The only hint is the phrase "Assignment to the name
within a returned ASSIGNMENT_WORD token" in 2.10.2/7b. While that's
easy to guess for "var=value", it is less so for the other
examples above. If anything, 7b would suggest that in var+=value,
the "name" of the variable is "var+".

Those examples should make it obvious that, while some of them
are valid syntax in the bash/ksh93/zsh languages, they are not
valid in the sh language. The sh grammar should not identify them
as sh simple commands or assignments.

At best, things like var+=value or var[0]=value should be
*allowed* to be interpreted as a command whose name is the whole
word (e.g. the "var+=value" command, as many sh implementations
do), but not *required* to be, as some shells such as ksh, bash
and zsh interpret them as something else. And they certainly
*cannot* be interpreted as POSIX sh variable assignments, as
var+ and var[0] are not valid sh variable names.

Note: another bug report will follow to address
https://www.mail-archive.com/austin-group-l@opengroup.org/msg04563.html
(0001276 and this one are a preamble to that).
Desired Action:
First, apply the 0001094 resolution: append /* Apply rule 7a */
to the first occurrence of ASSIGNMENT_WORD in the cmd_prefix
production, and /* Apply rule 7b */ to the second one (7a would
also work, as there is no reserved word that can be mistaken for
an assignment).

In 7b, ASSIGNMENT_WORD should only be returned for var=anything
tokens (where "var", before quote removal and before expansion,
is a valid "name"). For other TOKENs that contain an unquoted
equals sign that is not part of an expansion, we should make sure
that no grammar production that references rule 7 can match, for
instance by saying that the TOKEN token (or perhaps a new token
called UNSPECIFIED, to make this clearer) shall be returned.

For instance, change 2.10.2/7 (here including a resolution of
0001276) to:


> 7. [Assignment preceding command name]
>
> a. [When the first word]
>
> If the TOKEN is exactly a reserved word, the token identifier for that reserved word shall result. Otherwise, 7b shall be applied.
>
> b. [Not the first word]
>
> If the TOKEN contains an unquoted (as determined while applying rule 4 from Token Recognition) <equals-sign> character that is not
> part of an embedded parameter expansion, command substitution, or arithmetic expansion construct (as determined while applying rule
> 5 from Token Recognition):
>
> • If the TOKEN begins with '=', then the token WORD shall be returned.
>
> • If all the characters in the TOKEN preceding the first such <equals-sign> form a valid name (see XBD Name), the token
> ASSIGNMENT_WORD shall be returned.
>
> • Otherwise, it is unspecified whether the WORD or UNSPECIFIED token is returned.
>
> Otherwise, the token WORD shall be returned.
>
> Assignment to the name within a returned ASSIGNMENT_WORD token shall occur as specified in Simple Commands.
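The proposed rule 7b can be read as a small token classifier. The following is a minimal Python sketch under stated assumptions, not any shell's actual tokenizer: the TOKEN is given as raw text as delimited by token recognition; the scan for an unquoted <equals-sign> handles only single quotes, double quotes, backslashes, backquotes and $(...), $((...)) and ${...} nesting (corner cases such as a ")" inside quotes within a command substitution are not handled); and `first_unquoted_equals`, `classify_rule_7b` and the "WORD or UNSPECIFIED" label are hypothetical names for the outcomes of the proposed bullets.

```python
import re

# XBD "Name": letters, digits and underscores from the portable character
# set, not starting with a digit. Explicit ASCII ranges, so "é" is rejected.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def first_unquoted_equals(token: str) -> int:
    """Index of the first '=' that is neither quoted nor inside $(...),
    $((...)), ${...} or `...`; -1 if there is none. Simplified scanner."""
    closers = []  # expected closing characters of currently open expansions
    in_single = in_double = in_backquote = False
    i = 0
    while i < len(token):
        c = token[i]
        if in_single:
            in_single = c != "'"  # only another ' ends single quoting
        elif c == "\\":
            i += 1  # the next character is escaped; skip it
        elif in_backquote:
            in_backquote = c != "`"
        elif c == "'" and not in_double:
            in_single = True
        elif c == '"':
            in_double = not in_double
        elif c == "`":
            in_backquote = True
        elif c == "$" and token[i + 1:i + 2] in ("(", "{"):
            closers.append(")" if token[i + 1] == "(" else "}")
            i += 1  # consume the opening bracket too
        elif closers and c == "(":
            closers.append(")")  # nested parenthesis, e.g. in $((...))
        elif closers and c == closers[-1]:
            closers.pop()
        elif c == "=" and not closers and not in_double:
            return i
        i += 1
    return -1


def classify_rule_7b(token: str) -> str:
    """Token identifier chosen by the proposed rule 7b (sketch)."""
    eq = first_unquoted_equals(token)
    if eq == -1 or eq == 0:
        # No unquoted '=' at all, or the TOKEN begins with '='.
        return "WORD"
    if NAME.fullmatch(token[:eq]):
        return "ASSIGNMENT_WORD"
    return "WORD or UNSPECIFIED"  # left to the implementation by the proposal


for tok in ["var=value", "var+=value", "stéphane=foo", "var[1]=value",
            "a[0].b[c=++e].f=g", '"a=b"=c', "$(echo x)=d", "=foo", "echo"]:
    print(f"{tok!r:22} {classify_rule_7b(tok)}")
```

Under these assumptions, var=value is the only example from the Description that yields ASSIGNMENT_WORD; the other six fall into the case the proposal leaves to the implementation, while =foo and echo yield WORD.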


And add a paragraph in 2.10.1 like:

- in the following section, some rules return an UNSPECIFIED
  token. That's a way to make it clear that the resulting token
  cannot possibly satisfy the grammar productions where the
  corresponding rule is referenced.


And then, in 2.9.1, now that an ASSIGNMENT_WORD can only be a
name=value, it's not as critical, but we may still want to
clarify that the part before the first = in the ASSIGNMENT_WORD
is the name of the variable and the part after that = is the value.
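The clarification suggested for 2.9.1 amounts to splitting the token at its first <equals-sign>. A minimal Python sketch, assuming the token has already been returned as ASSIGNMENT_WORD under the rule proposed above; `split_assignment_word` is a hypothetical helper. Since a valid name contains only letters, digits and underscores, no quoting or expansion can precede the first '=', so the first '=' in the raw token is the unquoted one. The value is returned as written, before the expansions and quote removal that 2.9.1 applies to it.

```python
import re

# XBD "Name" over the portable character set (ASCII letters, digits, '_').
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_assignment_word(token: str) -> tuple[str, str]:
    """Split a name=value ASSIGNMENT_WORD at its first '='.

    Raises ValueError if the characters before the first '=' do not form
    a valid name (such a token would not be an ASSIGNMENT_WORD here).
    """
    name, sep, value = token.partition("=")
    if not sep or not NAME.fullmatch(name):
        raise ValueError(f"not a name=value ASSIGNMENT_WORD: {token!r}")
    return name, value


print(split_assignment_word("var=value"))         # ('var', 'value')
print(split_assignment_word("a=b=c"))             # ('a', 'b=c')
print(split_assignment_word('x="$(echo =)"'))     # ('x', '"$(echo =)"')
```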
Tags: tc3-2008

Notes
(0004533)
geoffclare (manager)
2019-08-21 11:29

I don't like the suggestion of making it completely unspecified how non-name=... is parsed. All of the examples you give are things that I would naturally expect to be parsed as some kind of assignment if they are not treated as a cmd_word. If they then can't be processed as a valid assignment, this would produce an assignment error (rather than a syntax error).

So the way I would prefer to handle this is to change the text in 7b from:
Assignment to the name within a returned ASSIGNMENT_WORD token shall occur as specified in [xref to 2.9.1].
to something like:
If a returned ASSIGNMENT_WORD token begins with a valid name, assignment of the value after the first <equals-sign> to the name shall occur as specified in [xref to 2.9.1]. If a returned ASSIGNMENT_WORD token does not begin with a valid name, either an unspecified form of assignment shall be performed (for example, assignment to an array element in shells that support array variables as an extension) or a variable assignment error shall occur; see [xref to 2.8.1] for the consequences of these errors.
(0004535)
stephane (reporter)
2019-08-22 07:15

I just realised that 0000351 (about [command [-p]] export/readonly treating what looks like an ASSIGNMENT_WORD specially) should also be extended to cover this case:

$ touch a0=bar
$ dash -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
a=, a0=bar
$ yash -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
a=, a0=bar
$ ksh -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
a=bar, a0=
$ mksh -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
a=bar, a0=
$ zsh --emulate sh -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
a=bar, a0=
$ bash -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
bash: line 0: export: `a[0]': not a valid identifier
a=, a0=


Note that ksh, mksh, zsh and bash don't perform globbing on a[0]=bar there, whereas dash and yash expand it to the existing file name a0=bar.

They do globbing in:

export "$(echo a)"[0]=bar
or
export *=bar

Maybe that should be handled in a separate bug, possibly the same one that would address a[foo + bar] tokenisation (https://www.mail-archive.com/austin-group-l%40opengroup.org/msg04563.html), which I said above I would raise when I have the time, as it's the same issue here.

In any case, we should not return ASSIGNMENT_WORD for things like $a=value, *=value or "foo"=bar, as those are all treated as WORD by every export implementation.

I was about to say: "maybe we should change rule 7 here to say that if the part left of the first unquoted = contains quoting or expansion operators then a WORD (as opposed to ASSIGNMENT_WORD or UNSPECIFIED) shall result", but that would not address export a["$var"]=foo.
(0005881)
geoffclare (manager)
2022-07-11 15:30
edited on: 2022-07-11 15:31

Line numbers are for Issue 8 draft 2.1.

Change line 75831 from:
Otherwise it is unspecified whether rule 1 is applied or ASSIGNMENT_WORD is returned.
to:
Otherwise, it is implementation-defined whether rule 1 is applied, ASSIGNMENT_WORD is returned, or the TOKEN is processed in some other way.

Change the paragraph at line 75834 from:
Assignment to the name within a returned ASSIGNMENT_WORD token shall occur as specified in [xref to 2.9.1].
to:
If a returned ASSIGNMENT_WORD token begins with a valid name, assignment of the value after the first <equals-sign> to the name shall occur as specified in [xref to 2.9.1]. If a returned ASSIGNMENT_WORD token does not begin with a valid name, the way in which the token is processed is unspecified.



Issue History
Date Modified     Username    Field                 Change
2019-08-03 22:20  stephane    New Issue
2019-08-03 22:20  stephane    Name                  => Stephane Chazelas
2019-08-03 22:20  stephane    Section               => Shell grammar
2019-08-21 11:29  geoffclare  Note Added: 0004533
2019-08-22 07:15  stephane    Note Added: 0004535
2022-07-11 15:30  geoffclare  Note Added: 0005881
2022-07-11 15:31  geoffclare  Note Edited: 0005881
2022-07-11 15:32  geoffclare  Interp Status         => ---
2022-07-11 15:32  geoffclare  Final Accepted Text   => Note: 0005881
2022-07-11 15:32  geoffclare  Status                New => Resolved
2022-07-11 15:32  geoffclare  Resolution            Open => Accepted As Marked
2022-07-11 15:32  geoffclare  Tag Attached: tc3-2008
2022-08-05 09:24  geoffclare  Status                Resolved => Applied

