Austin Group Defect Tracker

Aardvark Mark IV

ID 0001279
Category [1003.1(2016/18)/Issue7+TC2] Shell and Utilities
Severity Objection
Type Error
Date Submitted 2019-08-03 22:20
Last Update 2019-08-22 07:15
Reporter stephane
View Status public
Assigned To
Priority normal
Resolution Open
Status New
Name Stephane Chazelas
User Reference
Section Shell grammar
Page Number
Line Number
Interp Status ---
Final Accepted Text
Summary 0001279: non-name=value should not be an ASSIGNMENT_WORD
Description The sh grammar in the spec tells us that


is to be parsed as a:

 -> complete_commands
 -> complete_command
 -> list
 -> and_or
 -> pipeline
 -> pipe_sequence
 -> command
 -> simple_command
 -> cmd_prefix as an ASSIGNMENT_WORD

(assuming rule 7a is applied, missing in the spec as already
noted in 0001094)

And for:

$(echo x)=d


 -> simple_command
 -> cmd_prefix as an ASSIGNMENT_WORD


 -> simple_command
 -> cmd_name as a WORD

IOW, all those examples above are described in the manual as
"simple commands" in the sh language, with no scope for
implementations to interpret them otherwise.

In all those cases, when it's ASSIGNMENT_WORD, 2.10.2 7b defers to
"2.9.1" for how an assignment is to be performed based on that

Except that 2.9.1 doesn't really say that.

From a var=value ASSIGNMENT_WORD, there's nothing that says
that "var" is the name of the variable to be assigned and
"value" the value to assign to the variable. The only thing that
suggests it is the "Assignment to the name within a returned
ASSIGNMENT_WORD token" in 2.10.2/7b. While that's easy to guess
for "var=value", that's less so for the other examples
above. If anything 7b would say that in var+=value, the "name"
of the variable is "var+".
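
To illustrate the point about the "name", here is a minimal sketch of what a purely textual reading of 7b yields: everything before the first equals sign is taken as the name. The split_assignment helper is hypothetical (not from the spec or any shell) and ignores quoting and expansions.

```sh
# split_assignment TOKEN
# Hypothetical helper: prints the text before and after the first '='.
split_assignment() {
    # ${1%%=*} drops the longest suffix starting at an '=' (keeps the "name");
    # ${1#*=} drops the shortest prefix ending at an '=' (keeps the "value").
    printf "name='%s' value='%s'\n" "${1%%=*}" "${1#*=}"
}

split_assignment 'var=value'      # name='var' value='value'
split_assignment 'var+=value'     # name='var+' value='value'
split_assignment 'var[0]=value'   # name='var[0]' value='value'
```

Neither "var+" nor "var[0]" is a valid sh variable name, which is the problem described above.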

Those examples should make it obvious that while they are (for
some of them) syntax in the bash/ksh93/zsh languages, they are
not in the sh language. The sh grammar should not identify those
as sh simple commands or assignments.

At best, things like var+=value or var[0]=value should be
*allowed* to be interpreted as the "var+=value" command (like
many sh implementations do), but not *required* to as some
shells like ksh/bash/zsh interpret them as something else,
and certainly *cannot* be interpreted as POSIX sh variable
assignments as those are not valid sh variable names.

Note: another bug report will follow to address [^]
(0001276 and this one are preamble to that).
Desired Action First, apply the 0001094 resolution: append a /* Apply rule
7a */ to the first occurrence of ASSIGNMENT_WORD in the
cmd_prefix production, and /* Apply rule 7b */ to the second one
(7a would also work as there's no reserved word that can be
mistaken for an assignment).

In 7b, ASSIGNMENT_WORDs should only be returned for
var=anything tokens (where "var", before quote removal and before
expansion is a valid "name"). For other TOKENs that contain an
unquoted, not-part-of-expansion equals sign, we should make sure
that no grammar production that references rule 7 would
succeed/match, for instance, by saying that the TOKEN token, or
maybe a new one called UNSPECIFIED to make it clearer, shall be
returned.

For instance, change 2.10.2/7 (here including a resolution of
0001276) to:

> 7. [Assignment preceding command name]
> a. [When the first word]
> If the TOKEN is exactly a reserved word, the token identifier for that reserved word shall result. Otherwise, 7b shall be applied.
> b. [Not the first word]
> If the TOKEN contains an unquoted (as determined while applying rule 4 from Token Recognition) <equals-sign> character that is not
> part of an embedded parameter expansion, command substitution, or arithmetic expansion construct (as determined while applying rule
> 5 from Token Recognition):
> • If the TOKEN begins with '=', then the token WORD shall be returned.
> • If all the characters in the TOKEN preceding the first such <equals-sign> form a valid name (see XBD Name), the token
> ASSIGNMENT_WORD shall be returned.
> • Otherwise, it is unspecified whether the WORD or UNSPECIFIED token is returned.
> Otherwise, the token WORD shall be returned.
> Assignment to the name within a returned ASSIGNMENT_WORD token shall occur as specified in Simple Commands.

And add a paragraph in 2.10.1 like:

- in the following section, some rules return an UNSPECIFIED
  token. That's a way to make it clear that the resulting token
  cannot possibly satisfy the grammar productions where the
  corresponding rule is referenced.
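
As a rough sketch only, the classification in the proposed 7b text could look like the following. classify_token is a hypothetical helper, and it makes a big simplifying assumption: it treats every equals sign as unquoted and not part of an expansion, so it does not implement the "as determined while applying rule 4/5" conditions.

```sh
# classify_token TOKEN
# Hypothetical helper: prints the token type the proposed 7b would return.
# Assumption: no '=' in TOKEN is quoted or inside an expansion.
classify_token() (
    LC_ALL=C   # keep the bracket ranges below ASCII-only
    case $1 in
        =*) echo WORD ;;                 # TOKEN begins with '='
        *=*)
            name=${1%%=*}                # text before the first '='
            case $name in
                [!A-Za-z_]* | *[!A-Za-z0-9_]*)
                    echo 'WORD or UNSPECIFIED' ;;   # not a valid name
                *)
                    echo ASSIGNMENT_WORD ;;         # valid name
            esac ;;
        *) echo WORD ;;                  # no '=' at all
    esac
)

classify_token 'var=value'    # ASSIGNMENT_WORD
classify_token 'var+=value'   # WORD or UNSPECIFIED
classify_token 'a[0]=b'       # WORD or UNSPECIFIED
classify_token '=value'       # WORD
```

The function body is a subshell so that the LC_ALL setting and the name variable do not leak into the caller.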

And then, in 2.9.1, now that an ASSIGNMENT_WORD can only be a
name=value, it's not as critical, but we may still want to
clarify that the part before the first = in the ASSIGNMENT_WORD
is the name of the variable and the part after that = is the value.
Tags No tags attached.
Attached Files

- Relationships

-  Notes
geoffclare (manager)
2019-08-21 11:29

I don't like the suggestion of making it completely unspecified how non-name=... is parsed. All of the examples you give are things that I would naturally expect to be parsed as some kind of assignment if they are not treated as a cmd_word. If they then can't be processed as a valid assignment, this would produce an assignment error (rather than a syntax error).

So the way I would prefer to handle this is to change the text in 7b from:
Assignment to the name within a returned ASSIGNMENT_WORD token shall occur as specified in [xref to 2.9.1].
to something like:
If a returned ASSIGNMENT_WORD token begins with a valid name, assignment of the value after the first <equals-sign> to the name shall occur as specified in [xref to 2.9.1]. If a returned ASSIGNMENT_WORD token does not begin with a valid name, either an unspecified form of assignment shall be performed (for example, assignment to an array element in shells that support array variables as an extension) or a variable assignment error shall occur; see [xref to 2.8.1] for the consequences of these errors.
stephane (reporter)
2019-08-22 07:15

I just realised 0000351 (about [command [-p]] export/readonly treating what looks like ASSIGNMENT_WORD specially) should also be extended here:

$ touch a0=bar
$ dash -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
a=, a0=bar
$ yash -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
a=, a0=bar
$ ksh -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
a=bar, a0=
$ mksh -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
a=bar, a0=
$ zsh --emulate sh -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
a=bar, a0=
$ bash -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
bash: line 0: export: `a[0]': not a valid identifier
a=, a0=

ksh, mksh, zsh and bash don't do globbing there (dash and yash do, which is why they end up setting a0 from the a0=bar file name).

They do globbing in:

export "$(echo a)"[0]=bar
export *=bar

Maybe that should be handled in a separate bug, perhaps the same bug that would address a[foo + bar] tokenisation ( [^] which I said above I would raise when I have the time), as it's the same issue here.

In any case, we should not return ASSIGNMENT_WORD in things like $a=value, *=value, "foo"=bar, as those are all treated as WORD in all export implementations.

I was about to say: "maybe we should change rule 7 here to say that if the part left of the first unquoted = contains quoting or expansion operators then a WORD (as opposed to ASSIGNMENT_WORD or UNSPECIFIED) shall result", but that would not address export a["$var"]=foo.
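
For anyone wanting to repeat the comparison above, here is a hedged sketch that runs the same test under whichever of the named shells are installed. probe_export is a hypothetical helper; it assumes mktemp -d is available (not required by POSIX), and it invokes each shell by bare name, so zsh would run without --emulate sh.

```sh
# probe_export SHELL...
# Hypothetical helper: runs the export a[0]=bar test under each installed shell.
probe_export() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        touch a0=bar    # gives the a[0]=bar glob a file to match
        for sh in "$@"; do
            command -v "$sh" >/dev/null 2>&1 || continue   # skip missing shells
            printf '%s: ' "$sh"
            "$sh" -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"' 2>&1
        done
    )
    rm -rf "$dir"
}

probe_export dash yash ksh mksh bash
```

Output depends on the shells and versions present, so none is shown here.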

- Issue History
Date Modified Username Field Change
2019-08-03 22:20 stephane New Issue
2019-08-03 22:20 stephane Name => Stephane Chazelas
2019-08-03 22:20 stephane Section => Shell grammar
2019-08-21 11:29 geoffclare Note Added: 0004533
2019-08-22 07:15 stephane Note Added: 0004535
