View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001279 | 1003.1(2016/18)/Issue7+TC2 | Shell and Utilities | public | 2019-08-03 22:20 | 2024-06-11 09:08 |
Reporter | stephane | Assigned To | |||
Priority | normal | Severity | Objection | Type | Error |
Status | Closed | Resolution | Accepted As Marked | ||
Name | Stephane Chazelas | ||||
Organization | |||||
User Reference | |||||
Section | Shell grammar | ||||
Page Number | |||||
Line Number | |||||
Interp Status | --- | ||||
Final Accepted Text | 0001279:0005881 | ||||
Summary | 0001279: non-name=value should not be an ASSIGNMENT_WORD | ||||
Description | The sh grammar in the spec tells us that

    var=value

is to be parsed as a:

    program -> complete_commands -> complete_command -> list -> and_or -> pipeline -> pipe_sequence -> command -> simple_command -> cmd_prefix

as an ASSIGNMENT_WORD (assuming rule 7a is applied, missing in the spec as already noted in 0001094).

And for:

    var+=value
    stéphane=foo
    var[1]=value
    a[0].b[c=++e].f=g
    "a=b"=c
    $(echo x)=d

Either:

    ... -> simple_command -> cmd_prefix

as an ASSIGNMENT_WORD

Or:

    ... -> simple_command -> cmd_name

as a WORD.

IOW, all those examples above are described in the manual as "simple commands" in the sh language, with no scope for implementations to interpret them otherwise.

In all those cases, when it's ASSIGNMENT_WORD, 2.10.2 7b defers to "2.9.1" for how an assignment is to be performed based on that ASSIGNMENT_WORD.

Except that 2.9.1 doesn't really say that. From a var=value ASSIGNMENT_WORD, there's nothing that says that "var" is the name of the variable to be assigned and "value" the value to assign to the variable. The only thing that suggests it is the "Assignment to the name within a returned ASSIGNMENT_WORD token" in 2.10.2/7b.

While that's easy to guess for "var=value", that's less so for the other examples above. If anything 7b would say that in var+=value, the "name" of the variable is "var+".

Those examples should make it obvious that while they are (for some of them) syntax in the bash/ksh93/zsh languages, they are not in the sh language. The sh grammar should not identify those as sh simple commands or assignments.

At best, things like var+=value or var[0]=value should be *allowed* to be interpreted as the "var+=value" command (like many sh implementations do), but not *required* to as some shells like ksh/bash/zsh interpret them as something else, and certainly *cannot* be interpreted as POSIX sh variable assignments as those are not valid sh variable names.
Note: another bug report will follow to address https://www.mail-archive.com/austin-group-l@opengroup.org/msg04563.html (0001276 and this one are preamble to that). | ||||
Desired Action | First, apply the 0001094 resolution: append a /* Apply rule 7a */ to the first occurrence of ASSIGNMENT_WORD in the cmd_prefix production, and /* Apply rule 7b */ to the second one (7a would also work as there's no reserved word that can be mistaken for an assignment).

In 7b, ASSIGNMENT_WORDs should only be returned for var=anything tokens (where "var", before quote removal and before expansion is a valid "name"). For other TOKENs that contain an unquoted, not-part-of-expansion equals sign, we should make sure that no grammar production that references rule 7 would succeed/match, for instance, by saying that the TOKEN token, or maybe a new one called UNSPECIFIED to make it clearer shall be returned.

For instance, change 2.10.2/7 (here including a resolution of 0001276) to:

> 7. [Assignment preceding command name]
>
> a. [When the first word]
>
> If the TOKEN is exactly a reserved word, the token identifier for that reserved word shall result. Otherwise, 7b shall be applied.
>
> b. [Not the first word]
>
> If the TOKEN contains an unquoted (as determined while applying rule 4 from Token Recognition) <equals-sign> character that is not part of an embedded parameter expansion, command substitution, or arithmetic expansion construct (as determined while applying rule 5 from Token Recognition):
>
> • If the TOKEN begins with '=', then the token WORD shall be returned.
>
> • If all the characters in the TOKEN preceding the first such <equals-sign> form a valid name (see XBD Name), the token ASSIGNMENT_WORD shall be returned.
>
> • Otherwise, it is unspecified whether the WORD or UNSPECIFIED token is returned.
>
> Otherwise, the token WORD shall be returned.
>
> Assignment to the name within a returned ASSIGNMENT_WORD token shall occur as specified in Simple Commands.

And add a paragraph in 2.10.1 like:

- in the following section, some rules return an UNSPECIFIED token. That's a way to make it clear that the resulting token cannot possibly satisfy the grammar productions where the corresponding rule is referenced.

And then, in 2.9.1, now that an ASSIGNMENT_WORD can only be a name=value, it's not as critical, but we may still want to clarify that the part before the first = in the ASSIGNMENT_WORD is the name of the variable and the part after that = is the value. | ||||
Tags | tc3-2008 |
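The rule 7b wording proposed under Desired Action can be sketched in POSIX sh roughly as follows. This is only an illustration and not text from the report: the function names `is_name` and `classify_token` are hypothetical, and the sketch uses the first literal `=` in the token without tracking quoting or expansions, so it does not implement the "unquoted, not part of an expansion" condition (a fully quoted token such as `"a=b"` would not be reported as a plain WORD).

```shell
# Hypothetical sketch of the proposed rule 7b; not part of the report.
# Assumption: quoting and expansions are NOT tracked; the first literal
# '=' found in the token is the one that is examined.

# Use the C locale so that ranges such as A-Z cover only ASCII letters.
LC_ALL=C
export LC_ALL

# is_name STRING: succeed if STRING is a valid name (XBD Name), i.e.
# only letters, digits and underscores, and not starting with a digit.
is_name() {
    case $1 in
        '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# classify_token TOKEN: print the token class the proposed 7b would give.
classify_token() {
    case $1 in
        =*)
            # Token begins with '=': always a WORD.
            printf '%s\n' WORD ;;
        *=*)
            # Everything before the first '=' must form a valid name.
            if is_name "${1%%=*}"; then
                printf '%s\n' ASSIGNMENT_WORD
            else
                printf '%s\n' 'WORD or UNSPECIFIED'
            fi ;;
        *)
            # No '=' at all: a WORD.
            printf '%s\n' WORD ;;
    esac
}

for tok in 'var=value' 'var+=value' 'var[1]=value' '$(echo x)=d' '=x' 'plain'; do
    printf '%s -> %s\n' "$tok" "$(classify_token "$tok")"
done
```

Under these assumptions, `var=value` is the only token in the loop that is reported as ASSIGNMENT_WORD.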
|
I don't like the suggestion of making it completely unspecified how non-name=... is parsed. All of the examples you give are things that I would naturally expect to be parsed as some kind of assignment if they are not treated as a cmd_word. If they then can't be processed as a valid assignment, this would produce an assignment error (rather than a syntax error).

So the way I would prefer to handle this is to change the text in 7b from:

> Assignment to the name within a returned ASSIGNMENT_WORD token shall occur as specified in [xref to 2.9.1].

to something like:

> If a returned ASSIGNMENT_WORD token begins with a valid name, assignment of the value after the first <equals-sign> to the name shall occur as specified in [xref to 2.9.1]. If a returned ASSIGNMENT_WORD token does not begin with a valid name, either an unspecified form of assignment shall be performed (for example, assignment to an array element in shells that support array variables as an extension) or a variable assignment error shall occur; see [xref to 2.8.1] for the consequences of these errors. |
|
I just realised 0000351 (about [command [-p]] export/readonly treating what looks like ASSIGNMENT_WORD specially) should also be extended here:

    $ touch a0=bar
    $ dash -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
    a=, a0=bar
    $ yash -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
    a=, a0=bar
    $ ksh -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
    a=bar, a0=
    $ mksh -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
    a=bar, a0=
    $ zsh --emulate sh -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
    a=bar, a0=
    $ bash -c 'export a[0]=bar; printf "%s\n" "a=$a, a0=$a0"'
    bash: line 0: export: `a[0]': not a valid identifier
    a=, a0=

Those ksh/mksh/zsh/bash don't do globbing there. They do globbing in:

    export "$(echo a)"[0]=bar

or

    export *=bar

Maybe that should be handled in a separate bug, maybe the same bug that would address a[foo + bar] tokenisation (https://www.mail-archive.com/austin-group-l%40opengroup.org/msg04563.html) which I said above I would raise when I have the time, as it's the same issue here.

In any case, we should not return ASSIGNMENT_WORD in things like $a=value, *=value, "foo"=bar, as those are all treated as WORD in all export implementations.

I was about to say: "maybe we should change rule 7 here to say that if the part left of the first unquoted = contains quoting or expansion operators then a WORD (as opposed to ASSIGNMENT_WORD or UNSPECIFIED) shall result", but that would not address export a["$var"]=foo. |
|
Line numbers are for Issue 8 draft 2.1.

Change line 75831 from:

> Otherwise it is unspecified whether rule 1 is applied or ASSIGNMENT_WORD is returned.

to:

> Otherwise, it is implementation-defined whether rule 1 is applied, ASSIGNMENT_WORD is returned, or the TOKEN is processed in some other way.

Change the paragraph at line 75834 from:

> Assignment to the name within a returned ASSIGNMENT_WORD token shall occur as specified in [xref to 2.9.1].

to:

> If a returned ASSIGNMENT_WORD token begins with a valid name, assignment of the value after the first <equals-sign> to the name shall occur as specified in [xref to 2.9.1]. If a returned ASSIGNMENT_WORD token does not begin with a valid name, the way in which the token is processed is unspecified. |
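The name/value split described by the accepted wording can be illustrated with a short POSIX sh sketch. It is an illustration under stated assumptions rather than how any particular shell implements it: `split_assignment` is a hypothetical helper, and it assumes it is handed a token that has already been returned as ASSIGNMENT_WORD (so the token contains an `=`).

```shell
# Hypothetical illustration of the accepted wording; not from the report.
# Assumption: the argument is a token already returned as ASSIGNMENT_WORD.

# Use the C locale so that ranges such as A-Z cover only ASCII letters.
LC_ALL=C
export LC_ALL

split_assignment() {
    name=${1%%=*}    # everything before the first '='
    value=${1#*=}    # everything after the first '='
    case $name in
        '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*)
            # Not a valid name: processing of the token is unspecified.
            printf 'unspecified: "%s" is not a valid name\n' "$name" ;;
        *)
            # Valid name: the value after the first '=' is assigned to it.
            printf 'assign name=%s value=%s\n' "$name" "$value" ;;
    esac
}

split_assignment 'var=a=b'
split_assignment 'var+=value'
split_assignment 'a[0]=bar'
```

Only the first `=` separates the name from the value, so any later `=` characters stay part of the value.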
Date Modified | Username | Field | Change |
---|---|---|---|
2019-08-03 22:20 | stephane | New Issue | |
2019-08-03 22:20 | stephane | Name | => Stephane Chazelas |
2019-08-03 22:20 | stephane | Section | => Shell grammar |
2019-08-21 11:29 | geoffclare | Note Added: 0004533 | |
2019-08-22 07:15 | stephane | Note Added: 0004535 | |
2022-07-11 15:30 | geoffclare | Note Added: 0005881 | |
2022-07-11 15:31 | geoffclare | Note Edited: 0005881 | |
2022-07-11 15:32 | geoffclare | Interp Status | => --- |
2022-07-11 15:32 | geoffclare | Final Accepted Text | => 0001279:0005881 |
2022-07-11 15:32 | geoffclare | Status | New => Resolved |
2022-07-11 15:32 | geoffclare | Resolution | Open => Accepted As Marked |
2022-07-11 15:32 | geoffclare | Tag Attached: tc3-2008 | |
2022-08-05 09:24 | geoffclare | Status | Resolved => Applied |
2024-06-11 09:08 | agadmin | Status | Applied => Closed |