Austin Group Defect Tracker

Aardvark Mark III


ID: 0000953
Category: [1003.1(2013)/Issue7+TC1] Shell and Utilities
Severity: Objection
Type: Clarification Requested
Date Submitted: 2015-06-04 00:22
Last Update: 2019-04-17 10:52
Reporter: wpollock
View Status: public
Assigned To: ajosey
Priority: normal
Resolution: Accepted As Marked
Status: Interpretation Required
Name: Wayne Pollock
Section: 2.3.1 Alias Substitution
Page Number: 2322
Line Number: 73690-73705
Interp Status: Approved
Final Accepted Text: See Note: 0004214
Summary: 0000953: Alias expansion is under-specified
Description: It isn't clear whether the result of an alias expansion should trigger a re-tokenization of the following token(s) as well, since some tokens such as pipe ("|") and ampersand ("&") could become part of a larger token (such as "||" or "&&") if the alias expansion ends with a pipe or ampersand. Reportedly, different shells behave differently in this regard.
Desired Action: Since different shells behave differently on this issue, the resolution should be tagged for Issue 8. If a desired behavior can be identified and agreed, this section of the standard should be clarified. If not, a set option should be added to address the issue.
Tags: tc3-2008

Relationships
related to 0000736 (Closed): grammatically accept zero or more Shell commands
related to 0001048 (Closed): deprecate alias and unalias
related to 0001055 (Resolved): unspecified how much is parsed before execution begins

Notes
(0002694)
joerg (reporter)
2015-06-04 09:26

It is also not specified how quoting characters should be handled
inside alias replacements after the expansion took place.

We should check all special characters to find out whether there are more
potential problems.
(0003089)
joerg (reporter)
2016-03-04 09:49
edited on: 2016-03-04 15:11

Based on the discussion from yesterday, I would like to add some remarks:

In order to understand how aliases work, it is important to understand the data flow inside the shell.

- The parser calls the lexer and the lexer does the alias substitution.

- The output from the parser is a binary syntax tree that is later fed
  into the interpreter.

- Before the interpreter runs a single command from a larger list
  in the syntax tree it is called with, it calls the macro
  expansion code for that command and that may trigger a recursive
  parser and interpreter call for $(cmd) and `cmd` command
  substitution.

- After macro expansion has taken place, the related command is run
  by the interpreter. Such commands may be "alias" and "unalias",
  which modify the alias table in the shell.

An updated or deleted alias is expanded the new way whenever the
parser and lexer are called the next time after the related
alias/unalias command has been run by the interpreter.

It may be wise to add a related text to the standard.
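A quick way to observe this, as a sketch that assumes a POSIX `sh` on `PATH` (for example dash, or bash invoked as `sh`, which enables alias expansion in POSIX mode): in the first probe the alias command and its use are on one line, so they form a single complete_command that is parsed before the alias command runs; in the second they are separate lines and therefore separate parser calls. The alias name `greet` is made up for the demo.

```sh
#!/bin/sh
# Does an alias defined by one command take effect for a later command?
# Each probe runs a fresh "sh -c"; stderr is discarded so that a
# "greet: not found" message does not clutter the result.
# The alias name "greet" is invented for this demo.

same_line() {
    # alias definition and use form one complete_command (one line)
    sh -c 'alias greet="echo hello"; greet' 2>/dev/null
}

next_line() {
    # alias definition and use are separate complete_commands (two lines)
    sh -c 'alias greet="echo hello"
greet' 2>/dev/null
}

printf 'same line: [%s]\n' "$(same_line)"
printf 'next line: [%s]\n' "$(next_line)"
```

With dash or bash-as-`sh`, the first probe is expected to print an empty result and the second `hello`.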

To understand the parser and how much of a specific input is read
by the parser with a single call, the Bourne Shell and ksh
parsers need to be understood:

NLFLG in the Bourne Shell and the same flag renamed to SH_NL
in ksh tell the parser to treat a newline as a semicolon,
and thus to continue parsing until EOF.

ksh and Bourne Shell use the NLFLG when calling the parser
for the following purpose:
    
    1) for command substitution with $(...) and `...`
    2) for the dot command file body
    3) for eval
    4) for trap
    5) for jobs -x
    jobs -x:
        -x Replace any jobid found in command or arguments
               with the corresponding process group ID, and then
               execute command passing it arguments.

In these cases, the interpreter is called after parsing
a larger chunk of commands and no alias command is effective
unless a new parser instance is called.

(0003113)
rhansen (manager)
2016-03-31 16:29
edited on: 2016-03-31 16:40

Interpretation response
------------------------
The standard does not speak to this issue, and as such no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
-------------
There is widespread existing practice to follow, and the standard developers have attempted to codify this in the changes below.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
On page 2322 lines 73691-73702 (XCU 2.3.1 Alias Substitution), change:
After a token has been delimited, but before applying the grammatical rules in Section 2.10, a resulting word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name. However, reserved words in correct grammatical context shall not be candidates for alias substitution. A valid alias name (see XBD Section 3.10) shall be one that has been defined by the alias utility and not subsequently undefined using unalias. Implementations also may provide predefined valid aliases that are in effect when the shell is invoked. To prevent infinite loops in recursive aliasing, if the shell is not currently processing an alias of the same name, the word shall be replaced by the value of the alias; otherwise, it shall not be replaced.

If the value of the alias replacing the word ends in a <blank>, the shell shall check the next command word for alias substitution; this process shall continue until a word is found that is not a valid alias or an alias value does not end in a <blank>.
to:
After a token has been delimited and has been identified to be the command name word of a simple command by applying the grammatical rules in Section 2.10, the word shall be subject to alias substitution if:
  • the word does not contain any quoting characters,
  • the word is a valid alias name (see XBD Section 3.10),
  • an alias with that name is in effect, and
  • the word is not recognized as a reserved word (see [xref to 2.4 Reserved Words] and the examples in [xref to XRAT C.2.3.1]).

An implementation may defer the effect of a change to an alias but the change shall take effect no later than the completion of the currently executing complete_command (see [xref to XCU 2.10 Shell Grammar]). Changes to aliases shall not take effect out of order. Implementations may provide predefined aliases that are in effect when the shell is invoked.

If the value of the alias cannot be tokenized as a simple command (see Section 2.9.1) according to the shell grammar rules (see Section 2.10), or if the alias can be tokenized as a simple command but contains an ASSIGNMENT_WORD token or redirection, the behavior is unspecified. When a word is subject to alias substitution, the value of the alias shall be tokenized as a simple command and the tokens shall replace the word. To prevent infinite loops in recursive aliasing, if the shell is currently processing an alias of the same name, the word shall not be replaced.

If the value of the alias replacing the word ends in a <blank> that would be unquoted after substitution, the shell shall check the next command word for alias substitution; this process shall continue until a word is found that is not a valid alias or an alias value does not end in a <blank>.
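As an illustration of the trailing-<blank> rule (not part of the proposed text), here is a short sketch assuming a POSIX `sh` on `PATH`. The alias names `say`, `say_nb` and `target` are invented; each set of aliases is defined on its own line so that it is in effect when the following line is parsed.

```sh
#!/bin/sh
# Trailing-<blank> rule: if an alias value ends in a blank, the next word
# is also checked for alias substitution.  Names are invented for the demo.

chained() {
    # "say" ends in a blank, so "target" is also alias-substituted
    sh -c 'alias say="echo " target="expanded"
say target'
}

not_chained() {
    # "say_nb" does not end in a blank, so "target" is left alone
    sh -c 'alias say_nb="echo" target="expanded"
say_nb target'
}

printf 'value ends in blank: %s\n' "$(chained)"
printf 'no trailing blank:   %s\n' "$(not_chained)"
```

The first function should print `expanded` and the second `target`.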

On page 2331 lines 74074-74076 (XCU 2.6.3 Command Substitution), change:
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command. Any valid shell script can be used for command, except a script consisting solely of redirections which produces unspecified results.
to:
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command.

With both the backquoted and $(command) forms, command shall be tokenized (see [xref to XCU 2.3 Token Recognition]) and parsed as a compound_list (see [xref to XCU 2.10 Shell Grammar]). Any valid compound_list can be used for command, except a compound_list consisting solely of redirections which produces unspecified results.

On page 2325 lines 73782-73785 (XCU 2.5.3 Shell Variables) change:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file containing shell commands to execute in the current environment.
to:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file. Before any interactive commands are read, the shell shall tokenize (see [xref to XCU 2.3 Token Recognition]) the contents of the file, parse the tokens as a program (see [xref to XCU 2.10 Shell Grammar]), and execute the resulting commands in the current environment. (In other words, the contents of the ENV file are not parsed as a single compound_list, unlike the contents of a dot script. This distinction matters because it influences when aliases take effect.)

On page 2364 line 75304 (XCU 2.14 dot DESCRIPTION), change:
The shell shall execute commands from the file in the current environment.
to:
The shell shall tokenize (see [xref to XCU 2.3 Token Recognition]) the contents of the file, parse the tokens as a compound_list (see [xref to XCU 2.10 Shell Grammar]), and execute the resulting commands in the current environment.

On page 2366 line 75371 (XCU 2.14 eval), change:
The constructed command shall be read and executed by the shell.
to:
The constructed command shall be tokenized (see [xref to XCU 2.3 Token Recognition]), parsed as a compound_list (see [xref to XCU 2.10 Shell Grammar]), and executed by the shell in the current environment.

On page 3679 after line 125734 (XRAT C.2.3.1), insert:
Implementations differ in how alias substitution is performed when the alias value does not have the form of a simple command. For example, given:

$ alias foo='some_command &'
$ foo&

some, but not all, implementations retokenize the two '&' characters into an && (and) operator.


Some, but not all, shell implementations do not process changes to alias definitions until the current compound_list (see [xref to XCU 2.10 shell grammar]) has completed. In these shells, alias changes do not take effect until the end of the dot script, eval command, function invocation, if statement, case statement, for statement, while statement, or until statement containing the alias change.

Many shell implementations execute the contents of a file, typically ~/.profile, when invoked as a login shell. The standard developers are unaware of any such implementations that process the contents of ~/.profile (and similar startup files) as a single compound_list, so alias changes in ~/.profile typically do take effect before the end of ~/.profile.


(0003116)
ajosey (manager)
2016-04-01 12:29

Interpretation proposed: 1st April 2016
(0003148)
jilles (reporter)
2016-04-10 22:09

The new restriction that an alias must be tokenizable as a simple command without ASSIGNMENT_WORDs or redirection suggests an implementation that does not expand aliases in the parser but just before executing a simple command. Is it deliberate that this is permitted?

The new restriction also allows removing the restriction on finding the end of $(...) command substitution "not including the alias substitutions in Section 2.3.1," from XCU 2.2.3 Double-Quotes. Since an alias must be tokenizable as a simple command, it does not matter whether aliases are expanded or not for finding the matching closing parenthesis. Note that ash variants such as FreeBSD sh, NetBSD sh and dash fully parse a command substitution when it is encountered during the outer parse.

I cannot find a definition for 'program' in XCU 2.10 Shell Grammar. From the context I derive that "parse the tokens as a program and execute the resulting commands" means "parse complete_commands and execute them as they are parsed, one complete_command at a time". Apart from alias expansion, this affects consequences of syntax errors and the output when 'set -v' is enabled.

The changes to dot and eval require the shell to parse the entire input before executing anything. Although the Bourne shell, the real Korn shell and zsh do this, most other shells parse complete_commands and execute them as they are parsed, one complete_command at a time (FreeBSD sh, mksh, bash in both POSIX and default mode, yash). I tested this using:

sh -c "eval 'echo The next line has an error.
)'"

and

sh -c '. ./witherror.sh'

where witherror.sh contains

#!/bin/sh
echo The next line has an error.
)
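The two tests above can be scripted roughly as follows. This is a sketch that assumes a POSIX `sh` and a `mktemp -d` command; the shells to compare are passed as arguments (default: `sh`). A result of `incremental` means the `echo` ran before the syntax error was reported (one complete_command at a time); `whole` means the syntax error stopped everything, as when the entire input is parsed first.

```sh
#!/bin/sh
# Harness for the eval and dot tests above.  Only stdout is inspected;
# the shells' syntax-error messages on stderr are discarded.

probe_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$probe_dir"' EXIT

# Same contents as witherror.sh in the note.
cat > "$probe_dir/witherror.sh" <<'EOF'
#!/bin/sh
echo The next line has an error.
)
EOF

# Same string as the eval test in the note.
eval_script="eval 'echo The next line has an error.
)'"

classify() {
    # $1: shell to run; $2: "eval" or "dot"
    if [ "$2" = eval ]; then
        out=$("$1" -c "$eval_script" 2>/dev/null)
    else
        out=$(cd "$probe_dir" && "$1" -c '. ./witherror.sh' 2>/dev/null)
    fi
    case $out in
        *"The next line has an error."*) echo incremental ;;
        *) echo whole ;;
    esac
}

[ "$#" -eq 0 ] && set -- sh
for shell_name in "$@"; do
    command -v "$shell_name" >/dev/null 2>&1 || continue
    printf '%s: eval=%s dot=%s\n' "$shell_name" \
        "$(classify "$shell_name" eval)" "$(classify "$shell_name" dot)"
done
```

For example, `./probe.sh sh bash ksh` (file name hypothetical) prints one line per installed shell.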

To clarify this, the descriptions of dot and eval could explicitly permit parsing both one complete_command at a time and the entire input (compound_list) at once. The new paragraph about deferring changes to aliases can then be replaced by text in XCU 2.1 about parsing complete_commands at a time and the entire input (compound_list) at once. The line "%start complete_command" could have a comment that "%start compound_list" may be used instead in some cases (command substitution, dot, eval, trap).

In practice, a situation like the above example may occur if a sourced script first tests for an extension that a standard shell cannot parse and subsequently uses it. With complete_command at a time parsing, a complete_command [ -z "$BASH_VERSION" ] && return will ensure the rest of the file is not parsed by shells other than bash. Therefore, it is a bad idea to change an existing implementation from parsing a complete_command at a time to the entire input at once.
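A sketch of that guard pattern, assuming a POSIX `sh` and `mktemp -d`; the file name `bash_only.sh` is made up. In a shell that parses a dot script one complete_command at a time, `return` runs before the bash-only array assignment is ever parsed, so the caller continues and prints `after`. If `sh` is bash, the guard does not fire and the array lines run too, but `after` is still the last line.

```sh
#!/bin/sh
# Guard pattern: a dot script that bails out before a bash-only line.
# The file name bash_only.sh is invented; mktemp -d is assumed available.

demo_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$demo_dir"' EXIT

cat > "$demo_dir/bash_only.sh" <<'EOF'
[ -z "$BASH_VERSION" ] && return
arr=(a b c)
echo "bash sees: ${arr[1]}"
EOF

run_guard_demo() {
    # dot the file from a fresh sh, then show that the caller continued
    (cd "$demo_dir" && sh -c '. ./bash_only.sh; echo after' 2>&1)
}

run_guard_demo
```

A shell that parsed the whole file before executing it would report a syntax error at `arr=(a b c)` instead.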
(0003149)
chet_ramey (reporter)
2016-04-11 14:31

I would also like to see the standard acknowledge existing practice when describing the behavior of dot and eval. If you want to constrain alias expansion, the standard can do so without specifying this kind of different parsing behavior.

Shell behavior is hard enough to explain to users without making yet another distinction between commands read with `.' and commands read directly from the primary input source, especially where aliases are concerned.
(0003150)
shware_systems (reporter)
2016-04-11 20:59

Assignments and redirections were made unspecified because one of these in an alias could conflict with one specified outside the alias, and would already have been tokenized.

I believe the intent of the changes to 2.6.3 is to emphasize that grammar productions that reference compound_list are not valid in that context, but how an implementation handles the productions that compound_list references is still up to it, as long as the net effect is as if the list were entirely parsed. So an implementation can still process a complete_command at a time and exit early where conditions permit, but this defers any changes to aliases, as script-global entities, until after the list has finished executing.

Similar reasoning applies to . and eval, as "utilities" that are effectively wrappers of the compound_list production. Changes to aliases in their lists do not take effect until the utilities terminate. That is how I understood the discussion, anyway.
What a shell may need to add is an internal in_list() and at_list_exit() facility that alias and unalias can test and register changes with directly; alternatively, the routine they call to modify the alias data could do this.
(0003151)
joerg (reporter)
2016-04-12 08:58
edited on: 2016-04-13 17:43

To Note: 0003149

The behavior of the dot command is taking existing practice into account:
For the dot command, the original implementation first parses the file as
a whole and creates a command list that then is executed. As a result,
aliases defined in the file specified as argument to the dot command are
not yet valid inside that file except when they appear inside a $()
expansion.

The other points where we decided to be careful and made some behavior
unspecified are caused by the deviating behavior of bash, which, unlike
other implementations, inserts a space after the expanded alias; this
makes it hard to be more specific in the standard without being
in conflict with bash.

(0003152)
chet_ramey (reporter)
2016-04-12 12:58

A curious choice: make the little-used corner cases unspecified and over-specify the area of genuine conflict, where there is maximum possibility for problems with existing scripts, in a way that makes multiple implementations non-conformant.
(0003153)
joerg (reporter)
2016-04-12 14:49
edited on: 2016-04-13 09:07

This looks like a misunderstanding:

We tried to make everything unspecified that causes problems (e.g. the
deviating behavior from bash that inserts spaces after the alias
replacement text).

On the other hand, we mentioned things that are known to cause confusion,
like the fact that at least the Bourne Shell and Korn Shell parse the
whole file at once (before executing anything) with the dot command.

As a result, we did no more than document where existing implementations
behave in ways that cannot guarantee you a specific behavior.

In other words: scripts that assume a specific behavior in these
documented areas have already been non-portable before the text
in question has been written. These scripts are not non-conformant
because of the new text but because of the existing behavior of
existing shell implementations.

(0003154)
chet_ramey (reporter)
2016-04-13 17:15

Re: note 3153

Please. Did you read Jilles's message in note 3148? There are two distinct ways of parsing the commands for `.' and eval, and the group chose to standardize the ksh88 method (compound_list), making other implementations (complete_command) non-conformant. You can't claim that you "did no more than documenting" existing behavior. I understand that ksh88 is considered the "original implementation", but we're more than twenty years past having both behaviors exist.

The previous text did not specify a parsing mechanism. The new text does: implementations must parse the entire contents of a `.' file before executing any of the commands.

Changing the way . and eval are parsed, which (as note 3148 points out) has consequences beyond alias expansion, is more disruptive than the relatively minor "deviating [alias expansion] behavior from bash" which has little practical effect.
(0003155)
joerg (reporter)
2016-04-14 09:29
edited on: 2016-04-14 09:34

To Note: 0003148

It seems that you believe that the recent POSIX bugfix changed the behavior
of the shell, but this did not happen.

All versions/variants of the Bourne Shell, of ksh88 and of ksh93 behave
this way. The new POSIX text just documents this behavior.

A script that makes different assumptions is non portable and only works
on some divergent clone implementations.

To Note: 0003154

The previous POSIX text simply documented the shell behavior incompletely.

Any script that assumes a line by line parsing in the dot command (or in
eval or command substitution) was always non-portable.

Do you like to make the Bourne Shell, ksh88 and ksh93 invalid just to
declare some badly written scripts conforming?

Please note as well that we made some other behavior "undefined"
because there is a deviating behavior in bash. If bash did not
insert a space after the alias replacement text, there would be close
to no undefined parts in the new alias definition.

(0003156)
kre (reporter)
2016-04-14 16:22
edited on: 2016-04-14 16:37

Wrt note 3155

    Do you like to make the Bourne Shell, ksh88 and ksh93 invalid just to
    declare some badly written scripts conforming?

I do not think that anyone is suggesting that. Keep the scripts that
make invalid assumptions non-conforming. That's fine. What is being
objected to is declaring a shell non-conforming because such a
non-conforming script acts differently in it from the way it does (or
did) in ksh88, when the only difference is that the shell implements
unspecified behaviour in a different way.

After all, that is what "unspecified" is for, is it not?

That is, given the following

A file dotscript that contains

     alias l=ls
     l directory

(which is non-conforming, if used via the '.' command as it uses an alias
defined in the script)

and a user (or script) who has done

     alias l='rm -fr'
     . dotscript

(assuming the dotscript above is correctly found via $PATH) then is the
shell required to remove "directory" and all its contents, or is it
permitted to list it?

We know the script is non-conforming, and the results are supposed to be
unspecified, but the changes appear to say that the "alias" inside the
script cannot be made visible in the script, hence the preceding existing
alias must be used. This does not look very unspecified to me.
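A harmless variant of this example shows which definition of `l` a given shell actually uses. It is a sketch that assumes a POSIX `sh` and `mktemp -d`; `rm -fr` is replaced by `echo outer` so nothing is deleted, and the dot script is named as `./dotscript` rather than found via `$PATH`. A shell that parses the whole dot script before executing it should print `outer directory`; one that parses it a complete_command at a time should print `inner directory`.

```sh
#!/bin/sh
# Harmless variant of the dotscript example: the outer alias echoes
# instead of running "rm -fr", so the probe only reports which definition
# of "l" the dot script used.  mktemp -d is assumed to be available.

work=$(mktemp -d) || exit 1
trap 'rm -rf "$work"' EXIT

cat > "$work/dotscript" <<'EOF'
alias l='echo inner'
l directory
EOF

which_alias() {
    # $1: shell to run; the outer alias is defined on its own line so it
    # is already in effect when the dot command is parsed
    (cd "$work" && "$1" -c "alias l='echo outer'
. ./dotscript" 2>/dev/null)
}

which_alias sh
```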

Change the wording (again): simply make it unspecified whether aliases
defined in a . script (or eval, etc. - I assume traps are the same) will
be visible inside the same dotfile (or arg to eval, etc.), and do not
specify how the shell is required to parse such things. That way, all
those old shells remain conforming, even though they didn't care that it
is impossible to explain to a user why "sh file" works one way and
". file" is completely different when the commands are exactly the same
and no-one is questioning anything related to changes to the current
shell environment (those differences are easy to explain, and are why
there are two ways to execute commands in a file).

Lastly, it is too late now to go back and specify aspects of ksh88 (and
whatever other shells were around at the time) and just mark those down
as "we forgot to say this ..." as if the omission is irrelevant, and should
obviously be corrected. The standard as published now (ignoring all
currently agreed updates for the next revision) specifies what is required
of the shell (and other things, for other parts of the standard), and
implementations are perfectly entitled to assume they can implement it
as specified, without needing to guess whether parts that are unspecified
were unspecified deliberately or by omission.

To add new requirements now, there must be a general agreement from all of
today's shells, that the new specification is correct (and if they deviate
it is just a bug) - anywhere there is disagreement about how things should be
done should be left explicitly unspecified. Perhaps at some future time
some of the unspecified items can be specified, if agreement has been reached
about what is the best, and everyone is implementing that - but not before
that.

(0003157)
chet_ramey (reporter)
2016-04-14 19:39

With respect to note 3156: the explanation is worse than you think. Imagine explaining to a user why `. file' behaves differently than taking the commands in `file' and inserting them directly into a script or pasting them into an interactive shell.

We're concentrating on aliases here, but as note 3148 shows, the consequences extend beyond aliases. Comment 3150's speculation about "the intent" aside, the proposed changes do require changes to the behavior when handling syntax errors encountered in a file read with `.'.

With respect to note 3155: the standard concerns itself with the behavior of utilities, including the shell. If the standard leaves some aspect of that behavior undefined or un(der)specified, as in this case, utilities are free to implement it as they choose. The proposed wording certainly changes the required behavior, and should require consensus, as kre said in note 3156.

I understand your loyalty to the original Bourne shell "line", as it were, but it doesn't seem productive or useful to refer to other shells as "divergent clone implementations." You're not going to build consensus that way.

If you think the bash behavior of adding a space is a bug, and should not be left unspecified, feel free to report it as a bug with examples to reproduce it, and I will look at fixing it. Since I've never received a bug report complaining about the current behavior, it's not going to be changed for any other reason.
(0003158)
shware_systems (reporter)
2016-04-14 20:33

Wrt adding a space: I thought someone said back in Feb. that they would file a bug. I guess it wasn't followed up on.

Trivial example:
alias foo="echo 'Error:"
#note: no trailing <SPC> and no closing single quote
foo bar'; #closing quote here; before alias expansion, the argument token starts with "b" in bar'

After expansion,
bash output:
Error:<SPC><SPC>bar

other shells:
Error:<SPC>bar

In other words, bar is pushed out by an extra space, producing output that aligns differently with following lines that use echo directly: the argument token now begins inside the alias, so the original delimiting space becomes part of that token and is significant to the output.
(0003159)
eblake (manager)
2016-04-14 22:15

Fortunately, the trivial example in Note: 0003158 is explicitly marked as unspecified behavior in the proposed resolution of Note: 0003113 (the value of the alias foo "cannot be tokenized as a simple command") - so while the behavior may be different between shells, POSIX is stating that end users should not be using such an alias in the first place within a portable script.
(0003160)
rhansen (manager)
2016-04-14 23:12

> The new restriction that an alias must be tokenizable as a simple
> command without ASSIGNMENT_WORDs or redirection

Note that this restriction is on the application, not the implementation. Applications that define aliases that are not simple commands venture into unspecified behavior, and implementations are permitted to react however they want to such aliases.

> suggests an implementation that does not expand aliases in the
> parser but just before executing a simple command. Is it deliberate
> that this is permitted?

Issue 7+TC1 does not clearly specify when implementations expand aliases, and the TC1 description of alias substitution is unclear enough that nothing can be interpreted as requiring expansion during parsing, so I would say that expansion just before execution is already permitted even without this change.

> The new restriction also allows removing the restriction on finding
> the end of $(...) command substitution "not including the alias
> substitutions in Section 2.3.1," from XCU 2.2.3 Double-Quotes.

I believe that you are correct—I think those words can be deleted.

> Since an alias must be tokenizable as a simple command,

Not quite correct—a strictly conforming application should only use simple commands in aliases, but implementations are permitted to support aliases that are not simple commands as an extension. Applications can take advantage of such extensions, though they would no longer be strictly conforming.

> it does not matter whether aliases are expanded or not for finding
> the matching closing parenthesis.

Correct, if the alias is a simple command. With the changes in Note: 0003113, aliases that are not simple commands trigger unspecified behavior, and the standard doesn't describe what happens during unspecified behavior, so we should delete those words.

> Note that ash variants such as FreeBSD sh, NetBSD sh and dash fully
> parse a command substitution when it is encountered during the outer
> parse.

I believe the new wording still permits that behavior.

> I cannot find a definition for 'program' in XCU 2.10 Shell Grammar.

See 0000736.

> From the context I derive that "parse the tokens as a program and
> execute the resulting commands" means "parse complete_commands and
> execute them as they are parsed, one complete_command at a
> time".

While that is how implementations behave, I don't think that's what the standard says. I believe the standard allows an implementation to read and parse absolutely everything first then execute after all parsing is done (except maybe interactive shells), even when it would be foolish to do so.

> Apart from alias expansion, this affects consequences of syntax
> errors and the output when 'set -v' is enabled.

It affects when syntax errors and 'set -v' have an effect.

If an implementation reads and parses its entire input before starting execution, a syntax error late in the input might trigger the "Consequences of Shell Errors" behaviors before anything has executed. Also, a 'set -v' would never have an effect. (That is almost certainly a flaw in the description of 'set -v'. Would someone remind me to file a new bug report to change the description of 'set -v' to be based on execution time, not input read time?)

> The changes to dot and eval require the shell to parse the entire
> input before executing anything.

It was not our intention to require complete parsing before execution begins, but I can see how the new wording can be read that way.

Note that XCU 2.1 similarly implies that the entire input is read and parsed before execution begins.

The intention was to simply specify how the shell parses the contents of dot scripts, eval strings, and command substitution bodies. For the dot command, maybe something like this would be better:
The shell shall execute commands from the file in the current environment. The contents of file shall be tokenized (see [xref to XCU 2.3 Token Recognition]) and parsed as a compound_list (see [xref to XCU 2.10 Shell Grammar]).
Similar wording can be used for eval and command substitutions.

> Although the Bourne shell, the real Korn shell and zsh do this, most
> other shells parse complete_commands and execute them as they are
> parsed, one complete_command at a time (FreeBSD sh, mksh, bash in
> both POSIX and default mode, yash). I tested this using:
>
> sh -c "eval 'echo The next line has an error.
> )'"
>
> and
>
> sh -c '. ./witherror.sh'
>
> where witherror.sh contains
>
> #!/bin/sh
> echo The next line has an error.
> )
>
> To clarify this, the descriptions of dot and eval could explicitly
> permit parsing both one complete_command at a time and the entire
> input (compound_list) at once. The new paragraph about deferring
> changes to aliases can then be replaced by text in XCU 2.1 about
> parsing complete_commands at a time and the entire input
> (compound_list) at once.

I don't want to add those restrictions. I want conforming implementations to continue to be able to read and parse an entire program (all complete_commands) before anything is executed, if that's what they want to do. Similarly, I want conforming implementations to continue to be able to read, parse, and execute the terms in a compound_list one at a time, if that's what they want to do.

> The line "%start complete_command" could have a comment that "%start
> compound_list" may be used instead in some cases (command
> substitution, dot, eval, trap).
>
> In practice, a situation like the above example may occur if a
> sourced script first tests for an extension that a standard shell
> cannot parse and subsequently uses it. With complete_command at a
> time parsing, a complete_command [ -z "$BASH_VERSION" ] && return
> will ensure the rest of the file is not parsed by shells other than
> bash. Therefore, it is a bad idea to change an existing
> implementation from parsing a complete_command at a time to the
> entire input at once.

That is a compelling case for requiring implementations to parse and execute one complete_command at a time. A similar example is a self-extracting tarball. However, assuming I am correct in my understanding that the standard doesn't already require one-at-a-time parsing and execution, I'd prefer to address this in another bug report.
(0003161)
kre (reporter)
2016-04-15 09:35

Re note 3160...

Given that intent, I'd suggest discarding almost all of the proposed
wording changes that relate to when the alias becomes visible
(the ones that relate to unspecified usages are fine)
and instead just saying, in the section on the alias command ...

   When the assigned alias becomes available for use is unspecified,
   though the shell shall ensure it is available no later than the
   time it would next evaluate and write PS1 (if that variable were set.)

And yes, that makes defining and using an alias in scripts non-conforming.
Given the apparent lack of any spec of when aliases apply, that would be the
case now, right?

Personally I have no problem with that at all - I have always
regarded aliases as a user experience facility, not one that should be
exploited elsewhere (in a script, just write out the value of what could
be an alias and avoid all problems - use your editor to assist if needed.)

This would seem to fix the original problem without creating lots of new
ones, wouldn't it?

If you don't currently believe that all use of aliases, in any scripts
(including $ENV) is non-conforming now, then the only rational way that the
existing standard could be read is that aliases become visible immediately
after the alias command is read, and in
     if alias t=test; t $# -eq 0
     then ....
the "t" must work as "test" whether this is in a script run as "sh script",
a script run as ". script", or simply typed on the command line.

And I don't care in the slightest if that makes ksh88 (etc) be non-conforming.

Lastly, while here, I cannot comprehend the distinction that is made in the
current proposed wording between the way that $ENV is read when initially
processed by the shell, and if later re-invoked by typing ". $ENV" (which
I do a lot when I make changes to that file). If there is any permitted
way that the exact same results are not achieved in both cases (whatever
those results are) then the standard is simply insane (and so is any
implementation that acts like that.)
(0003162)
kre (reporter)
2016-04-15 10:06

Also re note 3160, but this is a different issue (which probably is not
really all that related to the topic in question) ...

    I want conforming implementations to continue to be able to read and parse
    an entire program (all complete_commands) before anything is executed,
    if that's what they want to do.

I would actually prefer to make that explicitly outlawed. A shell that
operates that way is useless as an interactive environment (somewhere else
you said "except interactive" or similar - but I could see no justification
in anything that currently exists for that distinction, it appears to
have simply been made up on the fly because without that, the wording would
have made the shell useless.)

This seems to me to largely come down to the differences between those
who believe the shell is largely a programming language, and should be
defined as such (and that people happen to use it interactively is just
a nuisance) and those who believe the shell is primarily an interactive
environment, which happens to have something that is similar to a
programming language as its command set (because that is often useful).

I am very much in the latter camp - I don't care in the slightest about
the purity of the shell programming environment as a language - there are
plenty of other programming languages available to people who want one
of those. On the other hand, if the objective here is really to define
a new programming language, and ignore the requirements of the interactive
users, then we really need to start on a new project to standardise the
interactive posix environment, as without that posix is definitely lacking
an important ingredient. Perhaps we could take csh as the model (the way
ksh88 was for the shell chapter of the standard) and standardise that?
(As a programming language, csh is disgusting, as an interactive environment,
especially if updated a little from how it was 30 years ago, it was fine.)

Note: I do agree that it is (often) possible to meet both objectives,
and that's fine - but every now and again (as with this "parse the complete
program before executing anything" strategy) something comes up where which
of these is the objective really matters. If the intent is to make a
good interactive environment, then "complete parsing" is utter nonsense.
If the intent is to make a great programming language, then it is a very
good idea. One or the other must lose.
(0003163)
shware_systems (reporter)
2016-04-15 13:49

Re: 3159
No, because that would preclude the "ls -l" usage, where the alias supplies a command with a default argument and additional arguments may follow in the continuation text. With the changes, the example still parses as a simple command name plus a first argument; together with the continuation text it is properly quoted and is not a syntax error when the line is parsed left to right. So it is valid both before and after the change.

The unspecified behavior example is, I believe:
alias foo="ec"
foo ho "text"

some implementations may treat as
echo "text"

and some may treat it as
ec 'ho' "text"

and the original wording wasn't explicit enough to draw a conformance distinction. The difference lies in when the delimiting blanks are merged and removed: either as part of removing the alias name, before the substitution is inserted, or after the substitution is inserted, during grammatical evaluation. The bug example is consistent with this second view.

For the first type, the expected bug example output would be:
Error:<SPC>bar and Error:bar respectively. This may even be more accurate. I felt that presenting both types would obscure the point that everyone on the February calls agreed on: the induced change in output, where the total length of the argument token differed for the same merging behavior, was non-conforming and therefore a bug.

For most alias usage, merging makes an added blank superfluous, so it will rarely cause an observable change in behavior and can be considered a benign extension. But "rarely" and "never" are not the same, so it is still an extension; it just means it may take longer for someone to notice that it's an issue (years, in this case).
(0003164)
chet_ramey (reporter)
2016-04-15 19:09
edited on: 2016-04-15 19:10

Re: 3160

To be consistent and permit existing behavior, we need to remove references to
how files read with `.' (and $ENV processing) and commands executed with `eval'
are parsed. That would address your concerns.

I'm confused about why you would want to change the description of `set -v'.
Its purpose is to echo lines when the shell reads them, whenever that is. It
actually provides good insight into how the behaviors of bash and ksh93 differ
when processing files with `.'. What do you mean by changing it to
`execution time'?

(0003165)
rhansen (manager)
2016-04-17 23:49
edited on: 2016-04-17 23:51

Re Note: 0003161:

> Re note 3160...
>
> Given that intent, I'd suggest discarding almost all of the proposed
> wording changes, the ones that relate to when the alias becomes
> visible (the ones that relate to what are unspecified usages are
> fine) and instead just say in the section in the alias command ...
>
> When the assigned alias becomes available for use is unspecified,
> though the shell shall ensure it is available no later than the
> time it would next evaluate and write PS1 (if that variable were
> set.)

The goal of the wording in Note: 0003113 is to clarify but not change the (perhaps unstated/understated) requirements on implementations and script authors. Prohibiting aliases in non-interactive shells—or even making their behavior unspecified—would be a requirements change that would have to be handled in a separate bug report (one that targets Issue 8, not a TC for Issue 7).

I'm not opposed to your suggestion; in fact I would prefer to go even further and eliminate aliases from the standard altogether (i.e., any use of the alias command results in unspecified behavior). I'm just opposed to making that change in this particular bug report.

> If you don't currently believe that all use of aliases, in any
> scripts (including $ENV) is non-conforming now, then the only
> rational way that the existing standard could be read is that
> aliases become visible immediately after the alias command is read,
> and in
> if alias t=test; t $# -eq 0
> then ....
> the "t" must work as "test" whether this is in a script run as "sh
> script", a script run as ". script", or simply typed on the command
> line.

Yes, that is what the standard says now. We believe that to be a flaw in the current standard, and that the intention all along was to permit deferred application of alias changes. The new wording attempts to correct that flaw.
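A small probe can check whether a given shell actually expands t in that example. It assumes a POSIX sh on PATH and no real command named t. It reports the outcome rather than asserting one. A shell that reads the entire if command before executing any part of it tokenizes t before the alias command has run, so it prints "not expanded".

```shell
#!/bin/sh
# kre's example, with the test made always true ("t 0 -eq 0").
# If "t" is alias-substituted to "test", the condition succeeds.
# Otherwise "t" is looked up as a command, fails, and the else branch runs.
result=$(sh -c 'if alias t=test; t 0 -eq 0; then
  echo expanded
else
  echo "not expanded"
fi' 2>/dev/null)
printf 'alias t in the same if command: %s\n' "$result"
```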

> Lastly, while here, I cannot comprehend the distinction that is made
> in the current proposed wording between the way that $ENV is read
> when initially processed by the shell, and if later re-invoked by
> typing ". $ENV" (which I do a lot when I make changes to that file).
> If there is any permitted way that the exact same results are not
> achieved in both cases (whatever those results are) then the
> standard is simply insane (and so is any implementation that acts
> like that.)

The wording was chosen to accommodate existing implementations. Yes, existing implementations treat ENV processing during shell invocation differently from . $ENV. It's weird and unfortunate, but that's the way it is. Because this bug report is targeting an Issue 7 TC, it's not appropriate to clean up that particular mess here because it would cause implementations that currently conform to Issue 7 to no longer conform.

I would love to see this ENV glitch cleaned up in Issue 8, but we must use a separate bug report to make Issue 8 changes. I'll file a new bug report after this bug is resolved (assuming I remember).

Re Note: 0003162:

> Also re note 3160, but this is a different issue (which probably is
> not really all that related to the topic in question) ...
>
> I want conforming implementations to continue to be able to read
> and parse an entire program (all complete_commands) before
> anything is executed, if that's what they want to do.
>
> I would actually prefer to make that explicitly outlawed.

I agree. (See the end of Note: 0003160.) Such a change would have to be made in Issue 8, so a separate bug report must be filed. Would you mind doing that?

To clarify my previous statements:
  • I do not know whether POSIX specifies when execution starts relative to parsing.
  • I believe, but am not 100% sure, that it is silent on the topic.
  • I have never called for standardizing parse-everything-before-executing-anything, nor do I think it would be wise to do so.
All I was trying to say is that any proposed wording that adds or assumes a requirement about when execution starts relative to parsing is not appropriate for this particular bug report (out of scope).

Re Note: 0003164:

> To be consistent and permit existing behavior, we need to remove
> references to how files read with `.' (and $ENV processing) and
> commands executed with `eval' are parsed. That would address your
> concerns.

How does the new wording prohibit existing behavior? We tried to word it to permit all existing behavior. (The new wording can be read as requiring implementations to parse all input before executing anything, which was unintentional and will be fixed. Is that what you're referring to?)

(Regarding consistency, see my reply to kre a few paragraphs up.)

> I'm confused about why would you want to change the description of
> `set -v'.

Apologies, I confused 'set -x' and 'set -v'.

(0003166)
chet_ramey (reporter)
2016-04-18 12:47

Re 3165: If the intent of this change is to "permit deferred application of
alias changes", then the proposed text should mention aliases explicitly
instead of specifying a particular parsing mechanism, especially one
(compound_list) that requires the input to be fully parsed before being
executed. Something like (for `.'):

"It is unspecified whether aliases defined in file are available for use before
all commands in file have been executed."

The existing wording says nothing about how the shell implementation sequences
tokenization, parsing, and execution, so I find it hard to reconcile that, your
statement that "any proposed wording that adds or assumes a requirement about
when execution starts...is not appropriate," and the (possibly unintentional?)
new requirement that implementations parse input fully before executing it.

If the new requirement is unintentional, it will need to be fixed. You can
either make the behavior explicitly unspecified, which would be ok, or remove
the references to how the shell parses `.' and `eval', and processes $ENV, as I
(and others) suggested, making it implicitly unspecified. That would permit all
known existing behavior.
(0003167)
rhansen (manager)
2016-04-18 21:21
edited on: 2016-04-18 21:21

Re Note: 0003166:

> Re 3165: If the intent of this change is to "permit deferred
> application of alias changes", then the proposed text should mention
> aliases explicitly instead of specifying a particular parsing
> mechanism,

TC1 doesn't specify how dot scripts, eval arguments, command substitution bodies, or ENV scripts are parsed. It doesn't even say that they should be tokenized. This is an oversight in the standard (unrelated to aliases) and should be addressed even if aliases didn't exist.

Perhaps this oversight should be addressed in a separate bug report. However, addressing it here is convenient because of a fortunate coincidence that we can exploit to our advantage: Both program (added by 0000736) and compound_list are grammatically identical. In other words, any sequence of tokens that would be accepted as a program would also be accepted as a compound_list and vice-versa.

We can fix the alias specification by introducing the following distinction between the two otherwise identical grammar symbols: An alias change inside a compound_list might not take effect until some time after the compound_list finishes execution, while an alias change inside a program is guaranteed to take effect at the start of the next complete_command.

To get a specified behavior that matches existing practice, all that remains is to pick program or compound_list as the starting symbol for each of: dot scripts, eval arguments, command substitution bodies, and ENV scripts. The wording in Note: 0003113 chose compound_list for dot scripts, eval arguments, and command substitution bodies but chose program for ENV scripts. These choices were made to match the behavior of existing implementations.

> especially one (compound_list) that requires the input to be fully
> parsed before being executed. Something like (for `.'):
>
> "It is unspecified whether aliases defined in file are available for
> use before all commands in file have been executed."
>
> The existing wording says nothing about how the shell implementation
> sequences tokenization, parsing, and execution, so I find it hard to
> reconcile that, your statement that "any proposed wording that adds
> or assumes a requirement about when execution starts...is not
> appropriate," and the (possibly unintentional?) new requirement
> that implementations parse input fully before executing it.
>
> If the new requirement is unintentional, it will need to be
> fixed. You can either make the behavior explicitly unspecified,
> which would be ok, or remove the references to how the shell parses
> `.' and `eval', and processes $ENV, as I (and others) suggested,
> making it implicitly unspecified. That would permit all known
> existing behavior.

Yes, that new requirement is unintentional and will be fixed. (I'll reopen this bug to make it abundantly clear that we all agree that Note: 0003113 is flawed.)

For dot scripts, eval arguments, command substitution bodies, and ENV scripts, all we intended to specify was which grammar symbol to use as the start symbol.

(0003178)
rhansen (manager)
2016-04-28 15:18

We discussed deprecating aliases altogether during last week's telecon. For a hypothetical Issue 7 TC3 we can't simply say that any use of the alias and unalias commands results in unspecified behavior because that's too much of a change for a TC. The best we can do is mark those commands as deprecated in Issue 8, which means we still need to say something for Issue 7 to resolve this issue.

During last week's teleconference Eric Blake posted an interesting example in bash:
$ printf 'alias foo="echo 2"\nfoo' > dot
$ alias foo="echo 1"
$ . ./dot; foo
2
1
$ foo
2
Because of this behavior, I don't see a way to fix and clarify the behavior of aliases in a manner compatible with existing practice without also specifying the relationship between aliases and parsing as well as the timing relationship between parsing and execution. I was hoping to avoid the topic of parsing because of the extra work involved and because it increases the risk of unintended consequences. I took an action item to come up with some wording off-line that we can discuss in a future telecon.
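Eric Blake's experiment can be turned into a probe that also covers eval. This sketch assumes a POSIX sh on PATH and a writable ${TMPDIR:-/tmp}. The alias name probe and the file names are hypothetical, and the script only reports what the local sh does.

"immediate" means the alias change inside the dot script or eval argument took effect for the following complete_command. "deferred" means the input was tokenized before the change ran, as happens when it is parsed as a single compound_list.

```shell
#!/bin/sh
# Probe only: reports whether alias changes inside a dot script or an eval
# argument affect later commands in that same dot script or eval.
tmp=${TMPDIR:-/tmp}/alias-probe.$$
mkdir "$tmp" || exit 1

# The dot script redefines the alias, then uses it on the next line.
printf '%s\n' 'alias probe="echo immediate"' 'probe' > "$tmp/dot.sh"

# Each outer script sets a "deferred" value first, on its own line.
cat > "$tmp/outer-dot.sh" <<'EOF'
alias probe='echo deferred'
. "$1"
EOF
cat > "$tmp/outer-eval.sh" <<'EOF'
alias probe='echo deferred'
eval 'alias probe="echo immediate"
probe'
EOF

dot_result=$(sh "$tmp/outer-dot.sh" "$tmp/dot.sh")
eval_result=$(sh "$tmp/outer-eval.sh")
rm -rf "$tmp"

printf 'dot script: %s\neval:       %s\n' "$dot_result" "$eval_result"
```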

I think I will file a separate bug report for specifying the timing relationship between parsing and execution because that also affects when syntax errors are handled. Then I can return to this bug report to specify the relationship between aliases and parsing.
(0003195)
stephane (reporter)
2016-05-02 21:30

Sorry. Coming late to the discussion. A few comments on the accepted text which I think (I hope) have not been made yet (inline):

> Notes to the Editor (not part of this interpretation):
> -------------------------------------------------------
> On page 2322 lines 73691-73702 (XCU 2.3.1 Alias Substitution), change:
> 
> After a token has been delimited, but before applying the grammatical
> rules in Section 2.10, a resulting word that is identified to be the
> command name word of a simple command shall be examined to determine
> whether it is an unquoted, valid alias name. However, reserved words in
> correct grammatical context shall not be candidates for alias
> substitution. A valid alias name (see XBD Section 3.10) shall be one
> that has been defined by the alias utility and not subsequently
> undefined using unalias. Implementations also may provide predefined
> valid aliases that are in effect when the shell is invoked. To prevent
> infinite loops in recursive aliasing, if the shell is not currently
> processing an alias of the same name, the word shall be replaced by the
> value of the alias; otherwise, it shall not be replaced.
> 
> If the value of the alias replacing the word ends in a <blank>, the
> shell shall check the next command word for alias substitution; this
> process shall continue until a word is found that is not a valid alias
> or an alias value does not end in a <blank>.
> 
> to:
> 
> After a token has been delimited and has been identified to be the
> command name word of a simple command by applying the grammatical rules
> in Section 2.10, the word shall be subject to alias substitution if:
> 
>         the word does not contain any quoting characters,
>         the word is a valid alias name (see XBD Section 3.10),
>         an alias with that name is in effect, and
> 	the word is not recognized as a reserved word (see [xref to 2.4
> 	Reserved Words] and the examples in [xref to XRAT C.2.3.1]).

IMO, those restrictions should be put *on the application* at the time of
the alias definition.  As in:

   In

      alias x=y

   the behaviour is unspecified unless x is a "name" (with - and _
   or more allowed if you want) and is not a reserved word

There's no value in preventing shell implementations from allowing aliasing
of reserved words or arbitrary words. Some do.

Aliasing keywords is useful to instrument them like

   alias if='if context=if;'
   PS4='($context)+ '

See also

   alias time='"time" '

to disable the "time" keyword in bash
http://unix.stackexchange.com/questions/148484/how-to-disable-a-shell-keyword/148492#148492

All we need is to tell applications which ones are portable. Once ", '
\... are not allowed in alias names, the text above can be simplified to
"the word before quote removal is the name of a defined alias".


[...]
> If the value of the alias cannot be tokenized as a simple command (see
> Section 2.9.1) according to the shell grammar rules (see Section 2.10),
> or if the alias can be tokenized as a simple command but contains an
> ASSIGNMENT_WORD token or redirection, the behavior is unspecified.

One of the main interests of having aliases is to be able to do
modifications to the language like:

    https://github.com/modernish/modernish/blob/master/libexec/modernish/loop/cfor.mm
    alias cfor='initialise && while cloop'
    cfor 'i=0' 'i<10' i++; do
      echo "$i"
    done

Or:

    if [ "$doit" ]; then
      alias doit=
    else
      alias doit='#'
    fi
    doit echo test | tr e a

(That "modernish" comes with a few nice usages of aliases. Its author is
a contributor here btw).

If you're limited to simple commands, then there are few things left
that aliases can do and functions can't (maybe things like alias
fail="return 1", or alias push='set -- "$@"', which wouldn't work as
functions). So you might as well deprecate them.

Also note that ksh93 has a builtin alias that is not a simple command:

times='{ { time;} 2>&1;}'


> When
> a word is subject to alias substitution, the value of the alias shall be
> tokenized as a simple command and the tokens shall replace the word. To
> prevent infinite loops in recursive aliasing, if the shell is currently
> processing an alias of the same name, the word shall not be replaced.

In practice, that doesn't work for

alias uname='eval uname'

or

alias uname='echo "$(uname)"' # that one OK with dash though

Most shells fail or crash upon expanding either of those.

> If the value of the alias replacing the word ends in a <blank> that
> would be unquoted after substitution, the shell shall check the next
> command word for alias substitution; this process shall continue until a
> word is found that is not a valid alias or an alias value does not end
> in a <blank>.

Except with yash, space and tab are the only "blanks" for which the
above seems to work.

In practice that also happens for alias values that end in ";", newline,
&, |, ||, &&, that is, where the next word is in command position.


Now another thing for consideration and that could be addressed at the
same time:

Aliases are one thing affecting the parsing of the language dynamically,
but locales are another one. And there are similar considerations.

For instance,

  LC_ALL=fr_FR.UTF-8
  stéphane=1
  echo $'\u00e9' # as proposed but not yet in the language

In some shells, this only works correctly if that compound_list is not
parsed as a whole.

The same goes for the <blank>s that act as token separators according to
the locale. For instance, a UTF-8 à is encoded as c3 a0, and a0 happens to
be a blank in Solaris latin1 locales, which means that "echo voilà123" is
two words in a UTF-8 locale but three in a latin1 Solaris locale.

I'd be tempted to treat the behaviour of shells that parse the code in .
sourced-file or $ENV or eval code or sh -c code or $(code) as a whole as
a bug and require shells parse them the same regardless of where the
code comes from. That is, alias definitions and changes to locale
settings take effect no later than the next complete_command in every
case.

(0003196)
joerg (reporter)
2016-05-03 09:30

@ Note: 0003195

An alias cannot replace a shell keyword "at the right place". There is
even a related example in the POSIX standard.

Shells that really try to follow the standard seem to implement
the word parser the same way: they first check whether a text
can be seen as a keyword and only if the text is just a simple
word, they apply alias expansion.

Check C.2.3.1 Alias substitution.

This explains what people expect when you e.g. define an alias called
"while".

BTW: bash has a problem with its "time" keyword because it does not
implement it in a POSIX compliant way.

POSIX requires that

  time -p ls

produces a POSIX compliant "time output", but this does not happen
when you are using bash.

ksh88 is broken as well, but ksh93 and e.g. the current Bourne Shell
disable "time" as a keyword when an option directly follows
"time". This causes

  time -p ls

to execute the system's "time" implementation (e.g. from /usr/bin)
assuming that the system is POSIX compliant and delivers a
POSIX compliant "time" implementation.
(0003197)
stephane (reporter)
2016-05-03 11:08

Re: 3196

The current spec requires implementations to implement the arbitrary limitation of ksh (wrt keywords as aliases), which is not helpful. Keywords, with the exception of "in", "do" and "esac", are found in command position, the same place where aliases are recognised, so there's nothing technically preventing shells from allowing those keywords to be aliased. Some implementations (bash, zsh) have gone as far as changing their behaviour to implement that limitation of ksh when invoked as "sh", which is a shame and a waste of effort IMO.

I'd rather the spec tell application writers that they mustn't define aliases for keywords rather than allowing applications to define aliases for keywords and forcing sh implementations to treat them in a useless way.

At the moment, the spec says:

Please do "alias while=uname" if you want and be guaranteed that it will be ignored when you try to use it, except when that alias appears after another alias whose definition ends in a blank, as in:

    alias while=foo
    alias 'echo=echo '
    echo while

(which btw doesn't work for zsh when invoked as sh)

That is a useless requirement. I'd rather the spec says:

Please don't define aliases for keywords (as that won't work as you expect in some shells).
(0003198)
stephane (reporter)
2016-05-04 06:38

By the current wording of the spec, one could derive that aliases should be expanded in all of these cases below as the keywords are not "in correct grammatical contexts":

1 alias in=uname
   in

2 alias fi=uname
   fi

3 alias while=uname
   <&0 while

4 alias while=uname
   a=1 while

5 alias while=uname
   while;

Cases 3 and 4 are actually expanded in most shells (the exception being zsh in sh emulation); the others are not, except by bash, and by zsh when not in sh emulation.
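Cases 3 and 4 can be probed directly. This sketch assumes a POSIX sh on PATH. A </dev/null redirection stands in for <&0 so that the probe does not depend on the caller's stdin, and "echo aliased" is a hypothetical alias value chosen to be recognisable. The probe reports the outcome without asserting one.

```shell
#!/bin/sh
# Cases 3 and 4 from the list above, with "while" aliased to a
# recognisable echo. Stderr is discarded: a shell that does not expand
# the alias reports a syntax error or a missing "while" command there.
case3=$(sh -c 'alias while="echo aliased"
</dev/null while' 2>/dev/null)
case4=$(sh -c 'alias while="echo aliased"
a=1 while' 2>/dev/null)
printf 'case 3: %s\ncase 4: %s\n' "${case3:-not expanded}" "${case4:-not expanded}"
```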
(0003555)
ajosey (manager)
2017-01-18 16:08

Reset the interp status to pending. This item is not resolved.
(0004201)
geoffclare (manager)
2019-01-09 12:29
edited on: 2019-01-15 10:44

This is a proposed new resolution which addresses comments made since Note: 0003113 both here and on the mailing list. There have been a lot of comments, so if I missed anything please reply on the mailing list and (if I agree) I will edit this note.

All page and line numbers are for the 2016 and 2018 editions.

On page 2348 line 74794-74805 (XCU 2.3.1 Alias Substitution), change:
After a token has been delimited, but before applying the grammatical rules in Section 2.10, a resulting word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name. However, reserved words in correct grammatical context shall not be candidates for alias substitution. A valid alias name (see XBD Section 3.10) shall be one that has been defined by the alias utility and not subsequently undefined using unalias. Implementations also may provide predefined valid aliases that are in effect when the shell is invoked. To prevent infinite loops in recursive aliasing, if the shell is not currently processing an alias of the same name, the word shall be replaced by the value of the alias; otherwise, it shall not be replaced.

If the value of the alias replacing the word ends in a <blank>, the shell shall check the next command word for alias substitution; this process shall continue until a word is found that is not a valid alias or an alias value does not end in a <blank>.
to:
After a token has been categorized as type TOKEN (see [xref to 2.10.1]), including (recursively) any token resulting from an alias substitution, the TOKEN shall be subject to alias substitution if:
  • the TOKEN does not contain any quoting characters,
  • the TOKEN is a valid alias name (see XBD Section 3.10),
  • an alias with that name is in effect,
  • the TOKEN did not result from an alias substitution of the same alias name at any earlier recursion level, and
  • the TOKEN will be parsed as the command name word of a simple command when the grammatical rules in Section 2.10 are applied,
except that if the TOKEN meets the above conditions and is recognized as a reserved word (see [xref to 2.4 Reserved Words]), it is unspecified whether the TOKEN is subject to alias substitution.

An implementation may defer the effect of a change to an alias but the change shall take effect no later than the completion of the currently executing complete_command (see [xref to XCU 2.10 Shell Grammar]). Changes to aliases shall not take effect out of order. Implementations may provide predefined aliases that are in effect when the shell is invoked.

If the value of the alias is not a simple command (see [xref to 2.9.1]), or contains any of:
  • a comment
  • a variable assignment
  • a redirection
  • unbalanced single-quotes or double-quotes
(except within a command substitution), the behavior is unspecified.

When a TOKEN is subject to alias substitution, the value of the alias shall be processed to form tokens (see [xref to 2.3]) and the resulting tokens shall replace the TOKEN. If the value of the alias replacing the TOKEN ends in a <blank> that would be unquoted after substitution, and optionally if it ends in a <blank> that would be quoted after substitution, the shell shall check the next token in the input, if it is a TOKEN, for alias substitution; this process shall continue until a TOKEN is found that is not a valid alias or an alias value does not end in such a <blank>.
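A short script can exercise three rules in the proposed 2.3.1 text:

- no re-substitution of the same alias;
- deferral bounded by the end of the complete_command;
- checking the next word after a trailing <blank>.

This is a minimal sketch that assumes a POSIX sh on PATH. The alias names are hypothetical. The child sh matters because bash, outside POSIX mode, does not expand aliases in non-interactive shells unless expand_aliases is set. Only behavior the wording requires is checked. Same-line uses, which may be deferred, are deliberately avoided.

```shell
#!/bin/sh
# Each case runs in its own child "sh" so aliases cannot leak between cases.

# Recursion: the inner "echo" came from the echo alias itself, so it is not
# substituted again; "greet", produced by the hi alias in command-name
# position, is substituted in turn.
self=$(sh -c 'alias echo="echo prefix:"
echo hi')
chain=$(sh -c 'alias greet="echo hello" hi="greet world"
hi')

# Deferral: each alias command is its own complete_command, so the change
# must be in effect for the next line.
timing=$(sh -c 'alias greet="echo hello"
greet
alias greet="echo goodbye"
greet')

# Trailing <blank>: after "say" (value ends in a blank) the next word is
# checked for alias substitution; after "plain" it is not.
blank=$(sh -c 'alias say="echo " plain="echo" word=hello
say word
plain word')

printf '%s\n' "$self" "$chain" "$timing" "$blank"
```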

On page 2351 line 74901-74904 (XCU 2.5.3 Shell Variables) change:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file containing shell commands to execute in the current environment.
to:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file. Before any interactive commands are read, the shell shall tokenize (see [xref to XCU 2.3 Token Recognition]) the contents of the file, parse the tokens as a program (see [xref to XCU 2.10 Shell Grammar]), and execute the resulting commands in the current environment. (In other words, the contents of the ENV file are not parsed as a single compound_list. This distinction matters because it influences when aliases take effect.)


On page 2358 line 75202-75204 (XCU 2.6.3 Command Substitution), change:
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command. Any valid shell script can be used for command, except a script consisting solely of redirections which produces unspecified results.
to:
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command.

With both the backquoted and $(command) forms, command shall be tokenized (see [xref to XCU 2.3 Token Recognition]) and parsed (see [xref to XCU 2.10 Shell Grammar]). It is unspecified whether command is parsed and executed as a program (as for a shell script) or is parsed as a single compound_list that is executed after the entire command has been parsed. With the $(command) form any valid program can be used for command, except a program consisting solely of redirections which produces unspecified results.

On page 2393 line 76554 (XCU 2.14 dot DESCRIPTION), change:
The shell shall execute commands from the file in the current environment.
to:
The shell shall tokenize (see [xref to XCU 2.3 Token Recognition]) the contents of the file, parse the tokens (see [xref to XCU 2.10 Shell Grammar]), and execute the resulting commands in the current environment. It is unspecified whether the commands are parsed and executed as a program (as for a shell script) or are parsed as a single compound_list that is executed after the entire file has been parsed.

On page 2394 line 76620 (XCU 2.14 eval), change:
The eval utility shall construct a command by concatenating arguments together, separating each with a <space> character. The constructed command shall be read and executed by the shell.
to:
The eval utility shall construct a command string by concatenating arguments together, separating each with a <space> character. The constructed command string shall be tokenized (see [xref to XCU 2.3 Token Recognition]), parsed (see [xref to XCU 2.10 Shell Grammar]), and executed by the shell in the current environment. It is unspecified whether the commands are parsed and executed as a program (as for a shell script) or are parsed as a single compound_list that is executed after the entire constructed command string has been parsed.

On page 2459 line 78855 (XCU alias), change:
An alias definition provides a string value that shall replace a command name when it is encountered; see [xref to 2.3.1].
to:
An alias definition provides a string value that shall replace a command name when it is encountered. For information on valid string values, and the processing involved, see [xref to 2.3.1].

On page 3721 after line 127547 (XRAT C.2.3.1), insert:
Implementations differ in how alias substitution is performed when the alias value does not have the form of a simple command. For example, given:
$ alias foo='some_command &'
$ foo&
some, but not all, implementations retokenize the two '&' characters into an && (and) operator.

Some, but not all, shell implementations do not process changes to alias definitions until the current compound_list (see [xref to XCU 2.10 Shell Grammar]) has completed. In these shells, alias changes do not take effect until the end of the dot script, eval command, function invocation, if statement, case statement, for statement, while statement, or until statement containing the alias change.

Many shell implementations execute the contents of a file, typically ~/.profile, when invoked as a login shell. The standard developers are unaware of any such implementations that process the contents of ~/.profile (and similar startup files) as a single compound_list, so alias changes in ~/.profile typically do take effect before the end of ~/.profile.


(0004214)
geoffclare (manager)
2019-01-18 11:48
edited on: 2019-02-07 16:33

Interpretation response
------------------------
The standard does not speak to this issue, and as such no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
-------------
There is widespread existing practice to follow, and the standard developers have attempted to codify this in the changes below.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------

All page and line numbers are for the 2016 and 2018 editions.

On page 2348 line 74794-74805 (XCU 2.3.1 Alias Substitution), change:
After a token has been delimited, but before applying the grammatical rules in Section 2.10, a resulting word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name. However, reserved words in correct grammatical context shall not be candidates for alias substitution. A valid alias name (see XBD Section 3.10) shall be one that has been defined by the alias utility and not subsequently undefined using unalias. Implementations also may provide predefined valid aliases that are in effect when the shell is invoked. To prevent infinite loops in recursive aliasing, if the shell is not currently processing an alias of the same name, the word shall be replaced by the value of the alias; otherwise, it shall not be replaced.

If the value of the alias replacing the word ends in a <blank>, the shell shall check the next command word for alias substitution; this process shall continue until a word is found that is not a valid alias or an alias value does not end in a <blank>.
to:
After a token has been categorized as type TOKEN (see [xref to 2.10.1]), including (recursively) any token resulting from an alias substitution, the TOKEN shall be subject to alias substitution if:
  • the TOKEN does not contain any quoting characters,
  • the TOKEN is a valid alias name (see XBD Section 3.10),
  • an alias with that name is in effect,
  • the TOKEN did not either fully or, optionally, partially result from an alias substitution of the same alias name at any earlier recursion level, and
  • the TOKEN could be parsed as the command name word of a simple command (see [xref to Section 2.10]), based on this TOKEN and the tokens (if any) that preceded it, but ignoring whether any subsequent characters would allow that,
except that if the TOKEN meets the above conditions and would be recognized as a reserved word (see [xref to 2.4 Reserved Words]) if it occurred in an appropriate place in the input, it is unspecified whether the TOKEN is subject to alias substitution.
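For illustration only (not part of the proposed wording): a minimal sketch of the quoting and command-name-position conditions, assuming a shell that expands aliases while reading a script, such as dash or bash in POSIX mode. The shopt line is an added assumption that enables this in plain bash and is skipped by shells without shopt; the name hello is hypothetical.

```sh
# Assumption: plain bash does not expand aliases in scripts unless
# expand_aliases is set; shells without shopt skip this line.
command -v shopt >/dev/null 2>&1 && shopt -s expand_aliases

# Define the function first: once the alias exists, the name at the
# start of a later "hello() ..." line would itself be substituted.
hello() { echo function; }
alias hello='echo alias'

hello        # unquoted TOKEN in command-name position: prints "alias"
\hello       # contains a quoting character: not substituted, prints "function"
echo hello   # not a command name word: prints "hello"
```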

When a TOKEN is subject to alias substitution, the value of the alias shall be processed as if it had been read from the input instead of the TOKEN, with token recognition (see [xref to 2.3 Token Recognition]) resuming at the start of the alias value. When the end of the alias value is reached, the shell may behave as if an additional <space> character had been read from the input after the TOKEN that was replaced. If it does not add this <space>, it is unspecified whether the current token is delimited before token recognition is applied to the character (if any) that followed the TOKEN in the input.

Note: a future version of this standard may disallow adding this <space>.
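For illustration only (not part of the proposed wording): because token recognition resumes at the start of the alias value, the value may contain operators, and text that followed the replaced TOKEN continues the same command. A minimal sketch, assuming a shell that expands aliases in scripts (the shopt line is an assumption for plain bash); the alias name both is hypothetical.

```sh
# Assumption: enables alias expansion in plain bash; skipped elsewhere.
command -v shopt >/dev/null 2>&1 && shopt -s expand_aliases

alias both='echo one; echo two'

# Read as if the input were "echo one; echo two three" (with or
# without an added <space>): the ';' in the alias value is recognized
# as an operator, and "three" becomes an argument of the second echo.
# Prints "one", then "two three".
both three
```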

If the value of the alias replacing the TOKEN ends in a <blank> that would be unquoted after substitution, and optionally if it ends in a <blank> that would be quoted after substitution, the shell shall check the next token in the input, if it is a TOKEN, for alias substitution; this process shall continue until a TOKEN is found that is not a valid alias or an alias value does not end in such a <blank>.
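For illustration only (not part of the proposed wording): a minimal sketch of the trailing-<blank> rule, assuming a shell that expands aliases in scripts (the shopt line is an assumption for plain bash). The alias names run and say are hypothetical.

```sh
# Assumption: enables alias expansion in plain bash; skipped elsewhere.
command -v shopt >/dev/null 2>&1 && shopt -s expand_aliases

alias say='echo said:'
alias run='command '    # value ends in an unquoted <blank>

# "run" is replaced by "command "; since that value ends in a <blank>,
# the next TOKEN "say" is also checked and substituted, so this runs
# "command echo said: hi" and prints "said: hi".
run say hi

# With alias run='command' (no trailing <blank>), "say" would not be
# checked, and the shell would search for a utility named "say".
```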

An implementation may defer the effect of a change to an alias but the change shall take effect no later than the completion of the currently executing complete_command (see [xref to XCU 2.10 Shell Grammar]). Changes to aliases shall not take effect out of order. Implementations may provide predefined aliases that are in effect when the shell is invoked.
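For illustration only (not part of the proposed wording): a minimal sketch of the portable pattern this permits, assuming a script read as a program by a shell that expands aliases (the shopt line is an assumption for plain bash). The alias name greet is hypothetical.

```sh
# Assumption: enables alias expansion in plain bash; skipped elsewhere.
command -v shopt >/dev/null 2>&1 && shopt -s expand_aliases

# Each line is a separate complete_command, so each change must be in
# effect, in order, before the following line is parsed.
alias greet='echo first'
greet                      # prints "first"
alias greet='echo second'
greet                      # prints "second"

# Not portable: a change and a use within one complete_command, e.g.
#   alias greet='echo third'; greet
# may still use the previous definition, because the change may be
# deferred until that complete_command completes.
```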

On page 2351 line 74901-74904 (XCU 2.5.3 Shell Variables) change:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file containing shell commands to execute in the current environment.
to:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file. Before any interactive commands are read, the shell shall tokenize (see [xref to XCU 2.3 Token Recognition]) the contents of the file, parse the tokens as a program (see [xref to XCU 2.10 Shell Grammar]), and execute the resulting commands in the current environment. (In other words, the contents of the ENV file are not parsed as a single compound_list. This distinction matters because it influences when aliases take effect.)


On page 2358 line 75202-75204 (XCU 2.6.3 Command Substitution), change:
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command. Any valid shell script can be used for command, except a script consisting solely of redirections which produces unspecified results.
to:
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command.

With both the backquoted and $(command) forms, command shall be tokenized (see [xref to XCU 2.3 Token Recognition]) and parsed (see [xref to XCU 2.10 Shell Grammar]). It is unspecified whether command is parsed and executed as a program (as for a shell script) or is parsed as a single compound_list that is executed after the entire command has been parsed. With the $(command) form any valid program can be used for command, except a program consisting solely of redirections which produces unspecified results.

On page 2393 line 76554 (XCU 2.14 dot DESCRIPTION), change:
The shell shall execute commands from the file in the current environment.
to:
The shell shall tokenize (see [xref to XCU 2.3 Token Recognition]) the contents of the file, parse the tokens (see [xref to XCU 2.10 Shell Grammar]), and execute the resulting commands in the current environment. It is unspecified whether the commands are parsed and executed as a program (as for a shell script) or are parsed as a single compound_list that is executed after the entire file has been parsed.
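For illustration only (not part of the proposed wording): since it is unspecified whether a dot file is parsed as a program or as one compound_list, a use of an alias later in the same file is not portable, but a use after the dot command has completed is. A minimal sketch, assuming a shell that expands aliases in scripts (the shopt line is an assumption for plain bash); the file name and alias name are hypothetical.

```sh
# Assumption: enables alias expansion in plain bash; skipped elsewhere.
command -v shopt >/dev/null 2>&1 && shopt -s expand_aliases

# Hypothetical file holding alias definitions.
defs=${TMPDIR:-/tmp}/alias_defs.$$
printf '%s\n' "alias greet='echo hello from dot'" > "$defs"

# However the file is parsed, its alias changes are in effect once the
# complete_command containing the dot command has completed.
. "$defs"
greet                      # prints "hello from dot"
rm -f "$defs"
```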

On page 2394 line 76620 (XCU 2.14 eval), change:
The eval utility shall construct a command by concatenating arguments together, separating each with a <space> character. The constructed command shall be read and executed by the shell.
to:
The eval utility shall construct a command string by concatenating arguments together, separating each with a <space> character. The constructed command string shall be tokenized (see [xref to XCU 2.3 Token Recognition]), parsed (see [xref to XCU 2.10 Shell Grammar]), and executed by the shell in the current environment. It is unspecified whether the commands are parsed and executed as a program (as for a shell script) or are parsed as a single compound_list that is executed after the entire constructed command string has been parsed.
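For illustration only (not part of the proposed wording): a minimal sketch of why tokenizing the constructed command string matters. The variable name op is hypothetical; no aliases are involved, so any POSIX sh should behave this way.

```sh
op='&&'

# Without eval, the expansion of $op is just a word: this runs "true"
# with the arguments "&&", "echo", and "joined", and prints nothing.
true $op echo joined

# eval joins its arguments with <space> characters into the string
# "true && echo joined", then tokenizes and parses it, so "&&" is
# recognized as an operator this time. Prints "joined".
eval true "$op" echo joined
```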

On page 2459 line 78855 (XCU alias), change:
An alias definition provides a string value that shall replace a command name when it is encountered; see [xref to 2.3.1].
to:
An alias definition provides a string value that shall replace a command name when it is encountered. For information on valid string values, and the processing involved, see [xref to 2.3.1].

On page 2460 line 78908 (XCU alias APPLICATION USAGE), change:
None.
to:
Care should be taken to avoid alias values that end with a character that could be treated as part of an operator token, as it is unspecified whether the character that follows the alias name in the input can be used as part of the same token (see [xref to 2.3.1]). For example, with:
$ alias foo='echo 0'
$ foo>&2
the shell can either pass the argument '0' to echo and redirect fd 1 to fd 2, or pass no arguments to echo and redirect fd 0 to fd 2. Changing it to:
$ alias foo='echo "0"'
avoids this problem. The alternative of adding a <space> after the '0' would also avoid the problem, but in addition it would alter the way the alias works, as described in [xref to 2.3.1].

Likewise, given:
$ alias foo='some_command &'
$ foo&
the shell may combine the two '&' characters into an && (and) operator. Since the alias cannot pass arguments to some_command and thus can be expected to be invoked without arguments, adding a <space> after the '&' would be an acceptable way to prevent this. Alternatively, the alias could be specified as a grouping command:
$ alias foo='{ some_command & }'

Problems can occur for tokens other than operators as well, if the alias is used in unusual ways. For example, with:
$ alias foo='echo $'
$ foo((123))
some shells combine the '$' and the "((123))" to form an arithmetic expansion, but others do not (resulting in a syntax error).

On page 3721 after line 127547 (XRAT C.2.3.1), insert:
Some implementations add a <space> after the alias value when performing alias substitution in order to prevent the last character of the alias value and the first character after the alias name in the input from combining to form an operator. However, the extra <space> can have side-effects in other situations, such as if the alias value ends with an unquoted <backslash>. Implementations which do this are encouraged to change to an alternative method of delimiting a partial operator token at the end of an alias value.

Some, but not all, shell implementations do not process changes to alias definitions until the current compound_list (see [xref to XCU 2.10 Shell Grammar]) has completed. In these shells, alias changes do not take effect until the end of the dot script, eval command, function invocation, if statement, case statement, for statement, while statement, or until statement containing the alias change.

Many shell implementations execute the contents of a file, typically ~/.profile, when invoked as a login shell. The standard developers are unaware of any such implementations that process the contents of ~/.profile (and similar startup files) as a single compound_list, so alias changes in ~/.profile typically do take effect before the end of ~/.profile.


(0004243)
agadmin (administrator)
2019-02-07 17:15

Interpretation proposed: 7 Feb 2019
(0004366)
agadmin (administrator)
2019-04-17 10:52

Interpretation approved: 17 April 2019

- Issue History
Date Modified Username Field Change
2015-06-04 00:22 wpollock New Issue
2015-06-04 00:22 wpollock Status New => Under Review
2015-06-04 00:22 wpollock Assigned To => ajosey
2015-06-04 00:22 wpollock Name => Wayne Pollock
2015-06-04 00:22 wpollock Section => 2.3.1 Alias Substitution
2015-06-04 09:26 joerg Note Added: 0002694
2016-02-04 17:01 Don Cragun Page Number => 2322
2016-02-04 17:01 Don Cragun Line Number => 73690-73705
2016-02-04 17:01 Don Cragun Interp Status => ---
2016-02-04 17:04 Don Cragun Project 1003.1(2008)/Issue 7 => 1003.1(2013)/Issue7+TC1
2016-03-04 09:49 joerg Note Added: 0003089
2016-03-04 09:50 joerg Note Edited: 0003089
2016-03-04 11:39 joerg Note Edited: 0003089
2016-03-04 11:40 joerg Note Edited: 0003089
2016-03-04 15:11 joerg Note Edited: 0003089
2016-03-31 16:29 rhansen Note Added: 0003113
2016-03-31 16:30 rhansen Note Edited: 0003113
2016-03-31 16:32 nick Note Edited: 0003113
2016-03-31 16:33 nick Interp Status --- => Pending
2016-03-31 16:33 nick Final Accepted Text => See bugnote: 3113
2016-03-31 16:33 nick Status Under Review => Interpretation Required
2016-03-31 16:33 nick Resolution Open => Accepted As Marked
2016-03-31 16:33 nick Final Accepted Text See bugnote: 3113 => See Note: 0003113
2016-03-31 16:33 nick Note Edited: 0003113
2016-03-31 16:34 nick Tag Attached: tc3-2008
2016-03-31 16:34 rhansen Note Edited: 0003113
2016-03-31 16:40 rhansen Note Edited: 0003113
2016-04-01 12:29 ajosey Interp Status Pending => Proposed
2016-04-01 12:29 ajosey Note Added: 0003116
2016-04-10 22:09 jilles Note Added: 0003148
2016-04-11 14:31 chet_ramey Note Added: 0003149
2016-04-11 20:59 shware_systems Note Added: 0003150
2016-04-12 08:58 joerg Note Added: 0003151
2016-04-12 12:58 chet_ramey Note Added: 0003152
2016-04-12 14:49 joerg Note Added: 0003153
2016-04-13 09:07 joerg Note Edited: 0003153
2016-04-13 17:15 chet_ramey Note Added: 0003154
2016-04-13 17:43 Don Cragun Note Edited: 0003151
2016-04-14 09:29 joerg Note Added: 0003155
2016-04-14 09:33 joerg Note Edited: 0003155
2016-04-14 09:34 joerg Note Edited: 0003155
2016-04-14 16:22 kre Note Added: 0003156
2016-04-14 16:37 kre Note Edited: 0003156
2016-04-14 19:39 chet_ramey Note Added: 0003157
2016-04-14 20:18 rhansen Relationship added related to 0000736
2016-04-14 20:33 shware_systems Note Added: 0003158
2016-04-14 22:15 eblake Note Added: 0003159
2016-04-14 23:12 rhansen Note Added: 0003160
2016-04-15 09:35 kre Note Added: 0003161
2016-04-15 10:06 kre Note Added: 0003162
2016-04-15 13:49 shware_systems Note Added: 0003163
2016-04-15 19:09 chet_ramey Note Added: 0003164
2016-04-15 19:10 chet_ramey Note Edited: 0003164
2016-04-17 23:49 rhansen Note Added: 0003165
2016-04-17 23:50 rhansen Note Edited: 0003165
2016-04-17 23:51 rhansen Note Edited: 0003165
2016-04-18 12:47 chet_ramey Note Added: 0003166
2016-04-18 21:21 rhansen Note Added: 0003167
2016-04-18 21:21 rhansen Note Edited: 0003167
2016-04-18 21:22 rhansen Resolution Accepted As Marked => Reopened
2016-04-28 15:18 rhansen Note Added: 0003178
2016-04-28 15:30 rhansen Relationship added related to 0001048
2016-05-02 21:30 stephane Note Added: 0003195
2016-05-03 09:30 joerg Note Added: 0003196
2016-05-03 11:08 stephane Note Added: 0003197
2016-05-04 06:38 stephane Note Added: 0003198
2016-06-02 16:53 rhansen Relationship added related to 0001055
2017-01-18 15:25 ajosey Interp Status Proposed => Approved
2017-01-18 15:25 ajosey Note Added: 0003553
2017-01-18 16:07 ajosey Note Deleted: 0003553
2017-01-18 16:08 ajosey Interp Status Approved => Pending
2017-01-18 16:08 ajosey Note Added: 0003555
2019-01-09 12:29 geoffclare Note Added: 0004201
2019-01-09 12:32 geoffclare Note Edited: 0004201
2019-01-09 12:40 geoffclare Note Edited: 0004201
2019-01-15 10:44 geoffclare Note Edited: 0004201
2019-01-18 11:48 geoffclare Note Added: 0004214
2019-01-18 11:52 geoffclare Note Edited: 0004214
2019-01-18 12:03 geoffclare Note Edited: 0004214
2019-02-04 10:15 geoffclare Note Edited: 0004214
2019-02-07 16:33 geoffclare Note Edited: 0004214
2019-02-07 16:36 geoffclare Final Accepted Text See Note: 0003113 => See Note: 0004214
2019-02-07 16:36 geoffclare Resolution Reopened => Accepted As Marked
2019-02-07 17:15 agadmin Interp Status Pending => Proposed
2019-02-07 17:15 agadmin Note Added: 0004243
2019-04-17 10:52 agadmin Interp Status Proposed => Approved
2019-04-17 10:52 agadmin Note Added: 0004366


Mantis 1.1.6
Copyright © 2000 - 2008 Mantis Group
Powered by Mantis Bugtracker