0001026: The shell should support access to all 32 bit from the exit code

ID	Project	Category	View Status	Date Submitted	Last Update

0001026	1003.1(2013)/Issue7+TC1	Shell and Utilities	public	2016-01-28 13:16	2022-07-25 15:17

Reporter	joerg	Assigned To
Priority	normal	Severity	Editorial	Type	Enhancement Request
Status	Closed	Resolution	Rejected

Name	Jörg Schilling
Organization
User Reference
Section	2.5.2
Page Number	2324, 2382
Line Number	73738-73769, 75887
Interp Status	---
Final Accepted Text


Summary	0001026: The shell should support access to all 32 bit from the exit code
Description	The current shell description is based on a very outdated version of the POSIX standard that does not yet include the waitid() interface. Since waitid() was added approx. 20 years ago, it has been a non-optional part of the POSIX standard. It permits retrieving all 32 bits from the exit() call in a child process and, in addition, implements an interface that is easier to integrate than waitpid() and the W*() macros. New software should be encouraged to use the waitid() interface instead of wait() or waitpid(), and the shell should be enhanced to support the better interface from waitid(). This results in a clean separation of the information about the reason for a child termination from the exit() code, the termination signal and other problems like "file not found" or "file not executable". The shell should also be able to base its internal logic on whether the exit() parameter was != 0, regardless of whether exitcode mod 256 is zero or not.
Desired Action	On page 2382 after line 75887 insert:

fullexitcode
    Do not mask the exit code with 0xFF when expanding $?. This gives access to the full 32 bits from the child's exit code via $? on all POSIX operating systems that support waitid(). This also makes the shell logic for conditional execution based on the full 32 bits from the exit code.

On page 2324 after line 73769 insert:

$/
    This variable contains a signed decimal number in case that the program terminated normally. In case the program was terminated by a signal, it contains the name of the signal and in the other cases, it contains the name for the reason listed for ${.sh.codename} below.

.sh.code
    The numerical reason waitid(2) returned for the child status change. It matches the CLD_* definitions from signal.h. Note that the numbers are usually in the range 1..6 but this is not guaranteed. Use ${.sh.codename} for portability.

.sh.codename
    The reason waitid(2) returned for the child status change as text that is generated by stripping off CLD_ from the related definitions from signal.h. Possible values are:

    EXITED
        The program had a normal termination and the exit(2) code is in ${.sh.status}.
    KILLED
        The program was killed by a signal, the signal number is in ${.sh.status}, the signal name is in ${.sh.termsig}.
    DUMPED
        The program was killed by a signal, similar to KILLED above, but the program in addition created a core dump.
    TRAPPED
        A traced child has trapped.
    STOPPED
        The program was stopped by a signal, the signal number is in ${.sh.status}, the signal name is in ${.sh.termsig}.
    CONTINUED
        A stopped child was continued.
    NOEXEC
        An existing file could not be executed. This can happen when e.g. either the type of the file is not plain file or when the file does not have execute permission, or when the argument list is too long. This is not a result from waitid(2) but from execve(2).
    NOTFOUND
        A file was not found and thus could not be executed. This is not a result from waitid(2) but from execve(2).

    The child codes NOEXEC and NOTFOUND in ${.sh.codename} may need shared memory (e.g. from vfork(2)) to allow a reliable reporting.

.sh.pid
    The process number of the process that caused the current waitid(2) status.

.sh.signame
    The name of the causing signal. If the status is related to a set of waitid(2) return values, this is CHLD or CLD, depending on the os. When a trap(1) command is executed, ${.sh.signame} holds the signal that caused the trap.

.sh.signo
    The signal number related to ${.sh.signame}.

.sh.status
    The decimal value returned by the last synchronously executed command. The value is unaltered and contains the full int from the exit(2) call in the child in case the shell is run on a modern os.

.sh.termsig
    The signal name related to the numerical ${.sh.status} value. The translation to signal names takes place regardless of whether the child was terminated by a signal or terminated normally.

It may help to mention a code fragment to emulate waitid() on non-POSIX systems for portability:

    static int
    waitid(idtype, id, infop, opts)
        idtype_t    idtype;
        id_t        id;
        siginfo_t   *infop;     /* Must be != NULL */
        int         opts;
    {
        int     exstat;
        pid_t   pid;

        opts &= ~(WEXITED|WTRAPPED);    /* waitpid() doesn't understand them */
    #if WSTOPPED != WUNTRACED
        if (opts & WSTOPPED) {
            opts &= ~WSTOPPED;
            opts |= WUNTRACED;
        }
    #endif

        if (idtype == P_PID)
            pid = id;
        else if (idtype == P_PGID)
            pid = -id;
        else if (idtype == P_ALL)
            pid = -1;
        else
            pid = 0;

        infop->si_utime = 0;
        infop->si_stime = 0;
        pid = waitpid(pid, &exstat, opts);
        infop->si_pid = pid;
        infop->si_code = 0;
        infop->si_status = 0;

        if (pid == (pid_t)-1)
            return (-1);

        if (WIFEXITED(exstat)) {
            infop->si_code = CLD_EXITED;
            infop->si_status = WEXITSTATUS(exstat);
        } else if (WIFSIGNALED(exstat)) {
            if (WCOREDUMP(exstat))
                infop->si_code = CLD_DUMPED;
            else
                infop->si_code = CLD_KILLED;
            infop->si_status = WTERMSIG(exstat);
        } else if (WIFSTOPPED(exstat)) {
            if (WSTOPSIG(exstat) == SIGTRAP)
                infop->si_code = CLD_TRAPPED;
            else
                infop->si_code = CLD_STOPPED;
            infop->si_status = WSTOPSIG(exstat);
        } else if (WIFCONTINUED(exstat)) {
            infop->si_code = CLD_CONTINUED;
            infop->si_status = 0;
        }
        return (0);
    }
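For readers who want to see what a native waitid() reports on their own system, the following is a minimal sketch and not part of the proposed text. The helper name reap_with_waitid is hypothetical. It forks a child that calls _exit() with a given value and returns the siginfo_t filled in by waitid(). How many bits of the value survive in si_status is exactly the point disputed in this issue, so the sketch makes no claim beyond the low 8 bits.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the issue): fork a child that calls
 * _exit(code), reap it with waitid(), and store the result in *out.
 * Returns 0 on success, -1 on failure.
 */
static int
reap_with_waitid(int code, siginfo_t *out)
{
    pid_t pid;

    assert(out != NULL);

    pid = fork();
    if (pid == (pid_t)-1)
        return (-1);
    if (pid == 0)
        _exit(code);        /* child: terminate immediately */

    /* WEXITED: wait only for children that have terminated. */
    while (waitid(P_PID, (id_t)pid, out, WEXITED) == -1) {
        if (errno != EINTR)
            return (-1);
    }
    return (0);
}
```

After a successful call, out->si_code is CLD_EXITED and out->si_status holds the exit value as the system reports it. Comparing out->si_status with the value passed to _exit() shows whether a given platform delivers more than the low 8 bits.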
Tags	No tags attached.

~~user229~~ 2016-01-30 05:32 bugnote:0003057	Re. NOEXEC and NOTFOUND, I don't see why it couldn't be reported using a pipe. Also, why only these two, and not the whole suite of possible errno values that can be returned from the exec family (or fork)? It reflects the traditional boundary of 127 for "not found" and 126 for "other", but there's no reason to limit a novel reporting mechanism in this way.
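The pipe approach mentioned in this note can be sketched as follows. This is an illustration under stated assumptions, not code from any shell: the helper name exec_errno_via_pipe is hypothetical. The write end of the pipe is marked close-on-exec, so a successful exec closes it and the parent reads end-of-file, while a failed exec lets the child send its errno value back.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the issue): try to execute path in a
 * child process. Returns 0 if the exec succeeded, the errno value set
 * by execv() if it failed, or -1 on any other error.
 */
static int
exec_errno_via_pipe(const char *path, char *const argv[])
{
    int     fds[2];
    int     err = 0;
    ssize_t n;
    pid_t   pid;

    assert(path != NULL && argv != NULL);

    if (pipe(fds) == -1)
        return (-1);
    /* The write end disappears automatically on a successful exec. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1) {
        close(fds[0]);
        close(fds[1]);
        return (-1);
    }

    pid = fork();
    if (pid == (pid_t)-1) {
        close(fds[0]);
        close(fds[1]);
        return (-1);
    }
    if (pid == 0) {
        close(fds[0]);
        execv(path, argv);
        err = errno;        /* only reached if execv() failed */
        if (write(fds[1], &err, sizeof (err)) != (ssize_t)sizeof (err))
            _exit(126);
        _exit(127);
    }

    close(fds[1]);
    do {
        n = read(fds[0], &err, sizeof (err));
    } while (n == -1 && errno == EINTR);
    close(fds[0]);

    /* Reap the child so that it does not remain a zombie. */
    while (waitpid(pid, NULL, 0) == (pid_t)-1 && errno == EINTR)
        ;

    if (n == 0)
        return (0);         /* EOF: the exec succeeded */
    if (n == (ssize_t)sizeof (err))
        return (err);       /* the exec failed with this errno */
    return (-1);
}
```

Because the full errno value is transferred, the caller is not limited to the two cases NOEXEC and NOTFOUND, which is the point raised in the note.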

shware_systems 2016-01-30 12:16 reporter bugnote:0003058	Alternate method brought up in phone call:

<signal.h> adds SIGEXIT as a signal number with SIG_DFLT type of I and non-maskable like SIGKILL or SIGSTOP. When a sigaction handler is used, si_code can be:

EXIT_NORMAL Child has exited.
EXIT_SIGNAL Child has terminated due to a signal and did not create a core file.
EXIT_DUMPED Child has terminated abnormally and created a core file.

While these duplicate SIGCHLD codes, they are limited to termination reasons that affect the shell. A shell can install a handler that sets the above $/ variable before initiating any action specified for EXIT using the trap builtin. $/ is superfluous, actually, but benign.

The exit() interfaces get extended to require they shall use the effect of sigqueue() with SIGEXIT as signal number to the parent process, in addition to any SIGCHLD raise() or sigqueue().

The trap builtin gets extended with a -f flag that limits the action for a signal to invoking a script function by name. Syntax:

trap -f funcname [SA_OPTS] condition...

This differs from the 'trap action conditions' format in that eval is not used and the default handler type is a sigaction one, not sighandler, so [SA]_SIGINFO is unnecessary as an SA_OPTS argument. For brevity, the SA_OPTS arguments can be specified using simply an underscore. Whether an implementation should or shouldn't support a particular option I leave open.

The reason for limiting the action to a shell function is so the siginfo fields relevant to any signal being unqueued can be passed as positional parameters of the function. Each parameter would use effectively a "$(printf '%s=%d' fieldname value)" substitution, with fieldname skipping the 'si_' prefix. For some fields '%s=%s' as format can be used too. Where a function is handling multiple signals the first parameter, $1, would normally be 'signo=nnn' or 'signo=EXIT', as example. If a function is handling only one signal it can be the last parameter or missing entirely. Because of the naming, any order is possible and shouldn't affect script portability.

When a script uses trap -f exitfunc EXIT, exitfunc gets the 32-bit exit code passed to exit() as positional parameter status=nnnn. The other fields beginning .sh. would be parameters also, and if the script desires it can assign those values to names it chooses, and not have those static names cluttering the variables name space. A side benefit is a handler can use variable prefixes in combination with a pid= value to track termination of both synchronous and asynchronous child processes or sub-shells.

For backwards compatibility the processing for $? stays unchanged. The fullexitcode option to set is unnecessary also. Existing scripts using trap don't need changes either, except for one remotely possible case. What wait() and waitpid() put in stat_loc also doesn't change, so code based on them will still work.

~~user229~~ 2016-01-30 15:53 bugnote:0003059 Last edited: 2016-01-30 15:56	What does SIGEXIT provide, as an actual signal, that SIGCHLD doesn't? Couldn't this be synthesized within the shell (if the main benefit is that trap handlers work differently) rather than being a real signal? Also, the shell already has an EXIT trap, for when the shell itself exits.

shware_systems 2016-01-30 19:12 reporter bugnote:0003061	SIGCHLD can be blocked or ignored; SIGEXIT is specified as non-blockable so is always queued. This matches wait() and waitid(), in that they get a notification of termination whether SIGCHLD is blocked or not. Also, implementations are encouraged to use some of bits 8-31 as flags the W* macros test and strip off, and these bits may also be set in the si_status field for SIGCHLD, along with easier to use values in si_code. It's not prohibited, anyways, that I see. I left this implied, sorry, but I expect SIGEXIT to prohibit this, as exit() has only the exit code passed in to be stored in si_status, or would store the terminating signal number by itself.

I also glossed over that if an action is set using the current form of trap for EXIT, the current usage expectation still holds. A new handler would be expected to compare the passed in pid to itself, to differentiate from one used by a child process, to perform any necessary atexit() type processing. Enabling this for an application would mean defining SA_EXITSELF as an option for sigaction(SIGEXIT), effectively adding atexit(siginfo_t info) and atquickexit(info) as interfaces. Overload si_band to hold a thread id and you can get pthread_atexit(info) also easily enough.

As things go I think what I outlined is a plausible method for any application to get the limited or full exit codes, not just the shell, with only two backwards compatibility concerns. The first is that -f as a flag may eval to an actual file on disk intended to be a condition action handler for a current script. I don't see that as a high likelihood, but it exists and that file and scripts referencing it would need a new name. The other is that some <signal.h> I'm not familiar with may use SIGEXIT as a signo define already. There are cleaner alternatives, sure, but I don't see the C standard changing the definition of main() and system() to return siginfo_t instead of int any time soon.

~~user229~~ 2016-01-30 20:46 bugnote:0003062	I don't see why main would have to be changed. As for system, what about a system_ex(const char *, siginfo_t *) - and pclose_ex? Another thing to keep in mind if we're thinking about system (or popen) is that the shell is an intermediary. The shell is unlikely to be killed by a signal (though it may exec a process that is), or to fail to be executed (though it still could be due to argument size).

shware_systems 2016-01-31 01:14 reporter bugnote:0003063	It's not that it has to, but is something that could be done with the current interfaces C requires. If there's problems there POSIX inherits them, additional interfaces can hide things but doesn't address everything. A new application still has to elect to use the new interfaces.

kre 2016-04-05 14:14 reporter bugnote:0003128	I can understand wanting to separate the exit code and exit reason (signal, etc) and keep those separate, but I cannot think of a single reason why anyone would want to support more than a hundred different exit codes - for a single application, 3 or 4 should be sufficient, more in the overall system just in case there is some perverse desire to have lots of applications, each with a distinct set of failure exit codes (I would assume there is no plan to alter the single 0 meaning "success"). Keeping the limitation on applications to not expect the system to support exit codes outside the range 0..255 seems entirely the right thing to do. That is not to prohibit systems allowing more exit codes than that if they can find some reason for so doing.

joerg 2016-04-05 14:25 reporter bugnote:0003129	640k is enough for anyone... BTW: it was the range 0..125 not 0..255 and please note that the range of errno is already beyond the range 0..125.

kre 2016-04-07 00:24 reporter bugnote:0003133	wrt: "640k is enough for anyone..."

Entirely different kinds of issues.

And: "BTW: it was the range 0..125 not 0..255"

In the shell, yes, and if there was a sane way to redefine the way that "wait" (the shell command) works to allow it to get the values 126..255 in a way that was backwards compatible, I'd support that. Applications can supply an exit code from 0-255 in a portable way, which has always worked (at C/kernel level). Outside that range is not portable (the parameter for "exit" is an int only because of the C parameter type widening rules, the kernel has only ever guaranteed 8 bits of value there).

It is a pity that sh originally chose to limit it further by combining the "why the application exited" value with the "exit code when it exited normally" in a way that made them impossible to distinguish. But it did. This group (no standards group) are not legislators - it is not appropriate to specify what we wish had been done originally, while both requiring that for future implementations (which would not be so bad) while also promising users that is how systems behave (which is simply false.) Just document that standard behaviour - do not try to force changes.

joerg 2016-04-07 08:52 reporter bugnote:0003136	You seem to misinterpret things. We are no longer in the first 20 years of UNIX, but in the following 26 years that allow applications to return 32 bits from the exit code to the parent process. This issue is not to enforce changes, but rather to document existing behavior from the POSIX compliant waitid() and the existing recent Bourne Shell implementation that is based on this POSIX compliant behavior from waitid(). Given that ignoring parts of the exit code must be seen as a bug, this is also a bug-fix.

kre 2016-04-07 13:25 reporter bugnote:0003137	Joerg - aside from it being a little odd that you're dismissing ancient history (which is actually not ancient, systems still work this way) when you are usually complaining that the spec is different from the way some 1980's vintage shell worked, but the issue here is do essentially all modern shells actually pass back 32 bits of exit code as the result of the wait command? If not (and I suspect "not" is correct) then what you are attempting here is to legislate the way that you believe the world should operate, rather than specify the way it actually does. That's not correct behaviour of a standards body. What you need to do is convince all the shell authors to actually implement wait the way that you believe it should be implemented, and once that is the standard behaviour, it can be documented as that. Until then, pretending that just because waitid() has a 32 bit field into which an exit status can be put, and that the exit() call takes an int as a parameter, means that when a program does exit(0x12345678) that the shell will set $? to 305419896 does no-one any good. I cannot test this, as my system still does as posix (as currently published) still allows, and only handles exit values in the range 0..255.

joerg 2016-04-08 10:22 reporter bugnote:0003140	You miss the way POSIX works: It standardizes existing implementations, it does not modify them unless they contain a bug. But then it needs to give a rationale on why it did not standardize existing behavior. For this reason, we have 20 years of history where POSIX requires to return all 32 bits from exit() with the mandatory waitid(). If your personal environment does not support this, it is not POSIX.

~~user229~~ 2016-04-08 14:17 bugnote:0003142 Last edited: 2016-04-08 14:20	"For this reason, we have 20 years of history where POSIX requires to return all 32 bits from exit() with the mandatory waitid()." Which requirement does not in fact entail that all other mechanisms for obtaining an exit status (wait, $?, system) shall also return all 32 bits. Nor that all languages that provide support for calling external programs (sh, awk, ex) shall support any such mechanism. The shell does not have a binding to the system interfaces in general, or to waitid in particular, and therefore is not bound by requirements the standard puts on waitid.

kre 2016-04-09 11:22 reporter bugnote:0003143 Last edited: 2016-04-09 11:27	Wrt note 3140 (and the last I am going to say on this issue):

    "For this reason, we have 20 years of history where POSIX requires to return all 32 bits from exit() with the mandatory waitid()."

I can find absolutely nothing in the published standard that supports this. On the contrary, the exact opposite (in the sense in which we are talking) is stated.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/exit.html

    The value of status may be 0, EXIT_SUCCESS, EXIT_FAILURE, [CX] [Option Start] or any other value, though only the least significant 8 bits (that is, status & 0377) shall be available to a waiting parent process. [Option End]

Note "only the least significant 8 bits shall be available".

The description of waitid

http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitid.html

says nothing to this issue directly at all, but defers to signal.h

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html

    int si_status    Exit value or signal.

Which is all it says about the field - it is an int, into which a value with no specific specified range is to be placed (other than that it obviously needs to fit in an int.) The range of values from exit (0..255) fits, so that is OK, the range of signal numbers (0.. who knows, but usually < 128) also fits. All looks good, and no mention anywhere at all about systems passing back 32 bit exit values from applications to parent processes (including shells.) None.

Any inference you could draw about the range of exit values from the type being "int" would logically have to apply to signal numbers as well, right? So if your argument is that the system must support a range of exit values from -2^31 .. 2^31-1 (or perhaps 0..2^32-1) then it would logically also be required to support signal numbers with the same range, right? After all they both get stored (in appropriate circumstances) into the same field. And if your inference comes from the parameter to exit() being an int, then you can draw the same inference from the (signal number) parameter to kill().

Personally, I consider this issue closed. Not only is there no posix requirement for exit codes outside the range 0..255 (it actually says that only those values shall be available, not just that systems are not required to support a wider set of values) I also cannot see (and no-one has presented) a single reason why a larger set of exit code values would be of any use to anyone whatever (and no, wanting to send back errno as an exit code is not good enough - you'd have to explain why just "failed" with the failure reason on stderr, or similar, is not a better approach.)
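The exit() wording quoted in this note can be observed through the wait()/waitpid() path with a short sketch. The helper name exit_status_via_waitpid is hypothetical and the code is only an illustration. WEXITSTATUS() yields the low-order 8 bits of the value passed to _exit(), so a child that exits with 256 is indistinguishable from one that exits with 0, which is the "exitcode mod 256" case raised in the Description.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the issue): fork a child that calls
 * _exit(code) and return what WEXITSTATUS() reports for it, or -1 on
 * failure or if the child did not terminate normally.
 */
static int
exit_status_via_waitpid(int code)
{
    int     stat;
    pid_t   pid;

    pid = fork();
    if (pid == (pid_t)-1)
        return (-1);
    if (pid == 0)
        _exit(code);        /* child: terminate immediately */

    while (waitpid(pid, &stat, 0) == (pid_t)-1) {
        if (errno != EINTR)
            return (-1);
    }
    assert(!WIFSTOPPED(stat));  /* WUNTRACED was not requested */

    if (!WIFEXITED(stat))
        return (-1);
    return (WEXITSTATUS(stat)); /* low-order 8 bits only */
}
```

For example, exit_status_via_waitpid(0x12345678) returns 0x78, and exit_status_via_waitpid(256) returns 0.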

shware_systems 2016-04-09 21:35 reporter bugnote:0003146	Re: 3128 One of the reasons for wanting a large error/status code range is while an application may have only a few generic types of errors, there may be multiple places in the application where each type may be returned. In the absence of a core dump or debugger providing an indication of exactly where in the code an error type occurred, the application may encode both type, as expressed by errno, and a usage count that can be mapped back to a source file and line number. For cases where SIGHUP on a terminal used for stdout and stderr makes a more descriptive message impossible, what is left as an _always_available_ reporting method is the exit code. Signals are similar, in that they can be masked off or set to SIG_IGN in the parent, outside any control of the application. Also, there are applications that display numbered choices interactively and report the choice made via the exit code to a script, avoiding the system overhead of using a pipe or stdout and catching SIGHUP. When the choices are filenames from a glob expansion, as example, this can be well over 255 entries. A zero usually means EXIT_ABORT or EXIT_NEXTBLOCK (if it groups the choices in blocks of 100 or so to stay in the 1 to 125 range), not EXIT_SUCCESS, when using apps like this. For an international app they may elect to use a utility that shows a localized menu of 0=abort prompt, 1=yes, 2=no, to avoid the locale used with stdin to match a keyboard's charset, LC_CTYPE nominally, after output respecting LC_MESSAGES being set to a different locale, not having a direct mapping to the yesexpr and noexpr prompt string characters. Per note 3061, the standard requires the status value returned by wait() to encode in the int sized container the exit code and any additional bits necessary for evaluation of all the W* macros unambiguously. 
As no particular encoding of those extra bits is specified, currently no application or interface, including waitid() and users of it, can make assumptions about which bits are not reserved by wait() to implement extensions with portably. The standard requires those bits have an analogue in si_code, but does not require si_status to have the bits stripped out so it isn't also usable with the W* macros. The example in the description elects to, but infop->si_status=exstat; legal also, it looks. This possibility is inconsistent with the expectation that si_status values set by the application for use with sigqueue() are presented to the signal handler unchanged.

stephane 2020-01-31 15:57 reporter bugnote:0004764 Last edited: 2020-01-31 15:59	[Copied from the mailing list in a discussion about 0001321]

$/ is not a very good choice of parameter name IMO. That would break widely seen code like

    sed "s/.$//"
    sed "/^$/d; s/$foo/$bar/g"

Those are non-POSIX code as POSIX currently leaves the behaviour unspecified if an unescaped $ is followed by a /, but it's commonly seen in the wild as it works in all implementations in practice (except recent versions of bosh). (In other words, / is often seen following $ in arguments to sed/awk/perl/pax/bsdtar... sometimes not within single quotes.)

zsh has similar problems with $~var, $=var, $^var, though to a lesser extent as things like sed "s~.$~~" are less widely used in practice.

($/ is the record separator variable in perl (/ visually conveys "separation" more than exit status IMO). perl is the only language I can think of other than Bourne-like shells where $? is the exit status. Most other shells (csh, fish, rc, akanga, zsh) use $status instead. In csh, $?var expands to 1 if $var is set and 0 otherwise.)

mirabilos 2021-07-28 16:35 reporter bugnote:0005416	SIGEXIT is also a bad name; the EXIT trap is in widespread use, and shells have been allowing symbolic names for signals, omitting the leading SIG, for ages as well.

~~Don Cragun~~ 2022-07-25 15:17 manager bugnote:0005908	This feature is not implemented in most conforming shells. Therefore, this bug is rejected. If other shells do implement this feature, please resubmit this bug for inclusion in a future revision of the standard.

Date Modified	Username	Field	Change
2016-01-28 13:16	joerg	New Issue
2016-01-28 13:16	joerg	Name	=> Jörg Schilling
2016-01-28 13:16	joerg	Section	=> 2.5.2
2016-01-28 13:16	joerg	Page Number	=> 2324, 2382
2016-01-28 13:16	joerg	Line Number	=> 73738-73769, 75887
2016-01-28 13:26	joerg	Tag Attached: issue8
2016-01-28 16:09	~~Don Cragun~~	Relationship added	related to 0000947
2016-01-30 05:32	~~user229~~	Note Added: 0003057
2016-01-30 12:16	shware_systems	Note Added: 0003058
2016-01-30 15:53	~~user229~~	Note Added: 0003059
2016-01-30 15:56	~~user229~~	Note Edited: 0003059
2016-01-30 19:12	shware_systems	Note Added: 0003061
2016-01-30 20:46	~~user229~~	Note Added: 0003062
2016-01-31 01:14	shware_systems	Note Added: 0003063
2016-04-05 14:14	kre	Note Added: 0003128
2016-04-05 14:25	joerg	Note Added: 0003129
2016-04-07 00:24	kre	Note Added: 0003133
2016-04-07 08:52	joerg	Note Added: 0003136
2016-04-07 13:25	kre	Note Added: 0003137
2016-04-08 10:22	joerg	Note Added: 0003140
2016-04-08 14:17	~~user229~~	Note Added: 0003142
2016-04-08 14:19	~~user229~~	Note Edited: 0003142
2016-04-08 14:20	~~user229~~	Note Edited: 0003142
2016-04-09 11:22	kre	Note Added: 0003143
2016-04-09 11:27	kre	Note Edited: 0003143
2016-04-09 21:35	shware_systems	Note Added: 0003146
2019-05-23 16:06	geoffclare	Tag Detached: issue8
2020-01-31 15:57	stephane	Note Added: 0004764
2020-01-31 15:59	stephane	Note Edited: 0004764
2021-07-28 16:35	mirabilos	Note Added: 0005416
2022-07-25 15:17	~~Don Cragun~~	Interp Status	=> ---
2022-07-25 15:17	~~Don Cragun~~	Status	New => Closed
2022-07-25 15:17	~~Don Cragun~~	Resolution	Open => Rejected
2022-07-25 15:17	~~Don Cragun~~	Note Added: 0005908
