Austin Group Defect Tracker

ID: 0001585
Category: [1003.1(2016/18)/Issue7+TC2] Shell and Utilities
Severity: Editorial
Type: Enhancement Request
Date Submitted: 2022-05-14 22:05
Last Update: 2022-05-17 14:17
Reporter: steffen
View Status: public
Assigned To:
Priority: normal
Resolution: Open
Status: New
Name: steffen
Organization:
User Reference:
Section: Vol. 3: Shell and Utilities
Page Number: 2879
Line Number: 94942
Interp Status: ---
Final Accepted Text:
Summary: 0001585: kill - add -j option to avoid PID reuse race
Description: With today's high-speed multi-core machines and
fine-grained-locking operating systems, process identifier (PID)
reuse may occur very quickly. Let me quote a message of mine
(20220301174917.eoVFB%steffen@sdaoden.eu):

 | |NetBSD does guarantee not to reuse a pid for a reasonable number
 | |of forks after a process exits.
 |
 |...which might be fruitless with 16-bit pids, define "reasonable".
 |Matt Dillon of DragonFly BSD (crond etc.) made, after implementing
 |some DBSD kernel optimizations (iirc), tests with statically
 |linked programs and... quoting myself
 |
 | i remember Matthew Dillon's post on DragonFly BSD users@[1], where
 | he claims 450000 execs per second for a statically linked binary,
 | and about 45000 execs per second for a dynamic one, with DragonFly
 | 5.6 on a threadripper.
 |
 | [1] https://marc.info/?l=dragonfly-users&m=155846667020624&w=2

So statically linked programs could consume a 16-bit process
identifier range almost seven (7) times per second (450000 / 65536
is about 6.9), which exceeds even the Earth Overshoot Day of
countries like the U.S.A., Australia, Finland or Sweden by almost
a factor of two, or China by 3.33.

Today it is impossible to safely kill(1) child processes, because
even in code like

  JOBS=...     # number of jobs
  JOBMON=...   # non-empty if the sh(1)ell supports set -m
               # (then children are started under set -m as process groups)

  jtimeout() {
     i=0
     while [ ${i} -lt ${JOBS} ]; do
        i=`add ${i} 1`
        # t.${i}.id contains the identifier of the process (group);
        # the child process removes this file when it exits regularly.
        if [ -f t.${i}.id ] &&
              read pid < t.${i}.id >/dev/null 2>&1 &&
              # Test whether the process really is still alive.
              kill -0 ${pid} >/dev/null 2>&1; then
           j=${pid}
           [ -n "${JOBMON}" ] && j=-${j}
           # It is alive: kill it very hard, as it exceeded a timeout
           # (which in this case means something is totally wrong).
           kill -KILL ${j} >/dev/null 2>&1
        else
           # Maybe it died badly, and could not clean up.
           ${rm} -f t.${i}.id
        fi
     done
  }

there is still a race between the "kill -0" that tests process
existence and the "kill -KILL" that terminates it.

The latter kill(1) could kill the wrong process.

Of course one could write code where the timeout is, say, 30
seconds, and the parent sh(1)ell creates a date(1) stamp before it
starts the child. The shell could then wait 35 seconds and simply
kill the child if kill -0 reports it as still alive: a cooperating
child would not terminate itself once more than 30 seconds had
passed since it was started, but would instead wait to be killed,
so its PID could not yet have been reused. But this seems a
strange approach (and, when nitpicking, is subject to clock
jumps).
Desired Action: Add, on page 2879, line 94942:

  -j JOB
    Process the kill request only when the given JOB number is
    known to the shell and the JOB has not yet terminated;
    otherwise exit with status 66 (EX_NOINPUT from sysexits.h).

Since the POSIX entry for kill(1) mentions twice that the "job
control job ID notation is not required to work as expected when
kill is operating in its own utility execution environment",
I think no further addition is needed.
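With such an option, the forced kill in the jtimeout example above might reduce to something like this (purely hypothetical, since no existing shell implements -j; status 66 is the proposed EX_NOINPUT):

```
  # Hypothetical: signal the saved-away job/PID only if the shell
  # still knows it and it has not yet terminated; status 66 means
  # nothing was signalled because the job was already gone.
  if ! kill -j -KILL ${j}; then
     echo "job ${j} already terminated; no signal sent"
  fi
```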
Tags: No tags attached.
Attached Files: (none)

- Relationships

-  Notes
(0005835)
geoffclare (manager)
2022-05-16 08:21
edited on: 2022-05-16 08:33

As far as I can see, the only time this -j option would be useful is if an application wants to send a signal just to a process group leader without sending it to the other processes in the group. This seems like a very rare thing for an application to need to do.

It would not be useful in the other two possible cases, which are:

1. The signal is to be sent to the whole process group. In this case, the application can just use "kill JOB".

2. The signal is to be sent to one or more of the processes comprising a process group, but not the whole group and not just the leader. In this case the use of -j does not solve the problem, as any of those process IDs (other than the leader) could have been reused even though the leader is still running and thus the job still exists.

Anyway, a discussion of the technical merits of the proposal is pointless unless there is a shell which already implements this kill -j option. None of the shells I have available do. Does anybody know of one that does? If not, this request should be rejected as invention.

(0005836)
kre (reporter)
2022-05-16 10:08

I agree that this is invention, and should be rejected, but even if
some shell did try implementing it, I cannot see how it would do so
in a way that would meet the objectives of the issue raised.

As best I can tell, doing anything as suggested requires kernel
assistance: no matter how carefully the shell checks, there is no way
it can avoid race conditions, since it must check first and then do
the kill sys call, and in the intervening period things might have
changed.

If the kernel had a kill_my_child() sys call, then I think it could be
made to work: the shell could check the PIDs it knows belong to its
children, and avoid creating any new ones between that check and doing
the kill_my_child() sys call. But since I don't know of any system
that implements a sys call like that (or an option on kill(2); it
could be done by setting a high-order bit in the signal number), I
cannot see how a shell could possibly make this work well enough to
make adding such an option sensible.

kre

ps: such a new sys call would work on pgrps just the same as kill() does.
(0005837)
steffen (reporter)
2022-05-16 13:37

First of all, a correction: what I really meant was

  -j
    Process the kill request only when the given jobs or saved-away
    process identifiers are known to the shell and have not yet
    terminated; otherwise exit with status 66 (EX_NOINPUT from
    sysexits.h).
(0005838)
steffen (reporter)
2022-05-16 13:54

re 5835 and 5836:

It is clear the idea is that the shell's children remain in the operating system's table of active processes until they have been wait(2)ed for; therefore only the sh(1)ell, as the parent process, can kill(2) a child safely.
-j is thus meant to give the sh(1)ell script writer access to the race-free capability that the sh(1)ell as such has in its internals (anyway).

Regarding process groups: yes, this is true, of course, but I think it is weak reasoning not to offer this capability just because somewhere down the process chain such things may happen. Quite the opposite: if I have the possibility everywhere, I can write race-free sh(1)ell scripts on all (subshell) levels.

And programs with direct access to wait(2) that start child processes are hopefully doing it right anyway; at least they could, using POSIX interfaces.
But sh(1)ell scripts cannot, even though the sh(1)ell as such can, or even has to, do it right.
This is what this issue wants to change.

Regarding operating system support: oh, that is true!
Systems do start implementing this, but unfortunately in non-portable ways; POSIX is late, and should possibly have tried to set a precedent here in the past.
Linux has the prctl(2) operations PR_SET_CHILD_SUBREAPER and PR_GET_CHILD_SUBREAPER:

              A subreaper fulfills the role of init(1) for its descendant
              processes. When a process becomes orphaned (i.e., its immediate
              parent terminates), then that process will be reparented to the
              nearest still living ancestor subreaper. Subsequently, calls to
              getppid(2) in the orphaned process will now return the PID of
              the subreaper process, and when the orphan terminates, it is
              the subreaper process that will receive a SIGCHLD signal and
              will be able to wait(2) on the process to discover its
              termination status.

and FreeBSD has an even more sophisticated approach, procctl(2), that allows iterating over the "descendants of the reaper", and in particular, with PROC_REAP_KILL, the possibility to kill only a subset of these.

It could be that, in order to implement timeout(1) properly, portions of this functionality need to be implemented kernel-side.
Or kernels already have done so.
(0005839)
steffen (reporter)
2022-05-16 23:41

Maybe I misunderstand; then I would retract the issue.

What this issue wants to achieve is to close the gap between
wait(1) and the waitpid(2) family.

- A saved-away process identifier remains known to the shell
  until wait(1) has been called on it.

- The process itself remains known to the operating system, i.e.
  it is kept in the process table until it has been waitpid(2)ed for.

If I kill(1) a process that is still known to the sh(1)ell
because wait(1) has not yet been called, but the shell itself
has already waitpid(2)ed on the child, after having received
SIGCHLD or for whatever reason, then the operating system
may already have reused the process identifier as such.

The -j option to kill(1) should close this gap by forcing the
sh(1)ell to check whether each given process identifier has
already been waitpid(2)ed for; if it has, "kill -j -SIG PID"
shall fail.


I do come here for a reason.

Years ago I "saved away process identifiers", and was under the
impression that the sh(1)ell applies special care to those
process identifiers until wait(1) has been called on them.
But it turned out I could kill(1) a process that was no longer
mine, even though I had not yet wait(1)ed on the process
identifier.
The shell simply called kill(2) on the process identifier, which
in the meantime had been reused by the operating system.

So maybe this was a sh(1)ell bug, and shells are not allowed
to call waitpid(2) on a process whose saved-away process
identifier has not been wait(1)ed on (that is, they may only
call it when the latter is called).

If this were so, I would retract this issue.

If not, then the default behaviour of sh(1)ell wait(1) cannot
be changed, since this could break things in the wild.

There should be an option to explicitly request that saved-away
process identifiers still be alive when kill(1)ing them.
(0005840)
geoffclare (manager)
2022-05-17 08:37

As I said before, discussing the technical merits of the proposal here is pointless. It will not even be considered for addition to POSIX until it has been implemented in at least one widely used shell.

If a kill -j option is needed, the people you need to convince of that are the various shell authors/maintainers, not the Austin Group.
(0005841)
steffen (reporter)
2022-05-17 14:17

re #5840:

So I will try this as I have time to, and leave this issue open, hoping to come back for good. (Good would mean it finally could become standardized.)

Thank you so far.

- Issue History
Date Modified Username Field Change
2022-05-14 22:05 steffen New Issue
2022-05-14 22:05 steffen Name => steffen
2022-05-14 22:05 steffen Section => Vol. 3: Shell and Utilities
2022-05-14 22:05 steffen Page Number => 2879
2022-05-14 22:05 steffen Line Number => 94942
2022-05-16 08:21 geoffclare Note Added: 0005835
2022-05-16 08:33 geoffclare Note Edited: 0005835
2022-05-16 10:08 kre Note Added: 0005836
2022-05-16 13:37 steffen Note Added: 0005837
2022-05-16 13:54 steffen Note Added: 0005838
2022-05-16 23:41 steffen Note Added: 0005839
2022-05-17 08:37 geoffclare Note Added: 0005840
2022-05-17 14:17 steffen Note Added: 0005841

