Viewing Issue Simple Details
ID | Category | Severity | Type | Date Submitted | Last Update
0001585 | [1003.1(2016/18)/Issue7+TC2] Shell and Utilities | Editorial | Enhancement Request | 2022-05-14 22:05 | 2022-10-31 16:06
Reporter | steffen
View Status | public
Assigned To |
Priority | normal
Resolution | Rejected
Status | Closed
Name | steffen
Organization |
User Reference |
Section | Vol. 3: Shell and Utilities
Page Number | 2879
Line Number | 94942
Interp Status | ---
Final Accepted Text |
Summary | 0001585: kill - add -j option to avoid PID reuse race
Description |
With today's high-speed multi-core machines and fine-grained-locking operating systems, process identifier (PID) reuse may occur very quickly. Let me quote a message of mine (20220301174917.eoVFB%steffen@sdaoden.eu):

  | |NetBSD does guarantee not to reuse a pid for a reasonable number
  | |of forks after a process exits.
  |
  |...which might be fruitless with 16-bit pids, define "reasonable".
  |Matt Dillon of DragonFly BSD (crond etc.) made, after implementing
  |some DBSD kernel optimizations (iirc), tests with statically
  |linked programs and... quoting myself
  |
  |  i remember Matthew Dillon's post on DragonFly BSD users@[1], where
  |  he claims 450000 execs per second for a statically linked binary,
  |  and about 45000 execs per second for a dynamic one, with DragonFly
  |  5.6 on a threadripper.
  |
  |  [1] https://marc.info/?l=dragonfly-users&m=155846667020624&w=2

So statically linked programs could consume a 16-bit process identifier range seven (7) times per second, which exceeds even the Earth Overshoot day of countries like the U.S.A., Australia, Finland or Sweden by almost a factor of two, or China by 3.33.

Today it is impossible to safely kill(1) child processes, because even in code like

  JOBS=      # number of jobs
  JOBMON=    # non-empty if the sh(1)ell supports set -m
             # (children are then started under set -m as process groups)

  jtimeout() {
      i=0
      while [ ${i} -lt ${JOBS} ]; do
          i=`add ${i} 1`
          # The child process removes this file when it exits regularly.
          if [ -f t.${i}.id ] &&
                  # The file contains the identifier of the process (group).
                  read pid < t.${i}.id >/dev/null 2>&1 &&
                  # We test whether it really is alive.
                  kill -0 ${pid} >/dev/null 2>&1; then
              j=${pid}
              [ -n "${JOBMON}" ] && j=-${j}
              # If so, we kill it very hard, as it exceeded a timeout
              # (which in this case means something is totally wrong).
              kill -KILL ${j} >/dev/null 2>&1
          else
              # (Maybe it died badly, and could not clean up.)
              ${rm} -f t.${i}.id
          fi
      done
  }

there still is a race between the "kill -0" that tests process existence and the "kill -KILL" that aborts it. The latter kill(1) could kill the wrong process.

Of course one could write code where the timeout is, say, 30 seconds, and the parent sh(1)ell creates a date(1) stamp before it starts the child. The child would then not terminate on its own once more than 30 seconds have passed since it was started, but wait to be killed instead; the shell could wait, say, 35 seconds and then simply kill the child if kill -0 reports that it is still alive. But this seems a strange approach (and, when nitpicking, can be subject to clock jumps).
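The "seven times per second" figure in the description can be checked with a few lines of shell. This is only a sketch: it assumes that every exec in the quoted benchmark consumes one fresh PID and that a 16-bit PID space holds 65536 values. The helper name wraps_per_second is made up for this illustration.

```shell
# Exec rates quoted in the description (DragonFly 5.6 on a Threadripper).
execs_static=450000     # statically linked binary, execs per second
execs_dynamic=45000     # dynamically linked binary, execs per second
pid_space=65536         # assumption: 16-bit PID space, 2^16 values

# Hypothetical helper: how often per second the PID space would wrap,
# assuming one new PID per exec. LC_ALL=C keeps "." as decimal point.
wraps_per_second() {
    LC_ALL=C awk -v e="$1" -v p="$2" 'BEGIN { printf "%.2f\n", e / p }'
}

static_wraps=$(wraps_per_second "$execs_static" "$pid_space")
dynamic_wraps=$(wraps_per_second "$execs_dynamic" "$pid_space")

echo "static:  ${static_wraps} wraps of the PID space per second"
echo "dynamic: ${dynamic_wraps} wraps of the PID space per second"
```

Under these assumptions the static case wraps the PID space a little under seven times per second, and the dynamic case roughly once every one and a half seconds, so the window between "kill -0" and "kill -KILL" is not purely theoretical.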
Desired Action |
Add, on page 2879, line 94942:

  -j JOB
      Process the kill request only when the given JOB number is known to the shell and the JOB has not yet terminated; otherwise exit with status 66 (EX_NOINPUT from sysexits.h).

Since the POSIX entry for kill(1) mentions twice that the "job control job ID notation is not required to work as expected when kill is operating in its own utility execution environment", I think no further addition is needed.
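For comparison, the requested behaviour can be roughly approximated today with the job ID notation of the built-in kill. The following is a sketch only, not the proposed -j implementation: it assumes bash, where %N job IDs are accepted by the built-in kill even in a non-interactive script, and the function name kill_job is made up for this illustration. How a job that has terminated but has not yet been reaped is treated may differ between shells.

```shell
#!/usr/bin/env bash
# Hypothetical helper approximating the requested "kill -j JOB" with the
# existing %JOB notation of the shell's built-in kill. Assumes bash.
EX_NOINPUT=66   # exit status named in the request (sysexits.h)

kill_job() {
    sig=$1
    job=$2
    # %JOB is resolved against this shell's own job table, not against
    # the system-wide PID space, so a job the shell no longer knows
    # cannot resolve to an unrelated process with a recycled PID.
    if kill -s "$sig" "%$job" 2>/dev/null; then
        return 0
    fi
    return "$EX_NOINPUT"
}

sleep 30 &                                   # job 1 in a fresh script
rc_live=0; kill_job KILL 1 || rc_live=$?     # job is known and running
wait 2>/dev/null                             # reap it; the job is dropped
rc_gone=0; kill_job KILL 1 || rc_gone=$?     # job is no longer known
echo "live=${rc_live} gone=${rc_gone}"
```

The sketch cannot tell "unknown job" apart from other kill failures, which is one reason a dedicated option with a defined exit status was requested.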
Tags | No tags attached. | ||||||
Attached Files | |||||||