Austin Group Defect Tracker

ID: 0001585
Category: [1003.1(2016/18)/Issue7+TC2] Shell and Utilities
Severity: Editorial
Type: Enhancement Request
Date Submitted: 2022-05-14 22:05
Last Update: 2022-05-17 14:17
Reporter: steffen
View Status: public
Assigned To:
Priority: normal
Resolution: Open
Status: New
Name: steffen
Organization:
User Reference:
Section: Vol. 3: Shell and Utilities
Page Number: 2879
Line Number: 94942
Interp Status: ---
Final Accepted Text:
Summary: 0001585: kill - add -j option to avoid PID reuse race
Description: With today's high-speed multi-core machines and
fine-grained-locking operating systems, process identifier (PID)
reuse may occur very quickly. Let me quote a message of mine
(20220301174917.eoVFB%steffen@sdaoden.eu):

 | |NetBSD does guarantee not to reuse a pid for a reasonable number
 | |of forks after a process exits.
 |
 |...which might be fruitless with 16-bit pids, define "reasonable".
 |Matt Dillon of DragonFly BSD (crond etc.) made, after implementing
 |some DBSD kernel optimizations (iirc), tests with statically
 |linked programs and... quoting myself
 |
 | i remember Matthew Dillon's post on DragonFly BSD users@[1], where
 | he claims 450000 execs per second for a statically linked binary,
 | and about 45000 execs per second for a dynamic one, with DragonFly
 | 5.6 on a threadripper.
 |
 | [1] https://marc.info/?l=dragonfly-users&m=155846667020624&w=2

So statically linked programs could consume a 16-bit process
identifier range almost seven (7) times per second (450000 / 65536
is about 6.9), which exceeds even the Earth Overshoot Day of
countries like the U.S.A., Australia, Finland or Sweden by almost
a factor of two, or China by 3.33.

Today it is impossible to safely kill(1) child processes, because
even in code like

  JOBS=...     # number of jobs
  JOBMON=...   # non-empty if the sh(1)ell supports set -m
               # (then children are started under set -m as process groups)

  jtimeout() {
     i=0
     while [ ${i} -lt ${JOBS} ]; do
        i=`add ${i} 1`
        # t.${i}.id contains the identifier of the process (group);
        # the child process removes this file when it exits regularly.
        if [ -f t.${i}.id ] &&
              read pid < t.${i}.id >/dev/null 2>&1 &&
              # Test whether the process really is still alive.
              kill -0 ${pid} >/dev/null 2>&1; then
           j=${pid}
           [ -n "${JOBMON}" ] && j=-${j}
           # It is alive: kill it very hard, as it exceeded a timeout
           # (which in this case means something is totally wrong).
           kill -KILL ${j} >/dev/null 2>&1
        else
           # Maybe it died badly, and could not clean up.
           ${rm} -f t.${i}.id
        fi
     done
  }

there is still a race between the "kill -0" that tests process
existence and the "kill -KILL" that terminates it.

The latter kill(1) could kill the wrong process.

Of course one could write code where the timeout is, say, 30
seconds, and the parent sh(1)ell creates a date(1) stamp before it
starts the child. The shell could then wait 35 seconds and simply
kill the child if kill -0 reports it as still alive: a cooperating
child would not terminate itself once more than 30 seconds had
passed since it was started, but would instead wait to be killed,
so its PID could not yet have been reused. But this seems a
strange approach (and, when nitpicking, is subject to clock
jumps).
Desired Action: Add, on page 2879, line 94942:

  -j JOB
    Process the kill request only when the given JOB number is
    known to the shell and the JOB has not yet terminated;
    otherwise exit with status 66 (EX_NOINPUT from sysexits.h).

Since the POSIX entry for kill(1) mentions twice that the "job
control job ID notation is not required to work as expected when
kill is operating in its own utility execution environment",
I think no further addition is needed.
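With such an option, the forced kill in the jtimeout example above might reduce to something like this (purely hypothetical, since no existing shell implements -j; status 66 is the proposed EX_NOINPUT):

```
  # Hypothetical: signal the saved-away job/PID only if the shell
  # still knows it and it has not yet terminated; status 66 means
  # nothing was signalled because the job was already gone.
  if ! kill -j -KILL ${j}; then
     echo "job ${j} already terminated; no signal sent"
  fi
```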
Tags: No tags attached.
Attached Files: (none)

- Relationships

-  Notes
(0005835)
geoffclare (manager)
2022-05-16 08:21
edited on: 2022-05-16 08:33

As far as I can see, the only time this -j option would be useful is if an application wants to send a signal just to a process group leader without sending it to the other processes in the group. This seems like a very rare thing for an application to need to do.

It would not be useful in the other two possible cases, which are:

1. The signal is to be sent to the whole process group. In this case, the application can just use "kill JOB".

2. The signal is to be sent to one or more of the processes comprising a process group, but not the whole group and not just the leader. In this case the use of -j does not solve the problem, as any of those process IDs (other than the leader) could have been reused even though the leader is still running and thus the job still exists.

Anyway, a discussion of the technical merits of the proposal is pointless unless there is a shell which already implements this kill -j option. None of the shells I have available do. Does anybody know of one that does? If not, this request should be rejected as invention.

(0005836)
kre (reporter)
2022-05-16 10:08

I agree that this is invention, and should be rejected, but even if
some shell did try implementing it, I cannot see how it would do so
in a way that would meet the objectives of the issue raised.

As best I can tell, doing anything as suggested requires kernel
assistance: no matter how carefully the shell checks, there is no way
it can avoid race conditions, since it must check first and then do
the kill sys call, and in the intervening period things might have
changed.

If the kernel had a kill_my_child() sys call, then I think it could be
made to work: the shell could check the PIDs it knows belong to its
children, and avoid creating any new ones between that check and doing
the kill_my_child() sys call. But since I don't know of any system
that implements a sys call like that (or an option on kill(2); it
could be done by setting a high-order bit in the signal number), I
cannot see how a shell could possibly make this work well enough to
make adding such an option sensible.

kre

ps: such a new sys call would work on pgrps just the same as kill() does.
(0005837)
steffen (reporter)
2022-05-16 13:37

First of all, a correction: what I really meant was

  -j
    Process the kill request only when the given jobs or saved-away
    process identifiers are known to the shell and have not yet
    terminated; otherwise exit with status 66 (EX_NOINPUT from
    sysexits.h).
(0005838)
steffen (reporter)
2022-05-16 13:54

re 5835 and 5836:

It is clear the idea is that the shell's children remain in the operating system's table of active processes until they have been wait(2)ed for; therefore only the sh(1)ell, as the parent process, can kill(2) a child safely.
-j is thus meant to give the sh(1)ell script writer access to the race-free capability that the sh(1)ell as such has in its internals (anyway).

Regarding process groups: yes, this is true, of course, but I think it is weak reasoning not to offer this capability just because somewhere down the process chain such things may happen. Quite the opposite: if I have the possibility everywhere, I can write race-free sh(1)ell scripts on all (subshell) levels.

And programs with direct access to wait(2) that start child processes are hopefully doing it right anyway; at least they could, using POSIX interfaces.
But sh(1)ell scripts cannot, even though the sh(1)ell as such can, or even has to, do it right.
This is what this issue wants to change.

Regarding operating system support: oh, that is true!
Systems do start implementing this, but unfortunately in non-portable ways; POSIX is late, and should possibly have tried to set a precedent here in the past.
Linux has the prctl(2) operations PR_SET_CHILD_SUBREAPER and PR_GET_CHILD_SUBREAPER:

              A subreaper fulfills the role of init(1) for its descendant
              processes. When a process becomes orphaned (i.e., its immediate
              parent terminates), then that process will be reparented to the
              nearest still living ancestor subreaper. Subsequently, calls to
              getppid(2) in the orphaned process will now return the PID of
              the subreaper process, and when the orphan terminates, it is
              the subreaper process that will receive a SIGCHLD signal and
              will be able to wait(2) on the process to discover its
              termination status.

and FreeBSD has an even more sophisticated approach, procctl(2), that allows iterating over the "descendants of the reaper", and in particular, with PROC_REAP_KILL, the possibility to kill only a subset of these.

It could be that, in order to implement timeout(1) properly, portions of this functionality need to be implemented kernel-side.
Or kernels already have done so.
(0005839)
steffen (reporter)
2022-05-16 23:41

Maybe I misunderstand; then I would retract the issue.

What this issue wants to achieve is to close the gap between
wait(1) and the waitpid(2) family.

- A saved-away process identifier remains known to the shell
  until wait(1) has been called on it.

- The process itself remains known to the operating system, i.e.
  it is kept in the process table until it has been waitpid(2)ed for.

If I kill(1) a process that is still known to the sh(1)ell
because wait(1) has not yet been called, but the shell itself
has already waitpid(2)ed on the child, after having received
SIGCHLD or for whatever reason, then the operating system
may already have reused the process identifier as such.

The -j option to kill(1) should close this gap by forcing the
sh(1)ell to check whether each given process identifier has
already been waitpid(2)ed for; if it has, "kill -j -SIG PID"
shall fail.


I do come here for a reason.

Years ago I "saved away process identifiers", and was under the
impression that the sh(1)ell applies special care to those
process identifiers until wait(1) has been called on them.
But it turned out I could kill(1) a process that was no longer
mine, even though I had not yet wait(1)ed on the process
identifier.
The shell simply called kill(2) on the process identifier, which
in the meantime had been reused by the operating system.

So maybe this was a sh(1)ell bug, and shells are not allowed
to call waitpid(2) on a process whose saved-away process
identifier has not been wait(1)ed on (that is, they may only
call it when the latter is called).

If this were so, I would retract this issue.

If not, then the default behaviour of sh(1)ell wait(1) cannot
be changed, since this could break things in the wild.

There should be an option to explicitly request that saved-away
process identifiers still be alive when kill(1)ing them.
(0005840)
geoffclare (manager)
2022-05-17 08:37

As I said before, discussing the technical merits of the proposal here is pointless. It will not even be considered for addition to POSIX until it has been implemented in at least one widely used shell.

If a kill -j option is needed, the people you need to convince of that are the various shell authors/maintainers, not the Austin Group.
(0005841)
steffen (reporter)
2022-05-17 14:17

re #5840:

So I will try this as I have time to, and leave this issue open, hoping to come back for good. (Good would mean it finally could become standardized.)

Thank you so far.

- Issue History
Date Modified Username Field Change
2022-05-14 22:05 steffen New Issue
2022-05-14 22:05 steffen Name => steffen
2022-05-14 22:05 steffen Section => Vol. 3: Shell and Utilities
2022-05-14 22:05 steffen Page Number => 2879
2022-05-14 22:05 steffen Line Number => 94942
2022-05-16 08:21 geoffclare Note Added: 0005835
2022-05-16 08:33 geoffclare Note Edited: 0005835
2022-05-16 10:08 kre Note Added: 0005836
2022-05-16 13:37 steffen Note Added: 0005837
2022-05-16 13:54 steffen Note Added: 0005838
2022-05-16 23:41 steffen Note Added: 0005839
2022-05-17 08:37 geoffclare Note Added: 0005840
2022-05-17 14:17 steffen Note Added: 0005841

