0001845: Status of the thread-safety of exit is confusing

ID	Project	Category	View Status	Date Submitted	Last Update

0001845	1003.1(2024)/Issue8	System Interfaces	public	2024-07-22 16:38	2024-08-02 14:10

Reporter	ralfjung	Assigned To
Priority	normal	Severity	Editorial	Type	Clarification Requested
Status	New	Resolution	Open

Name	Ralf Jung
Organization
User Reference
Section	exit
Page Number	unknown
Line Number	unknown
Interp Status
Final Accepted Text


Summary	0001845: Status of the thread-safety of exit is confusing
Description	exit is not listed as a thread-unsafe function. It hence seems reasonable to assume that one can call exit concurrently from multiple threads. However, this leads to a SEGFAULT in glibc (https://sourceware.org/bugzilla/show_bug.cgi?id=31997), and the argument of the glibc developers is that their implementation is permitted because the standard says "If a process calls the exit() function more than once, or calls the quick_exit() function in addition to the exit() function, the behavior is undefined." This is taken directly from the C standard. However, this wording has been in the C standard for a long time, even before there were threads, so it was likely meant to rule out calls to exit from within the atexit handlers. The problem with the glibc interpretation of this wording is that it becomes nearly impossible to call "exit" in a concurrent program, since there is no way to to be sure that there is no other thread (managed by some other library) that may call "exit" at the same time. (This is a particular problem for Rust where being sure that things are safe is a key premise of the language, see https://github.com/rust-lang/rust/issues/126600.)
Desired Action	I hope POSIX could clarify that concurrent calls to "exit" are allowed (atexit handlers may run concurrently, which can cause problems, but the concurrent exit calls by themselves cannot be the source of UB).
Tags	No tags attached.

dalias 2024-07-24 19:05 reporter bugnote:0006842	atexit handlers should not be allowed to run concurrently. This is a major semantic change no existing code is written to be prepared for, and is not a reasonable behavior to program to. Fortunately such a behavior is not needed; it's just the current manifestation of the undefined behavior in some implementations, due to unlocking the atexit function list lock between calls in order to handle registration of new atexit handlers by atexit handlers. The reasonable behavior for second and later calls to exit while exit is already in progress is for them to block until the process terminates. This is easily achieved in an implementation by adding a lock at the top of the exit function. However, if the implementation intends to allow atexit handlers themselves to call exit (note: this is also undefined behavior), such a change would make these recursive calls deadlock. Implementations that wish to allow this, however, can do so just by using a recursive lock at the top of exit. Conversely, implementation that wish to disallow it without deadlocking can trap or issue a diagnostic when the recursive lock is taken successfully a second time. The fix I'd like to propose: Change the text from: "If a process calls the exit() function more than once, or calls the quick_exit() function in addition to the exit() function, the behavior is undefined." to "If a function registered by a call to atexit() calls exit(), directly or indirectly, the behavior is undefined." And in the prior paragraph detailing atexit functions, add: "All functions registered by atexit() shall execute in the first thread to call exit(). If any other thread of the process calls exit() before all registered functions have completed or before the process terminates, it shall block until the process terminates."

ralfjung 2024-07-24 19:34 reporter bugnote:0006843	That sounds like a good resolution to me -- it provides strictly more guarantees than I was hoping for. :)

carlos 2024-07-26 12:26 reporter bugnote:0006846	The issue around the concurrency of atexit() handlers is interesting, and not something I'd previously considered. There is a good argument to be made that exit() latency impacts process restart latency and that has an impact on rolling security updates, or functional safety. However, in all of these cases you could have designed the system to be robust enough to allow _Exit(), but that isn't always possible with complex compositions of software. So at a first blush I agree that developers expect today that one thread of execution carries exit() from start to completion with atexit() handlers running as required in the reverse order of registration. Historically glibc developers held the position that when you called exit() you could do everything without locks because the standard said you couldn't call exit() from another thread of execution. That position has evolved considerably over the years, particularly because of the evolution and focus on threading in both language standards and the development community. Today we are all working to make operations self-synchronizing and MT-Safe, but sometimes it's really hard. The exit() case isn't that hard, and I agree that we need to make this MT-Safe under concurrent calls to exit(), anything else is a disservice to developers composing complex applications. My personal position is that exit() from atexit() handlers should fail early and catastrophically, likely with an abort(), since this case is easy to detect (very low detection cost) and the abnormal termination would help the developer debug the scenario. Detecting other forms of UB are much harder. The wrinkle here is that if existing applications rely on such behaviour we may be unable to do this UB detection in this way, and may have to revert it from an implementation perspective, but we could try. The AS-safe case only allows _Exit() or _exit() so that's fine, a terminating signal handler in the current or other threads of execution can just terminate the process skipping the atexit or other handlers. Overall I agree with Rich's position and proposed language.

enh 2024-07-26 12:37 reporter bugnote:0006847	glibc and bionic (Android) have already found cases where there are existing expectations of being able to call exit() again from the exiting thread, so they're both going with the recursive variant atm (glibc: https://sourceware.org/pipermail/libc-alpha/2024-July/158579.html, bionic: https://android-review.googlesource.com/c/platform/bionic/+/3195595). FreeBSD seems likely to go the same direction: https://reviews.freebsd.org/D46108

wakely 2024-07-26 12:39 reporter bugnote:0006848	The proposed new wording drops the part about quick_exit, which is in POSIX now. So maybe: "If a function registered by a call to atexit() calls exit() or quick_exit(), directly or indirectly, the behavior is undefined." And of course a similar change is needed for the quick_exit spec regarding at_quick_exit handlers.

carlos 2024-07-26 20:50 reporter bugnote:0006849	Ralf asked an interesting question in the glibc bug tracker which was "Why does glibc allow calling exit from atexit handlers?" I wanted to note that during my review of Adhemerval Zanella's proposed changes for glibc I noticed that allowing atexit() handlers to call exit() allows them to change the status returned by exit(). This is interesting since otherwise they can't do this and run the rest of the handlers e.g. _exit(). There is existing precedent in glibc that developers expect to be able to call exit() from the registered foreign functions that can be called at exit time. This effectively allows the last run exit handler to choose to override exit status based on some kind of contract between all callers. It is certainly way more useful to allow exit() to be called from multiple threads and have only one succeed. I just wanted to note my reflection on the issue of allowing exit() from atexit().

dalias 2024-07-26 22:28 reporter bugnote:0006850	The functionality Carlos mentions (calling exit from an atexit handler to change exit status to reflect a newly encountered error) is definitely appealing, but the reason I'm hesitant to try to standardize it is that it's hard to define what should happen regarding the remaining atexit handlers and global dtors (the latter of which are outside the scope of this standard). Programs can already do this now by calling _exit/_Exit, but in that case any remaining handlers/dtors do not run, and stdio flush and position sync does not happen. The natural behavior would be that execution of handlers picks up with the next to be called after the one which called exit. This is not hard to implement for atexit handlers, but real-world implementations also deal with global dtors from languages where they are a thing, or as an extension to the C language or platform ABI. The way they are typically implemented, the implementation of the exit function cannot see the boundary between individual dtor calls, and might only see a single atexit-handler-like call for the entire program's dtors, or one per shared library or similar. This means the scope of which, if any, remaining dtors might execute after one calls exit is entirely dependent on implementation details, and thus not suitable for standardization. If there is desire to standardize a behavior for calling exit from an atexit handler, I would urge that we do it only for genuine abstract-machine-level atexit handlers, and leave the case where exit is called from a global dtor or any other context in the exiting thread undefined. (Since such contexts are outside of the scope of this standard anyway, I think from a standards perspective this doesn't require any special action, but it is something implementors need to think about to ensure they don't try to make promises they can't keep.)

carlos 2024-07-29 16:12 reporter bugnote:0006851	I agree with Rich's last comment. I'm not proposing we standardize the behaviour of exit in atexit handlers (or other handlers specified in POSIX) in this specific bug, and while we will allow it in glibc, this is largely a conservative solution to avoid needing to change existing software.

dalias 2024-07-31 15:27 reporter bugnote:0006852	In the proposed text, I also overlooked quick_exit for what happens if multiple threads call exit. There also needs to be some clause to the effect that if both exit and quick_exit are called (from different threads, not recursive case) only one of them shall take effect and the other shall block until process termination.

geoffclare 2024-08-01 16:31 manager bugnote:0006853 Last edited: 2024-08-01 16:32	In the August 1, 2024 teleconference the following changes were agreed. We would like feedback on these before we liaise with the C committee on this issue. On page 627 line 22155 section at_quick_exit(), change: Since the behavior is undefined if the quick_exit() function is called more than once, portable applications calling at_quick_exit() must ensure that the quick_exit() function is not called when the functions registered by the at_quick_exit() function are called. to: Since the behavior is undefined if a function registered by a call to at_quick_exit() calls exit() or quick_exit(), directly or indirectly, portable applications calling at_quick_exit() must ensure that the exit() and quick_exit() functions are not called directly or indirectly by the functions registered by the at_quick_exit() function. On page 638 line 22443 section atexit(), change: Since the behavior is undefined if the exit() function is called more than once, portable applications calling atexit() must ensure that the exit() function is not called when the functions registered by the atexit() function are called. to: Since the behavior is undefined if a function registered by a call to atexit() calls exit() or quick_exit(), directly or indirectly, portable applications calling atexit() must ensure that the exit() and quick_exit() functions are not called directly or indirectly by the functions registered by the atexit() function. On page 880 line 30026 section exit(), change: If a process calls the exit() function more than once, or calls the quick_exit() function in addition to the exit() function, the behavior is undefined. to: [CX]If a function registered by a call to atexit() calls exit() or quick_exit(), directly or indirectly, the behavior is undefined.[/CX] After page 880 line 30045 section exit(), and after page 1945 line 60843 section quick_exit(), add: [CX]If concurrent calls to exit() or quick_exit() are made from different threads, all such calls except the first shall block until process termination. In this case, functions registered by calls to atexit() and at_quick_exit() shall not be invoked by calls that block.[/CX] On page 1845 line 60832 section quick_exit(), change: If a process calls the quick_exit() function more than once, or calls the exit() function in addition to the quick_exit() function, the behavior is undefined. to: [CX]If a function registered by a call to at_quick_exit() calls exit() or quick_exit(), directly or indirectly, the behavior is undefined.[/CX]

carlos 2024-08-02 14:10 reporter bugnote:0006854	In general a C library today may have to consider the following: * Coordination with a C++ runtime for the registration of C++ object destruction during library unloading or exit * Concurrent dlclose(), exit(), _exit(), _Exit(), abort(), quick_exit(), and pthread_exit() * Asynchronous signal safety of abort() * Implementation of atexit() and exit() * Implementation of at_quick_exit() and quick_exit() * Implementation of legacy on_exit() * Arrival of implementation dependent signals e.g. SIGCANCEL, SIGSETXID; or application controlled signals during library unloading or shutdown * Concurrent multi-threaded fork() with atfork() registration and deregistration via dlclose() I mention all of these for completeness as I hold them in mind while reviewing the proposed text. I am consciously limiting the scope of my review to process and thread exit. The suggested changes cover the cases for atexit(), exit(), at_quick_exit(), and quick_exit(). Wording for _exit() and _Exit() which may also be called concurrently are left untouched. As expected this means that it is well defined to call _exit() or _Exit() from any registered shutdown handlers. The proposed text looks good to me. The interesting remaining case is dlclose() for registered exit handlers, fork handlers or other language-specific handlers. I think the easiest answer is to say that dlclose(), exit() and quick_exit() are all multi-threaded safe (this is what the standard says today), and so the implementation must ensure that calling them in any order from any thread is allowed. With my developer hat on I'd expect some happens-before ordering between dlclose() and the other exit functions, either the dlclose() happens first or doesn't happen at all. The trivial implementation is that dlclose() takes the same recursive exit lock, which allows registered exit handlers to call dlclose() themselves (allowed) but any other thread would be blocked from calling exit() or quick_exit() until the dlclose() completes. This means that dlclose() has a shutdown-latency penalty, but allows plugin frameworks to operate concurrently to process exit without ordering issues around deregistration of exit handlers, fork handlers, and language-specific handlers. In summary, I don't think anything should be said about dlclose() and exit() or quick_exit(), letting the multi-threaded safety stand as the requirement for correctness.

Date Modified	Username	Field	Change
2024-07-22 16:38	ralfjung	New Issue
2024-07-22 16:38	ralfjung	Name	=> Ralf Jung
2024-07-22 16:38	ralfjung	Section	=> exit
2024-07-22 16:38	ralfjung	Page Number	=> unknown
2024-07-22 16:38	ralfjung	Line Number	=> unknown
2024-07-24 19:05	dalias	Note Added: 0006842
2024-07-24 19:34	ralfjung	Note Added: 0006843
2024-07-26 12:26	carlos	Note Added: 0006846
2024-07-26 12:37	enh	Note Added: 0006847
2024-07-26 12:39	wakely	Note Added: 0006848
2024-07-26 20:50	carlos	Note Added: 0006849
2024-07-26 22:28	dalias	Note Added: 0006850
2024-07-29 16:12	carlos	Note Added: 0006851
2024-07-31 15:27	dalias	Note Added: 0006852
2024-08-01 16:31	geoffclare	Note Added: 0006853
2024-08-01 16:32	geoffclare	Note Edited: 0006853
2024-08-02 14:10	carlos	Note Added: 0006854

View Issue Details

Activities

Issue History