Austin Group Defect Tracker



ID: 0001112
Category: [1003.1(2016/18)/Issue7+TC2] Base Definitions and Headers
Severity: Comment
Type: Clarification Requested
Date Submitted: 2017-01-05 20:03
Last Update: 2024-06-11 09:09
Reporter: torvald
View Status: public
Assigned To:
Priority: normal
Resolution: Accepted As Marked
Status: Closed
Name: Torvald Riegel
Organization: Red Hat
User Reference:
Section: fork
Page Number: 898
Line Number: 30327
Interp Status: ---
Final Accepted Text: Note: 0004047
Summary: 0001112: mutex/rwlock ownership after fork is unclear
Description: I am not aware of an explicit specification of which thread or process is the owner of a lock that was in an acquired state when fork() is called. Some wording seems to suggest that this is undefined behavior; other wording seems to suggest that at least some operations should be safe on such a lock.

The most obvious example of why this matters is process-shared robust mutexes. But it also has implications, including for implementation efficiency, for non-robust and process-private mutexes and rwlocks (I'll refer to both simply as locks in what follows).

If a mutex is process-shared, it should not have two owners after fork() (ie, both the parent and the child process) because this is against the whole principle of exclusive-ownership mutexes. Similar problems arise for rwlocks. It would also be hard to implement for the error-checking and PI mutex kinds. Thus, there should be just one owner, and there is no reason to prefer the child over the parent.

That leaves non-process-shared locks and locks that are of the process-shared kind but are not actually shared. However, I think we should discard the latter distinction because it's too hard for implementations to efficiently track which locks are actually shared and which aren't; a process-shared-kind lock should just be assumed to be process-shared.

Process-private locks in an acquired state could more easily be "duplicated" (compare the "replica" wording in the spec) because they are separate objects in the parent and child process. However, this would affect implementations of recursive, error-checking, robust, and PI mutexes and potentially rwlocks because these likely rely on OS-determined thread IDs (TID) or process IDs to determine ownership. What this means is certainly implementation-dependent, but rewriting owner TID fields in all locks in either the child or the parent process would probably be the most direct solution. That would still require maintaining a list of all acquired locks in a process, which is hardly practical. Thus, I do not see a lot of incentive for treating process-private locks differently from process-shared ones.


I'm not aware of explicit wording in the specification regarding what should happen to lock owners when fork() is called. It is stated that a mutex is owned by the thread that acquires it (http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html#tag_03_229), which would align with specifying that the parent process remains the owner of mutexes.
The fork() spec (http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html) states that file locks are not inherited by the child process, which also aligns with not letting child processes be owners of mutexes and rwlocks. There is also a statement that for multi-threaded programs, the child is a "replica" of the parent, "possibly including the states of mutexes", and that this may mean the child needs to call only async-signal-safe functions.

The rationale for fork() states that fork() is only used either to create (something like) a new thread or to call exec(). Both align well with letting only the parent be the owner of a lock, or with making access to the locks in the child undefined behavior. It also states: "When a programmer is writing a multi-threaded program, the first described use of fork(), creating new threads in the same program, is provided by the pthread_create() function. The fork() function is thus used only to run new programs, and the effects of calling functions that require certain resources between the call to fork() and the call to an exec function are undefined." Even though this ignores the possibility of acquiring locks in a single-threaded program, it states that requiring resources (eg, attempting to acquire a lock) between fork() and exec() is undefined.
It also explains that a forkall() idea was rejected that would have "allow[ed] locks and the state to be preserved without explicit pthread_atfork() code"; this also seems like an indication that the intent was to not allow accesses to the locks in the child.
Desired Action: Clarify the specification. I think the most practical choice would be to add one of these three requirements:

(R1) Any interaction of the child process with mutexes or rwlocks that are in an acquired state when fork() was called in the parent results in undefined behavior.

(R2) Like R1, but reinitialization of the mutex or rwlock is allowed for process-private mutexes and rwlocks.

(R3) Any mutexes or rwlocks that are in an acquired state when fork() was called in the parent remain locked by the parent process.

(As an alternative to undefined behavior, the behavior could also be required to be implementation-defined.)


R1 makes no additional requirements on implementations. Some current applications seem to expect more guarantees from an implementation, but I am not aware of any promise regarding that by the specification. It also still allows implementations to specify stronger guarantees.

R2 is a bit stricter on implementations but explicitly allows for the reinitialization use case (ie, fork, then reinit the lock in the child). Reinitialization may be the only thing that a child actually needs: because there is no other thread in the child process, the child can construct the state it wants. It requires implementations to reset any global state potentially associated with acquired locks.

Neither R1 nor R2 requires changes in the parent. In particular, the parent can remain the owner of process-shared locks that were in an acquired state when fork() was called.

R3 is more explicit, but it comes with additional requirements for implementations. This may be surprising, but consider implementations that use TIDs to represent ownership: If the parent releases the lock and exits, the child will have a lock acquired by a process that doesn't exist anymore (and there's nothing the parent can do about it for a process-private lock). If TIDs get reused, for example for new threads created by the child process, there are ABA situations.

Therefore, I would prefer R1, followed by R2.
Tags: issue8
Attached Files

- Relationships
related to 0000062 (Closed, ajosey) 1003.1(2008)/Issue 7: Is it correct to list fork as an async-signal safe interface
related to 0001118 (Closed) 1003.1(2016/18)/Issue7+TC2: Clarify meaning of "file lock"

-  Notes
(0003537)
geoffclare (manager)
2017-01-06 09:55
edited on: 2017-01-06 09:57

R1 is what is currently intended, although I agree it is not fully clear from the wording in the fork() description (because it incorrectly uses "may").

It is clearer in the APPLICATION USAGE and RATIONALE for pthread_atfork(), but these are non-normative:
The original usage pattern envisaged for pthread_atfork() was for the prepare fork handler to lock mutexes and other locks, and for the parent and child handlers to unlock them. However, since all of the relevant unlocking functions, except sem_post(), are not async-signal-safe, this usage results in undefined behavior in the child process unless the only such unlocking function it calls is sem_post().

[...]

As explained, there is no suitable solution for functionality which requires non-atomic operations to be protected through mutexes and locks. This is why the POSIX.1 standard since the 1996 release requires that the child process after fork() in a multi-threaded process only calls async-signal-safe interfaces.
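As an illustration of the handler mechanism the quoted text refers to, here is a minimal sketch in C. It registers prepare, parent, and child handlers with pthread_atfork() and reports which of them ran. The lock state_lock and the names on_prepare, on_parent, on_child, and fork_with_handlers are hypothetical. The child handler deliberately does not unlock the mutex, since that unlock is the step the quoted text says results in undefined behavior.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock protecting some application state. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int registered, ran_prepare, ran_parent, ran_child;

/* Runs in the parent before the fork: take the lock. */
static void on_prepare(void)
{
    pthread_mutex_lock(&state_lock);
    ran_prepare = 1;
}

/* Runs in the parent after the fork: release the lock. */
static void on_parent(void)
{
    ran_parent = 1;
    pthread_mutex_unlock(&state_lock);
}

/* Runs in the child after the fork. It only records that it ran;
 * it does not call pthread_mutex_unlock(), which is not
 * async-signal-safe. */
static void on_child(void)
{
    ran_child = 1;
}

/* Returns a bitmask: 1 = prepare ran, 2 = parent ran, 4 = child ran;
 * -1 on failure. */
int fork_with_handlers(void)
{
    int status;
    pid_t pid;

    /* Register only once; handlers accumulate across registrations. */
    if (!registered) {
        if (pthread_atfork(on_prepare, on_parent, on_child) != 0)
            return -1;
        registered = 1;
    }
    ran_prepare = ran_parent = ran_child = 0;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(ran_child ? 4 : 0); /* report through the exit status */

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return (ran_prepare ? 1 : 0) | (ran_parent ? 2 : 0) | WEXITSTATUS(status);
}
```

In this sketch the child's copy of state_lock simply stays in the locked state it had at the time of the fork, which is why the pattern offers no safe way for the child to use state protected by such a lock.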


Another problem with the current wording is that it has a loophole in that it only talks about an exec function being called; it doesn't say the call has to be successful. A conforming application could call execl("/", "/", (char *)0) to satisfy this condition and then would no longer be restricted to async-signal-safe operations.
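The loophole can be sketched in C as follows. The function name exec_of_directory_fails and the exit status 42 are hypothetical and chosen only for the demonstration. The point is that the exec call is made but returns, so the child keeps running the original program image.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child kept running after calling execl("/", ...),
 * 0 if it did not, and -1 on failure. */
int exec_of_directory_fails(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* "/" is a directory, so this call cannot succeed. */
        execl("/", "/", (char *)0);
        /* Reached only because the exec function returned: an exec
         * function has been "called", yet no new program is running. */
        _exit(42);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 42;
}
```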

I suggest the following...

On page 898 line 30327 section fork() change:
Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.

to:
Consequently, the application shall ensure that the child process only executes async-signal-safe operations until such time as one of the exec functions is successful.
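A minimal sketch in C of a child that stays within this requirement might look as follows. The function name run_program is hypothetical, and the exit status 127 for a failed exec is an assumption borrowed from shell convention. After fork() the child calls only execv(), write(), and _exit(), all of which are async-signal-safe, including on the path where the exec is not successful.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the program at path and returns its exit status, 127 if the
 * exec was not successful, or -1 on other failures. */
int run_program(const char *path, char *const argv[])
{
    static const char msg[] = "exec failed\n";
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        /* The exec was called but was not successful, so the child
         * is still restricted to async-signal-safe operations. */
        if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
            /* Nothing safe to do about a failed write here. */
        }
        _exit(127);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Using _exit() instead of exit() on the failure path matters here, because exit() runs atexit handlers and flushes stdio buffers, and it is not async-signal-safe.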


(0003539)
Florian Weimer (reporter)
2017-01-07 09:09

This issue also affects non-multi-threaded processes which use process-shared mutexes, where the restrictions after fork do not seem to apply.
(0004047)
geoffclare (manager)
2018-06-28 16:15
edited on: 2018-06-28 16:16

On page 898 line 30327 section fork() change:

Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.
to:
Consequently, the application shall ensure that the child process only executes async-signal-safe operations until such time as one of the exec functions is successful.

After line 30332 add a new bullet item:
* Any locks held by any thread in the calling process that have been set to be process-shared shall not be held by the child process. For locks held by any thread in the calling process that have not been set to be process-shared, any attempt by the child process to perform any operation on the lock results in undefined behavior (regardless of whether the calling process is single-threaded or multi-threaded).



- Issue History
Date Modified Username Field Change
2017-01-05 20:03 torvald New Issue
2017-01-05 20:03 torvald Name => Torvald Riegel
2017-01-05 20:03 torvald Organization => Red Hat
2017-01-05 20:03 torvald Section => (section number or name, can be interface name)
2017-01-05 20:03 torvald Page Number => (page or range of pages)
2017-01-05 20:03 torvald Line Number => (Line or range of lines)
2017-01-06 09:55 geoffclare Note Added: 0003537
2017-01-06 09:55 geoffclare Project 1003.1(2013)/Issue7+TC1 => 1003.1(2016/18)/Issue7+TC2
2017-01-06 09:57 geoffclare Note Edited: 0003537
2017-01-06 09:59 geoffclare Section (section number or name, can be interface name) => fork
2017-01-06 09:59 geoffclare Page Number (page or range of pages) => 898
2017-01-06 09:59 geoffclare Line Number (Line or range of lines) => 30327
2017-01-06 09:59 geoffclare Interp Status => ---
2017-01-07 09:06 Florian Weimer Issue Monitored: Florian Weimer
2017-01-07 09:09 Florian Weimer Note Added: 0003539
2018-06-28 16:15 geoffclare Note Added: 0004047
2018-06-28 16:16 geoffclare Note Edited: 0004047
2018-06-28 16:17 geoffclare Final Accepted Text => Note: 0004047
2018-06-28 16:17 geoffclare Status New => Resolved
2018-06-28 16:17 geoffclare Resolution Open => Accepted As Marked
2018-06-28 16:17 geoffclare Tag Attached: issue8
2018-07-26 15:11 nick Relationship added related to 0001118
2020-04-23 11:18 geoffclare Status Resolved => Applied
2023-02-23 15:36 eblake Relationship added related to 0000062
2024-06-11 09:09 agadmin Status Applied => Closed

