0001111: pthread_rwlock_rdlock requirements if TPS is supported are under-specified, error-prone for users, and inefficient

ID	Project	Category	View Status	Date Submitted	Last Update

0001111	1003.1(2013)/Issue7+TC1	Base Definitions and Headers	public	2017-01-04 23:06	2024-06-11 08:54

Reporter	torvald	Assigned To
Priority	normal	Severity	Objection	Type	Clarification Requested
Status	Closed	Resolution	Accepted As Marked

Name	Torvald Riegel
Organization	Red Hat
User Reference
Section	pthread_rwlock_rdlock/pthread_rwlock_tryrdlock (XSH, 3 System Interfaces)
Page Number	(page or range of pages)
Line Number	(Line or range of lines)
Interp Status	---
Final Accepted Text	0001111:0004042


Summary	0001111: pthread_rwlock_rdlock requirements if TPS is supported are under-specified, error-prone for users, and inefficient
Description	The spec states: If the Thread Execution Scheduling option is supported, and the threads involved in the lock are executing with the scheduling policies SCHED_FIFO or SCHED_RR, the calling thread shall not acquire the lock if a writer holds the lock or if writers of higher or equal priority are blocked on the lock; otherwise, the calling thread shall acquire the lock. (1) What does "threads involved in the lock" mean? All threads that ever acquired the lock? All that are currently trying to acquire or have acquired the lock? (2) Recursive rdlock is allowed (also see 0000720). However, the TPS requirements contradict allowing recursive rdlocks, at least if one doesn't want to make user programs prone to deadlocks. This is because they introduce a reader preference that does not consider whether a thread acquires a read lock recursively. (3) Implementing the TPS requirements brings in constraints that would decrease performance a lot. First, the implementation needs to at least count "threads involved in the lock". Second, the implementation needs to keep a prio list of all writers attempting to acquire the lock; which means O(N) space requirements with N being either the possible priorities or number of threads. The space-requirement is a problem for at least process-shared rwlocks, and would likely mean that all synchronization has to go through the kernel. All the tracking of threads and priorities means additional contention, and thus lower performance (at least in the presence of writers). This wouldn't be that bad if it would be some odd special mode; but if an implementation supports TPS (eg, for other reasons than to mess with rwlocks), my interpretation of the requirement is that the TPS requirements become the default semantics. This is bad. Acquiring the lock as a reader should be able to scale well, and it is unlikely that the TPS requirements would make that possible,
Desired Action	My preference would be to simply remove the TPS requirements; the reader/writer preference would then be implementation-defined, which would make no existing implementations incorrect. If you do not want to do that, my second suggestion would be to make the TPS semantics a special mode of rwlocks (not the default mode though); one could standardize glibc's reader/writer preference modes at the same time, which would make it easier for users to select the semantics they want in a standard-conforming way.
Tags	tc3-2008

bhaible 2017-01-05 02:13 reporter bugnote:0003528	As an application programmer, I don't want my application to hang due to writer starvation, because that means disruption, data loss, user frustration. The conditions under which writer starvation occurs (with reader-preferring rwlocks) are not under my control, namely the number of reader threads and the percentage of time the readers spend with the lock held. If the product of these two figures go too high, writer starvation occurs. The only known reliable way to avoid this are the writer-preferring rwlocks. So if POSIX does not offer a way to force the use of writer-preferring rwlocks, I don't want to use rwlocks altogether, I'll use other synchronisation mechanisms. The intent of the stated sentence [ignoring priorities] "if writers are waiting on the lock, pthread_rwlock_rdlock callers must wait" is that it avoids writer starvation by not allowing the total duration of the readers-taken period to be extended. However, it is nonsense to impose such a rule for recursive re-taking of the lock, i.e. for pthread_rwlock_rdlock invocations in threads which have the lock already taken as readers: Such a "recursive" or "inner" use of the lock will be terminated before the "outermost" use of the lock by the same thread. The total duration of the readers-taken period can never be extended through such recursive pthread_rwlock_rdlock invocations. My proposed desired action is therefore: Replace the sentence "... the calling thread shall not acquire the lock if a writer holds the lock or if writers of higher or equal priority are blocked on the lock; otherwise, the calling thread shall acquire the lock." with "... the calling thread shall acquire the lock if and only if (a) the calling thread is already holding this lock as a read lock, or (b) no writer holds the lock and no writers of higher or equal priority are blocked on the lock." With this formulation (and assuming [TPS] and the stated thread policies), there is a guarantee that writer starvation is avoided. AND there is no contradiction with recursive invocations of pthread_rwlock_rdlock any more.

torvald 2017-01-05 12:00 reporter bugnote:0003534	Replying to Note 0003528: First, the same case can be made for reader starvation by writers. The preference you state is not universally true. Second, even exclusive mutexes are not fair, so this is not a rwlock-specific problem; sem_wait is not fair either. The purpose of the reader/writer distinction in reader-writer locks is not about being able to prioritize threads, but about allowing higher throughput / parallelism by designating threads that do not need mutual exclusion among them (ie, the readers). Third, non-fair mutual exclusion is not a problem in a lot of use cases, notably all use cases in which either only throughput matters or the amount of work is bounded (eg, all synchronization in fork/join parallelism, allocators, ...). If a user wants something like priorities, fairness, or starvation-freedom, then this is not about mutual exclusion. From a synchronization design perspective, locks are a mutual exclusion mechanism; everything else is an add-on feature. Therefore, users should use locks for mutual exclusion; it does not make sense to claim that locks that provide mutual exclusion are useless just because they don't also provide some other random feature in the same package. If users want priorities or have other constraints on ordering, they can do so using suitable chosen conditions with a condition variable, using concurrent task queues, etc.; they still use locks and rwlocks for mutual exclusion though.b Furthermore, there are many different forms of priorities and fairness. You seem to want strict preferences/priorities. This is incomparable (in terms of allowed executions) to starvation-freedom, for example; fairness is a subset of starvation-freedom. There is no one-size-fits-all mode here. You seem to want priorities, but I believe what you actually want is starvation-freedom for both readers and writers (meaning no thread waits forever if it could acquire the lock legally, though it could wait for a long time theoretically). Also, specifying forward progress requirements precisely requires care, and POSIX would have to catch up significantly to be able to do that. (I'm speaking from experience here: I have worked on improved forward progress requirements for C++ in the past years, which have been voted into the C++17 draft recently.) Finally, performance of synchronization constructs matters a lot. Synchronization used to make parallelism possible (and that's the vast majority of synchronization) must exploit as much parallelism as possible; introducing runtime overheads often increases the time at which there is no parallelism but just sequential execution effectively. Next, remember Amdahl's law. Having to track which thread has already acquired a lock introduces overhead. For programs that use not just one rwlock, this overhead can be substantial. Having to track which writers are blocked, and which exact priorities they have, is more overhead and requires more synchronization with the other threads attempting to acquire the rwlock. Simplified, more synchronization means more contention and more sequential execution. I believe that most users want their synchronization constructs to be efficient and scale well. Speaking for glibc, and as glibc's concurrency maintainer, I do not intend to slow down synchronization for all users just to support niche use cases, or use cases that do not align with the overall design of the various synchronization constructs and how they are supposed to work together and be used. Introducing unnecessary overheads does not serve the majority of glibc's users. And I believe that it would also not serve the majority of pthreads' users.

bhaible 2017-01-05 12:15 reporter bugnote:0003535	Current situation on different OSes: - rwlocks are by default writer-preferring on Solaris (at least 9 and newer) and Mac OS X (at least 10.5 and newer). - rwlocks are by default reader-preferring on glibc.

shware_systems 2017-01-05 15:56 reporter bugnote:0003536	To me it looks like the current wording supports that before an exclusive read lock is granted, all potential writers to the area to be read of equal or higher thread priority will have completed any modifications to that area so the read view is temporally consistent to that extent. What modifications lower priority threads might make is assumed are more of a bookkeeping recording nature, and not the primary data of interest that needs the synchronization. The expected practice I see is the rdlock, once granted, shall only be held long enough to take a snapshot of the area locally and then release it so lower priority writers can unblock. Where multiple threads need to make copies of the same snapshot, this is where recursive locking is useful, to speak to point (2), but the application should be unlocking the data once all copies have been made - not keep the lock open so caching the snapshot isn't necessary. Any writes by the threads would be to this cached data, for passing to a writer thread as an update source for the common area. Such copying can be a significant overhead issue, but the primary concern is data consistency at point of use, not execution time. The *_r() variants of interfaces have similar overhead issues of buffer copying, but the copies are valid snapshots for use. If an application user doesn't want to do a lot of waiting, it's on them to request smaller data sets. To answer (1) and (3), I read it as the pool of threads currently blocking on acquiring the rd lock or holding as their current owner an outstanding rd or wr lock on the same memory area, including overlaps. I do not see it as requiring a history of threads which have held a lock before on the area and might try to reacquire it. This limits the O[N] space requirements to the actual working set of significance to a thread wanting the rdlock. An implementation may provide APIs as extensions that ease an application doing such history management for itself, where warranted, but for the general case the application should be treating the lockable area as the C compiler treats a volatile atomic object and not expect any tracking done by the implementation. To address starvation, which I agree is a potential issue with either sort of preference, I can see that both rd and wr locks could have an application specifiable timeout associated with them, so the system as a whole isn't waiting indefinitely on a blocked I/O operation completing. This would require some capability of rolling back modification attempts that partially complete, so isn't trivial, but could be a good adjunct to current usage. Basic scenario I see is if an operation that warrants a lock completes before the timeout it unlocks things normally, otherwise the operation gets rolled back and optionally re-queued for another try but allows other blocked threads to attempt their operations. Read attempts that timeout would simply indicate the timeout occurred and it would be up to the application to handle the data it was cacheing was not entirely readable as an abort, retry, fail situation.

praiskup 2017-01-06 13:34 reporter bugnote:0003538	At least for me, the TPS is not what causes confusion - but rather the second sentence: """ DESCRIPTION The pthread_rwlock_rdlock() function shall apply a read lock to the read-write lock referenced by rwlock. The calling thread acquires the read lock if a writer does not hold the lock and there are no writers blocked on the lock. """ Possibly the ending "and there are no writers blocked on the lock." could be dropped and then the specs would describe what glibc really does by default.

torvald 2017-01-10 11:48 reporter bugnote:0003540	Regarding note 0003538: I agree that the second sentence can be confusing perhaps, but I think it is technically correct: If both no writer holds the lock and there are no writers blocked on the lock, then the reader acquires. This is not and "if and only if" but that's perhaps easy to overlook. To make this clearer, one could add "; if writers are blocked on the lock, see below" so that it becomes obvious that the other cases are discussed further down.

geoffclare 2018-06-14 16:15 reporter bugnote:0004042 Last edited: 2018-06-14 16:16	On P1707 L55544 and L55548 change: the threads involved in the lock to: the threads that hold or are blocked on the lock On P1707 L55546 and L55550 change: or if writers of higher or equal priority are blocked on the lock to: or if the calling thread does not already hold a read lock and writers of higher or equal priority are blocked on the lock On P1707 L55555 insert a paragraph break before: If the read lock is not acquired, and change it to: Whether the Thread Execution Scheduling option is supported or not, if the read lock is not acquired,

eblake 2019-08-30 00:41 manager bugnote:0004545	https://sourceware.org/bugzilla/show_bug.cgi?id=13701 complains that there is still an inherent performance penalty in the resulting wording on scheduling fairness, and that glibc's implementation has chosen to disobey any requirements on fair scheduling because of the overhead required. https://lists.gnu.org/archive/html/bug-gnulib/2019-08/msg00089.html shows that gnulib is acknowledging that glibc's default behavior will never be fair enough under TPS scheduling; and serves as an example of a program able to sniff out the difference according to the current wording of the standard. We may need to reopen this or create a related bug, deciding whether to further relax things to permit glibc's behavior and warn implementations about workarounds when fairness is important but cannot be guaranteed by libc alone.

eblake 2019-11-07 16:59 manager bugnote:0004639	This was discussed again on the 2019-11-07 call, in response to 0001111:0004545. Since TPS is optional, the consensus on the call was that Linux has a bug for advertising TPS when it cannot actually support it. One recommendation would be for Linux/glibc to fix their headers to define _POSIX_THREAD_PRIORITY_SCHEDULING to 0, rather than a positive value, and have sysconf(_SC_THREAD_PRIORITY_SCHEDULING) return -1. However, this may break existing code that currently checks for a non-zero value (but does not care whether that value is specifically 200809L). It may also be worth defining a value that would allow implementations to know if the interfaces are supported but the thread scheduling guarantees might not be, as an intermediate position between missing and fully supported, whether as the compile-time value or as a result from sysconf, and/or adding a new optional thread scheduling policy to POSIX where Linux implements _that_ policy instead of TPS. But such proposals do belong as a separate bug. I will work with the glibc folks to see what we can come up with. See more at https://bugzilla.redhat.com/show_bug.cgi?id=1410052#c17

Date Modified	Username	Field	Change
2017-01-04 23:06	torvald	New Issue
2017-01-04 23:06	torvald	Name	=> Torvald Riegel
2017-01-04 23:06	torvald	Organization	=> Red Hat
2017-01-04 23:06	torvald	Section	=> pthread_rwlock_rdlock/pthread_rwlock_tryrdlock (XSH, 3 System Interfaces)
2017-01-04 23:06	torvald	Page Number	=> (page or range of pages)
2017-01-04 23:06	torvald	Line Number	=> (Line or range of lines)
2017-01-05 02:13	bhaible	Note Added: 0003528
2017-01-05 12:00	torvald	Note Added: 0003534
2017-01-05 12:15	bhaible	Note Added: 0003535
2017-01-05 15:56	shware_systems	Note Added: 0003536
2017-01-06 13:34	praiskup	Note Added: 0003538
2017-01-10 11:48	torvald	Note Added: 0003540
2018-06-14 16:15	geoffclare	Note Added: 0004042
2018-06-14 16:16	geoffclare	Note Edited: 0004042
2018-06-14 16:17	geoffclare	Interp Status	=> ---
2018-06-14 16:17	geoffclare	Final Accepted Text	=> 0001111:0004042
2018-06-14 16:17	geoffclare	Status	New => Resolved
2018-06-14 16:17	geoffclare	Resolution	Open => Accepted As Marked
2018-06-14 16:17	geoffclare	Tag Attached: tc3-2008
2019-08-30 00:41	eblake	Note Added: 0004545
2019-11-07 16:59	eblake	Note Added: 0004639
2019-11-28 12:10	geoffclare	Status	Resolved => Applied
2024-06-11 08:54	agadmin	Status	Applied => Closed

View Issue Details

Activities

Issue History