Austin Group Defect Tracker

ID:             0000609
Category:       [1003.1(2004)/Issue 6] System Interfaces
Severity:       Editorial
Type:           Clarification Requested
Date Submitted: 2012-09-20 14:18
Last Update:    2012-10-28 18:33
Reporter:       mmihaylov
View Status:    public
Assigned To:    ajosey
Priority:       normal
Resolution:     Open
Status:         Under Review
Name:           Mihail Mihaylov
Organization:
User Reference:
Section:        pthread_cond_broadcast, pthread_cond_signal
Page Number:    1043
Line Number:    33043 - 33046
Interp Status:  ---
Final Accepted Text:
Summary: 0000609: It is not clear what threads are considered blocked with respect to a call to pthread_cond_signal() or pthread_cond_broadcast()
Description: The spec states that a call to pthread_cond_broadcast() will unblock all threads currently blocked on the condition variable, and a call to pthread_cond_signal() will unblock at least one of them.

However, it is not clear what "currently blocked" means.

It seems clear that if the signaling thread doesn't own the mutex, it is hard to reason about which threads are eligible for wakeup. So I'm only concerned with the case where the signaling thread owns the mutex.

My interpretation is that, in this case, only threads that blocked on the condition variable before the mutex was acquired by the thread calling pthread_cond_signal() or pthread_cond_broadcast() are eligible for wakeup. So my understanding is that a call to pthread_cond_signal() is required to unblock at least one of the threads that blocked before the calling thread acquired the mutex. If a thread that blocked only after the signaling thread called pthread_cond_signal() and released the mutex is woken, that should be considered a spurious wakeup, and it should not prevent the unblocking of at least one of the threads that blocked before the call.
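
To make the scenario concrete, here is a minimal sketch (the thread names and the single-flag predicate are hypothetical) of the case under discussion:

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
    static bool ready = false;

    /* Thread W: the wait atomically releases m and adds W to the wait
       set of c. */
    static void *waiter(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&m);
        while (!ready)
            pthread_cond_wait(&c, &m); /* W is blocked on c from here on */
        pthread_mutex_unlock(&m);
        return NULL;
    }

    /* Thread S: once it owns m, W's membership in the wait set is
       settled. Under the interpretation above, the signal must unblock
       W (or some other thread that blocked before S acquired m), not a
       thread that first blocks after S releases m. */
    static void *signaler(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&m);
        ready = true;
        pthread_cond_signal(&c);
        pthread_mutex_unlock(&m);
        return NULL;
    }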

But there has been an alternative interpretation supported by a maintainer of glibc. He claims that because the spec doesn't state that pthread_cond_signal() will wait for a thread to be unblocked by the signal, the signal can be "delivered" some time later. So it is possible that the calling thread will release the mutex and another thread will acquire it and block on the condition variable before the signal is delivered, thus making it possible for the signal to unblock a thread that has blocked after the call to pthread_cond_signal() even when the calling thread was holding the mutex.

The spec doesn't provide enough explicit information to make his interpretation invalid. Therefore it cannot be claimed that my interpretation is correct either, because the spec allows his interpretation.

Another question which cannot be answered based on the spec is what should happen when a thread calls pthread_cond_signal() and then immediately calls pthread_cond_wait(). Is this thread eligible to consume its own signal?
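
In code, the question is whether the following (hypothetical) sequence may consume the signal it has just sent:

    pthread_mutex_lock(&m);
    pthread_cond_signal(&c);    /* signal sent while holding m       */
    pthread_cond_wait(&c, &m);  /* may this wait be unblocked by the */
                                /* signal sent on the line above?    */
    pthread_mutex_unlock(&m);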

The argument between me and the glibc maintainer is available at http://sourceware.org/bugzilla/show_bug.cgi?id=13165#c16. Please consider taking a look at it, in case my description here doesn't accurately capture the different interpretations that the current text of the spec allows.
Desired Action: Add an explicit explanation as to which threads are eligible to be unblocked by a call to pthread_cond_signal() and pthread_cond_broadcast().

- Notes
(0001371)
torvald (reporter)
2012-09-21 15:37

Speaking here for the interpretation that glibc's condvar implementation relies on:

For waiters Wn, a signaler S, and "->" meaning happens-before, the spec states that if W1 -> S (where W1 here stands for the part of pthread_cond_wait that atomically blocks and unlocks the mutex, not its return), then W1 is considered a blocked thread. However, it does not state that W1 is _only_ considered a blocked thread if it happens before S. Therefore, if we have S -> W2, and W2 blocks on the cond var that S signals, then W2 could also be considered a blocked thread with respect to S's signal. As a result, if we have (W0, W1) -> S -> W2, then S could wake up W0, W1, or W2. So, informally, signals are asynchronous in the sense that they are eventually delivered at some time during or after the call to pthread_cond_signal, and wake up at least one thread from the set of threads blocked at _this_ time.

If this behavior is not allowed (and Mihail's interpretation is the actual intent of the spec), then S would have to wake up either W0 or W1 (which is a stronger guarantee). For implementations, this means that they would need to restrict which waiters a given signal can wake. For example, for W1 -> S1 -> W2 -> S2, S1 could only wake W1, whereas S2 could wake W1 or W2. If that is the intent, then please also answer the follow-up questions that arise: for example, would S2 always have to wake W2 (otherwise, S2 could wake W1, S1 would have no associated waiter, and W2 would remain blocked)? There might be further follow-ups around cancellation or PI cond vars.

Even with the former interpretation (i.e., what glibc's implementation currently assumes), applications can establish the stronger waiter--signal relationship (and thus the stronger guarantees of Mihail's interpretation) by using a separate cond var for each required waiter set (even though there is then no fallback or overflow to a previous set).
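
A rough sketch of that workaround (hypothetical names; reuse/overflow of a set is deliberately ignored, per the caveat above):

    #include <pthread.h>

    #define NSETS 2  /* toy example: two waiter sets, reused cyclically */

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  set_cond[NSETS] = {
        PTHREAD_COND_INITIALIZER, PTHREAD_COND_INITIALIZER
    };
    static int cur = 0;        /* set that new waiters join             */
    static int grants[NSETS];  /* wakeups granted to each set           */

    static void wait_in_current_set(void)
    {
        pthread_mutex_lock(&m);
        int s = cur;                          /* join the current set   */
        while (grants[s] == 0)                /* only a signal aimed at */
            pthread_cond_wait(&set_cond[s], &m); /* set s releases us   */
        grants[s]--;                          /* consume one wakeup     */
        pthread_mutex_unlock(&m);
    }

    static void signal_waiters(void)
    {
        pthread_mutex_lock(&m);
        int s = cur;
        cur = (cur + 1) % NSETS; /* threads arriving later join a new set */
        grants[s]++;             /* wake one thread from set s only       */
        pthread_cond_signal(&set_cond[s]);
        pthread_mutex_unlock(&m);
    }

Because a signal only ever targets the cond var of a set that was closed before the signal was sent, a waiter arriving after the signal cannot consume it, which is the stronger guarantee.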
(0001372)
dalias (reporter)
2012-09-21 17:46

I have followed the associated glibc bug report (#13165) for quite some time, and believe that the current behavior is non-conformant for the reasons described in comments #14, 27, 29, and 31 on the bug tracker. I am also the primary author of musl libc, which has an alternate, much simpler implementation of condition variables built on Linux futexes. It is believed to perform at least as well (in particular, we have not been able to measure any spurious wakeups that occur with musl but not glibc/NPTL, despite the trouble NPTL goes to for the sake of avoiding them), and it does not suffer from this issue. So I see no reason the requirements of the standard should be relaxed to accommodate the undesirable glibc/NPTL behavior. There is no reason glibc cannot simply remove the incorrect logic that is causing the problem.
(0001373)
mmihaylov (reporter)
2012-09-21 18:29

Please let us not move the dispute from the glibc bug over here. Let's limit ourselves to clarifying the question.
(0001374)
dalias (reporter)
2012-09-21 22:31

Indeed. I just want it to be clear that giving the implementation this freedom to neglect one aspect of quality (ordering guarantees that seem to be required by the current language of the standard) is not essential to providing other quality guarantees (reasonable performance, etc.) when implementing POSIX threads on top of the primitives the Linux kernel provides, and that the whole issue stems from an attempt at optimizing against spurious wakes in NPTL.

Perhaps it would also be useful to know whether there are any other implementations which suffer from the same issue.
(0001378)
siddhesh (reporter)
2012-09-25 16:36

Torvald, when you talk of the sequence:

(W0, W1) -> S -> W2

resulting in W2 being woken up, have you defined 'entering a wait' as Wn entering the futex_wait system call, or is your definition of having entered a wait something else?

Given the glibc implementation, a waiter could be considered as having 'entered a wait' as soon as it has acquired the internal condition, since that is what the signaller uses to 'enter' its signal. With that definition, I don't think it would be possible for any waiter to enter a wait if S has taken the mutex and entered pthread_cond_signal before it, i.e. W2 will never wake up if (W0, W1) -> S -> W2.

For reference, this is how the implementations look in glibc (wait/release on internal_condition denote acquiring and releasing the internal lock):

pthread_cond_wait:
1: wait(internal_condition)
2: release(mutex)
3: update_condition_state
4: release(internal_condition)
5: futex_syscall(wait) /* blocks in the kernel */
6: wait(internal_condition)
7: goto 4 if we are not to be woken
8: update_condition_state
9: release(internal_condition)
10: wait(mutex)

pthread_cond_signal:
1: wait(internal_condition)
2: if(waiters_present)
3: update_condition_state
4: futex_syscall(wake)
5: release(internal_condition)
(0001379)
torvald (reporter)
2012-09-25 17:05

Siddhesh, it is possible. But as Mihail pointed out, we should not discuss glibc's implementation here. This issue here is a general request for clarification of the specification, even though the glibc discussion triggered it. Thanks.
(0001396)
msbrown (manager)
2012-10-10 16:17

Comments on the topic by David Butenhof <David.Butenhof@hp.com>:

The argument in the glibc "interpretation" appears to hinge on the idea that POSIX use of the terms "block" and "unblock" are not discrete atomic events or transitions, and that "unblock" in fact sends some unsynchronized signal that arrives at a later time to (presumably) cause a blocked thread to be removed from its list and scheduled to run.

** First, I'll take the claims I infer from the post at face value against the wording and intent of the standard, and then I'll comment on the more pragmatic aspects. (If you find yourself objecting to something in the first part, read the second part before replying.) Standards first:

This interpretation seems to clearly contradict the usage within the standard, and certainly the intent at the time the text was composed. (Note, while I glanced through the referenced glibc bug report, I haven't tried to read through the code that purportedly demonstrates the problem. I'd rather deal with this in terms of the abstract POSIX standard implications than get embroiled in the glibc issue; and, frankly, I tend to agree with a contention in the bug discussion that any user code demonstrating a flaw short of simply "no wakeup occurs" is probably trying to exploit some unwarranted assumptions about fairness.)

"Blocked" means that the thread ceases to be runnable; "unblock" means the target thread has been scheduled to run. Both terms are used as discrete atomic events; there's no concept of a "partial" or "pending" unblock -- a thread is either blocked or not blocked. But this is exactly the problem foreseen by those arguing in favor of a concise mathematical description of thread behavior; which was rejected in favor of the more casual prose descriptions. (Ultimately, I think the committee was correct: a dense "mathematical" description as proposed would likely have made the early standard much less approachable and perhaps less quickly implemented. And, considering the number of independent implementations and the length of time, the remarkably low frequency of this sort of problem suggests that greater complication wasn't needed.)

If we assume W1 -> S1 -> (W2, W3) -> B1 -> (W4, W5) -> S2 -> S3, then signal S1 has the set (W1) from which to select and therefore shall unblock W1 removing it from the wait set and making it runnable. Broadcast B1 finds a set of two blocked threads W2 and W3 and must unblock both, removing them from the wait set and making both runnable. Signal S2 finds a set of two waiters W4 and W5 and shall select exactly one of those two to unblock; based on highest scheduling priority if priority scheduling is in effect or else using some unspecified arbitrary criterion but, again, removing that thread from the wait set and making it runnable. The final S3 must then find exactly one thread in the candidate set, whichever was not selected and unblocked by S2, and shall remove it from the wait set and make it runnable.

It is impossible for S1 to unblock W2, W3, W4, or W5 as they cannot have become part of the wait set at the time S1's discrete and atomic "unblock" action takes place. That would imply a "pending wake", a concept that is intentionally absent from POSIX.

The original author's statement of the glibc position is that POSIX doesn't specify that "pthread_cond_signal() will wait for a thread to be unblocked by the signal". This almost sounds like confusion between the "UNIX signal" and "condition signal" concepts, as if the latter was a similar asynchronous and potentially deferred action. But I don't see any reasonable leeway in the descriptions for such a radical interpretation. The POSIX model of an unblock operation removes a thread from the wait set and makes it runnable. There is no way to introduce a concept of "waiting" for that to happen; it's not asynchronous. Granted, POSIX doesn't formally define the operations; but that was the crux of the "prose vs mathematics" argument, that the casual usage, in context, is sufficiently clear. Debatable, certainly; but I think true enough in practice.

** But, with that out of the way, here's the pragmatic side:

In reality we ask (and realistically can test) only that pthread_cond_signal is guaranteed to wake at least one thread in the condition variable's current wait set (if any). (At least, unless you're running the test code entirely SCHED_FIFO on a uniprocessor.) That's why we have spurious wakeups: the determination of which application thread proceeds is entirely the responsibility of the application logic. It becomes impossible for the application to cope only if there are waiting threads and pthread_cond_signal fails to wake at least one.
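
In code terms, this is the conventional pattern being alluded to (a generic sketch, not text from the standard):

    pthread_mutex_lock(&m);
    while (!predicate)              /* re-check after every wakeup, */
        pthread_cond_wait(&c, &m);  /* whether signaled or spurious */
    /* ... act on the state protected by m ... */
    pthread_mutex_unlock(&m);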

There seems to be some controversy within the glibc bug report whether we're talking about the possibility of pthread_cond_signal failing to unblock a waiter, or only whether it's possible to unblock an "undesired" waiter. If S1 may unblock W2 instead of W1, at the implementation level, I think there's an error in the implementation; but while a convoluted program can potentially detect this, it's difficult to see how this can matter to any real application without arguing about "scheduling fairness". I'm presuming that W2 would have to occur very close to S1 unless this presumptive "asynchronous unblock" is absurdly slow; and in any real program, especially on a multiprocessor, that means the order of S1 and W2 might just as easily have been reversed.

We can easily argue at the standards and implementation level about the proper and intended behavior of the "unblock" operation on a condition variable; but correct POSIX application code cannot observe that transition. It can only observe return from the call to pthread_cond_*wait which requires a lot more including contention for the associated mutex and processor scheduling; none of which is itself directly observable or easily controllable (again, without SCHED_FIFO on a uniprocessor).

It also seems possible that such an "asynchronous unblock" could mean that W1 may never be awakened, presuming there's always a "Wx" waiter immediately following any "Sy" signaler. This also seems clearly an error; but again I question whether it can matter to a correct application so long as some thread always returns from pthread_cond_*wait after any pthread_cond_signal. Multiple waiters for a condition variable either are awakened by broadcast or are all waiting for the identical predicate... in which case the program doesn't care which thread wakes. (If all consumer threads are never concurrently unblocked by the producer logic, it only means that the consumer was coded "too wide" for the actual application parallelism; it doesn't matter that the same thread always remains blocked. In fact, this is probably best for application performance.) We can therefore argue quality of implementation, or scheduling fairness: but neither is a part of the standard and is therefore, for our purposes (though not necessarily for the glibc bug discussion), irrelevant.

** And back to the actual POSIX defect report:

The action is "Add an explicit explanation as to which threads are eligible to be unblocked by a call to pthread_cond_signal() and pthread_cond_broadcast()." I would argue there is in fact precisely that explanation in place. And while there seems to be some contention that, at the "subatomic" level, glibc may violate the letter of that "explicit explanation", I see no clear evidence that anyone is claiming it fails in a way that can be detected by a meaningful conforming POSIX application. Or, perhaps more precisely, that POSIX grants to the application any grounds to practically or portably exploit the guarantee claimed to be absent in glibc's implementation, so long as at least one thread returns from pthread_cond_*wait (if any) corresponding to each pthread_cond_signal call on the same condition variable. (If glibc really uses an "asynchronous unblock" as described, I have difficulty seeing how there wouldn't be far worse consequences than this; but that's a different matter.)

In any case, I don't see that the definition of "blocked" is disputed, and (at a simple prose level) XBD 3.7.6 seems to cover it well enough: "waiting for some condition (other than availability of a processor) to be satisfied before it can continue execution". It's a thread that can't run until some external action makes it runnable. And I think that the pthread_cond_broadcast/pthread_cond_signal description neatly treads a valuable line in not actually saying that the unblocked thread is "made runnable" in some fashion; it says instead that the thread(s) "shall contend for the mutex [as if] each had called pthread_mutex_lock".

The pthread_mutex_unlock text is a bit muddier, saying that if threads are blocked "the scheduling policy shall determine which thread shall acquire the mutex". It doesn't say anything about unblocking, although it's pretty clearly implied.
(0001398)
mmihaylov (reporter)
2012-10-10 19:43
edited on: 2012-10-11 06:14

Let me see if I understand David's comment correctly.

The way I understand the first part is that the standard dictates that implementations should guarantee that a call to pthread_cond_signal() (provided the calling thread owns the mutex) will unblock at least one thread that is in the current wait set of the condition variable. The current wait set consists of all threads that called pthread_cond_wait() before the signal was sent, excluding those that have already been unblocked by an earlier call to pthread_cond_signal() or pthread_cond_broadcast() or because of a spurious wakeup. No other threads are included in the wait set.

Based on this, the comment states that implementations that don't provide this guarantee are incorrect - therefore NPTL's implementation is incorrect too.

However, the second part of the comment states that real life correctly written POSIX-based applications cannot rely on this guarantee because they cannot detect when it is violated. This statement is based on the presumption that the predicate is identical for all blocked threads - it is implied that this includes even those threads that blocked after the signal was sent.

In the concluding section of the comment David claims that the definition of "blocked" is not contested and that there is no evidence that a "meaningful" conforming application has failed because this guarantee was violated.

Now let me comment on the comment...

1) There is a contradiction between the first and the second part of the comment. If conforming applications are not allowed to rely on this guarantee, why is it specified by the standard?

2) There is a claim that the predicate should be the same for all waiting threads. Is there text in the standard which requires this? If the libc implementation provides the ordering guarantee in question, this is not necessary. It is enough for the predicate to be true for the threads in the current wait set. And it should be acceptable for the predicate to be false for a wait set for which no signal has been sent yet.

By this I mean that it should be acceptable to have a predicate which involves some thread specific state, as long as this predicate is guaranteed to be true for all threads that blocked before the last signal/broadcast was sent. At the same time it should be acceptable for the predicate to be false for some or all of the threads that blocked after the last signal was sent, as long as it becomes true for them when the next signal is sent. Incidentally that's the kind of predicate that one would need to reliably detect spurious wakeups. And this is the kind of predicate that will detect when a libc implementation breaks the ordering guarantee.
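
As an illustration, here is a minimal sketch (hypothetical names) of such an object: each waiter records the generation in which it blocked, so the predicate is false for threads that blocked after the last signal and becomes true for them only when a later signal is sent:

    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
    static unsigned long generation = 0; /* bumped by every signal     */
    static unsigned long wakeups = 0;    /* granted, unconsumed wakeups */

    static void event_wait(void)
    {
        pthread_mutex_lock(&m);
        unsigned long g = generation;    /* thread-specific state      */
        /* True only for threads that blocked before the last signal: */
        while (generation == g || wakeups == 0)
            pthread_cond_wait(&c, &m);
        wakeups--;                       /* consume exactly one wakeup */
        pthread_mutex_unlock(&m);
    }

    static void event_signal(void)
    {
        pthread_mutex_lock(&m);
        generation++;
        wakeups++;
        /* Relies on the disputed guarantee: the wakeup must reach a
           thread that is already blocked. If it can instead go to a
           thread that blocks later, that thread re-waits (its
           predicate is false) and a past waiter stays blocked
           forever -- which is how the violation becomes observable. */
        pthread_cond_signal(&c);
        pthread_mutex_unlock(&m);
    }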

3) I find that the claim that there is no evidence that a violation of this guarantee can be detected is contradicted by the fact that there exists a bug against glibc about it. Actually there is also one more bug about the same issue, but its reporter was not as persistent as me. I did not file the bug because I wrote a test to check whether the guarantee holds - I discovered the problem because my meaningful conforming application failed because of it. My use case is simple - I needed to write, on top of condition variables, a waitable object which prevents spurious wakeups while preserving the ordering guarantee specified in the POSIX standard - basically a condition variable with no spurious wakeups. I don't know what the other reporter's use case was, but I think it would be unfair to assume it was not meaningful and conforming.

4) As for the claim that the definition of "blocked" is not contested in the glibc bug - that's true in some sense. What is contested is which threads should be included in the current wait set for a call to pthread_cond_signal() or pthread_cond_broadcast(). And that's the clarification that I requested, although maybe my wording wasn't precise enough.

So, now that the comment has confirmed that the guarantee is stated in the standard, the remaining question is - are implementations required to provide this guarantee, or are they allowed to break it because it won't affect 99.9% of the users?


EDIT 1: Added "real life" to my interpretation of the second part of David's comment.

EDIT 2: Fixed a typo in 1) - replaced "implementations" with "applications"

EDIT 3: Added a clarification to 2) describing the kind of predicate I believe should be supported by the condition variable implementation.

EDIT 4: Removed the mention of Win32 events from 3), because they are only similar to what I was implementing in that they have no spurious wakeups.

EDIT 5: Changed slightly the wording at the end of 3) to make it clearer

EDIT 6: Added 4)

(0001403)
geoffclare (manager)
2012-10-17 16:39
edited on: 2012-10-17 16:42

In the Oct 17th conference call we agreed to make the following changes. Further work is needed to produce some interpretation text to precede these changes.

At line 50746 change:

    The pthread_cond_broadcast() function shall unblock all threads
    currently blocked on the specified condition variable cond.

    The pthread_cond_signal() function shall unblock at least one of
    the threads that are blocked on the specified condition variable
    cond (if any threads are blocked on cond).

to:
    
    The pthread_cond_broadcast() and pthread_cond_signal() functions
    shall atomically determine which threads, if any, are blocked
    on the specified condition variable cond. This determination
    shall occur at an unspecified time during the
    pthread_cond_broadcast() or pthread_cond_signal() call.
    The pthread_cond_broadcast() function shall then unblock all of
    these threads. The pthread_cond_signal() function shall unblock at
    least one of these threads.

At line 50762 change:

    The pthread_cond_broadcast() and pthread_cond_signal() functions
    shall have no effect if there are no threads currently blocked on
    cond.

to:

    The pthread_cond_broadcast() and pthread_cond_signal() functions
    shall have no effect if they determine that there are no threads
    blocked on cond.

Add a new paragraph to the RATIONALE section after line 50792:

   There is no guarantee of overall atomicity for the
   pthread_cond_broadcast() and pthread_cond_signal() functions, and
   there is a conceptual point during the execution of these functions
   where the set of blocked threads is (atomically) evaluated. Threads
   could enter this set at any point before this, hence the standard
   speaks of "an unspecified time during the call".

(0001404)
Konrad_Schwarz (reporter)
2012-10-18 07:45

I have very serious reservations on the proposed changes
of Note 1403, which eliminate atomicity requirements from
condition variable operations.

For the condition variable/mutex paradigm to be useful
to applications, that is, to make reasoning about
concurrency tractable, the condition variable
signal/broadcast/wait and mutex enter/exit
operations must be atomic with regards to one another.

Anything else is a recipe for disaster.

See below for individual points.

> At line 50746 change:
>
> The pthread_cond_broadcast() function shall unblock all threads
> currently blocked on the specified condition variable cond.
>
> The pthread_cond_signal() function shall unblock at least one of
> the threads that are blocked on the specified condition variable
> cond (if any threads are blocked on cond).
>
> to:
>
> The pthread_cond_broadcast() and pthread_cond_signal() functions
> shall atomically determine which threads, if any, are blocked
> on the specified condition variable cond. This determination
> shall occur at an unspecified time during the
> pthread_cond_broadcast() or pthread_cond_signal() call.
> The pthread_cond_broadcast() shall then unblock all of these
> threads. The pthread_cond_signal() function shall unblock at
> least one of these threads.

I'm sorry, but this suggestion is a serious mistake.

It seems designed to accommodate the
current -- broken -- Glibc implementation: that is, Glibc
wakes up threads that will block on the condition variable
in the future in lieu of threads that are already blocked,
potentially leaving those threads blocked forever.

Why "determine"? signal() must wake up one thread,
broadcast() must wake up all threads blocked on the condition
variable, no doubt about it.

Why write "at an unspecified time within the [..] call"?
Every action specified for any POSIX function happens at
an unspecified time within a call of the function --
how does pointing this out help here?

On the contrary, condition variable signal, broadcast,
wait and mutex enter and exit must all be atomic with
regards to one another.

Any loosening of this contract inordinately complicates
reasoning about concurrent processes, since in effect
no guarantees apply ("determine the threads blocked"?).

E.g., under the proposed change, n calls to signal() will no
longer necessarily unblock n waiting threads -- an invocation
of signal() could incorrectly determine that no threads
are blocked, when in fact there are some.
(A contract that is easily derived from the Larch specification,
by the way -- not that I advocate usage of formal
specifications in POSIX.)

So, if a change is indeed required, change from line 50746 to:

     The pthread_cond_broadcast() function shall
     _atomically_ unblock all threads
     currently blocked on the specified condition variable cond.
 
     The pthread_cond_signal() function shall _atomically_ unblock
     at least one of the threads that are blocked on the specified
     condition variable cond (if any threads are blocked on cond).

> At line 50762 change:
>
> The pthread_cond_broadcast() and pthread_cond_signal() functions
> shall have no effect if there are no threads currently blocked on
> cond.
>
> to:
>
> The pthread_cond_broadcast() and pthread_cond_signal() functions
> shall have no effect if they determine that there are no threads
> blocked on cond.

An implementation where signal() is a null and void operation would
seem to conform to the suggested changed text: this implementation
never "determines" that there are any threads blocked on any
condition variable.

Please discard this change; it is obviously broken.

> Add a new paragraph to the RATIONALE section after line 50792:
>
> There is no guarantee of overall atomicity for the
> pthread_cond_broadcast() and pthread_cond_signal() functions, and
> there is a conceptual point during the execution of these functions
> where the set of blocked threads is (atomically) evaluated. Threads
> could enter this set at any point before this, hence the standard
> speaks of "an unspecified time during the call".

This is exactly the wrong direction for POSIX to take.

"Overall" atomicity makes these operations (fairly) easy to understand.
This makes reasoning about concurrency and synchronization somewhat
more tractable -- parallel programming is notably difficult.

What advantages are gained from this change? It is very detrimental
to applications; advantages for implementations are yet to be named.

It is also a radical departure from historical practice:
from Hoare monitors, via Lampson's Mesa, through Modula-3
(evidenced by its formal Larch specification), these operations
have always been atomic, and with very good reasons.

Thanks,

Konrad Schwarz
(0001405)
dalias (reporter)
2012-10-18 12:32

I think you're misreading the proposed changes to the text. They do not allow the broken behavior but clarify that it is disallowed. The determination of which threads are blocked is required to be made atomically at some point during the call to pthread_cond_signal or pthread_cond_broadcast. Thus, if the mutex is held during the call, the set of threads blocked is completely determined, and if the mutex was obtained and released before the signal/broadcast operation, then the set contains at least those threads which blocked before the mutex was obtained, but may also include others that arrive at some point (prior to the atomic determination) during the signal/broadcast call.

This is exactly what the original specified behavior was; some folks just did not find the original text clear enough and believed it allowed the current, undesirable behavior glibc is exhibiting.
(0001408)
torvald (reporter)
2012-10-24 17:34

Thanks for working on those changes to the specification.

The changes make signal/broadcast essentially linearizable, which I agree is probably the best approach in the absence of a complete memory model. However, this effectively requires a total order over all those operations on all condition variables in the program, which can require stronger low-level synchronization in implementations than if each cond var had its own total order of these operations that just needs to be consistent with happens-before. In particular, considering C11/C++11, the former requires some sequentially-consistent operation, whereas the latter just needs acquire/release memory orders. It would be nice to know if you see the latter as sufficient too.

I've raised an issue similar to this clarification request with C++11, because C++11 didn't require consistency with happens-before: http://cplusplus.github.com/LWG/lwg-active.html#2190
C++11's proposed resolution is to add this consistency requirement. I haven't had time to review the minutes of the recent C++ meeting yet, so I can't say whether they went for a global total order or just individual total orders for all the operations on each condition variable.

Finally, I think that the planned addition to the RATIONALE probably confuses more than it helps:
> There is no guarantee of overall atomicity for the
> pthread_cond_broadcast() and pthread_cond_signal() functions, and
> there is a conceptual point during the execution of these functions
> where the set of blocked threads is (atomically) evaluated. Threads
> could enter this set at any point before this, hence the standard
> speaks of "an unspecified time during the call".

What is "overall atomicity" meant to refer to? The rest of the paragraph makes sense though. Perhaps it's better to explain it along the same lines as C++11 does?
(That is, split any waiting into three parts, then make those and signal/broadcast atomic with respect to each other, and require them to be totally ordered in a way consistent with happens-before. Paragraph 30.5, sentences 4-5 in the spec, plus the changes of issue 2190 referenced above.)

The core subject of this specification issue was not atomicity (i.e., that something executes as one indivisible step), but ordering (when this indivisible step executes with respect to (1) other things going on in the same thread or (2) communication with other threads or operations in those other threads). Thus, perhaps it's better to state that the ordering requirements got stronger, but the atomicity requirements have stayed all the same. That might help prevent misunderstandings.
(0001409)
nialldouglas (reporter)
2012-10-25 23:31

It seems to me that, portably speaking, pthread conditions can merely make a "best effort" at waking threads. When you signal "wake one thread" or "wake all threads", there is no way for any portable implementation to know how many threads are currently sleeping. Therefore "wake one thread" could mean wake none, wake one, or conceivably wake a small number. Similarly, "wake all threads" really means "wake as many as you can, more or less".

If you want stronger guarantees, then it's your responsibility to wrap the condition variable calls with appropriate atomic CAS logic and to loop the pthread calls appropriately. An example of this for C11 lives at https://github.com/ned14/ISO_POSIX_standards_stuff/tree/master/pthreads%20Notifer%20Object. I would suggest it might be useful for C++11 too.

I would strongly suggest something like a pthreads notifier object be brought into being to avoid naive usage of pthread condition variables. There are too many corner cases in those for most programmers. If your glibc maintainer would like, I'm happy to contribute my notifier implementation for use in glibc.

Niall
(0001410)
dalias (reporter)
2012-10-26 00:02

It is very easy for an application, based on the actions it has performed, to know exactly which threads are included in the set of threads T such that either T is currently blocked on the condition variable, or T has spuriously woken and is waiting to acquire the mutex and return from pthread_cond_wait. Therefore, it is completely reasonable to assume that calling pthread_cond_signal on the condition variable, and subsequently unlocking the mutex, will allow at least one thread in that set to make forward progress.
(0001411)
nialldouglas (reporter)
2012-10-28 00:37
edited on: 2012-10-28 00:38

"It is very easy for an application, based on the actions it has performed, to know exactly which threads are included in the set of threads T such that either T is currently blocked on the condition variable, or T has spuriously woken and is waiting to acquire the mutex and return from pthread_cond_wait. Therefore, it is completely reasonable to assume that that calling pthread_cond_signal on the condition variable, and subsequently unlocking the mutex, will allow at least one thread in that set to make forward progress."

In traditional SMP architectures with a system wide atomic CAS (preferably system wide atomic increment) you are correct that it is *possible* to ensure, relatively cost free, a 1-1 relationship between pthread_cond_signal and a thread wake via its associated mutex. I might add that implementation overheads are a bit heavier than I'd prefer here, but yes it's doable.

In NUMA, you are not correct, because you're relying on messages whose results you don't wait for. You can't know if a wake has any effect. You can't know the state or wait list of a mutex either.

That's why the current POSIX approach is the right one - leave it to the API user to take care of spurious and lost wakeups if that is necessary to the use case. When NUMA systems become much more common, we can then standardise whatever practice has emerged by then as least worst.

If you'd like different behaviour, propose a new API set rather than retrofit POSIX condition variables.

Niall

(0001412)
dalias (reporter)
2012-10-28 01:11

In response to note #0001411, the quoted text from note #0001410 is not specific to any particular architecture. The claim I made, which I stand by, is that making this determination is easy in the abstract machine specified by POSIX, and therefore is easy on any implementation of POSIX, regardless of the hardware architecture. SMP vs NUMA is irrelevant.

Further, "the current POSIX approach" does not allow lost wakeups. It allows spurious wakeups, which do not matter, as they have no bearing on the set of threads described.

For the sake of clarity, here's how an application would go about tracking and making use of knowledge about which threads are waiting on the condition variable or just-woken (possibly spuriously) and waiting for the mutex:

Each waiter performs the following steps:
1. Acquire mutex.
2. Add self to list of waiters protected by the mutex.
3. Call pthread_cond_wait
4. Remove self from list of waiters.
5. Release mutex.

Now, once the signaling thread has acquired the mutex, it can inspect the list of waiters and know exactly which threads are waiting; call this set S. If it calls pthread_cond_signal and pthread_mutex_unlock and then blocks waiting for an event that will only be triggered by forward progress in one of the threads in set S, the standard, both as currently written and with the proposed clarifications, requires that the event occur and that forward progress be made. I don't see any way you can argue against this.
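
For concreteness, a sketch of those steps (list_add, list_remove and self are hypothetical helpers; 'waiters' is any container protected by the mutex):

    /* Waiter side, steps 1-5 above: */
    pthread_mutex_lock(&m);            /* 1 */
    list_add(&waiters, self());        /* 2: membership is visible to  */
                                       /*    the signaler while it     */
                                       /*    holds m                   */
    pthread_cond_wait(&c, &m);         /* 3 */
    list_remove(&waiters, self());     /* 4 */
    pthread_mutex_unlock(&m);          /* 5 */

    /* Signaler side: while m is held, 'waiters' is exactly the set S
       of threads blocked on c, or spuriously woken and contending for
       m. The standard requires that at least one member of S make
       forward progress after the signal and unlock below. */
    pthread_mutex_lock(&m);
    /* ... inspect 'waiters' (set S) here ... */
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);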
(0001413)
nialldouglas (reporter)
2012-10-28 18:33

Regarding note #0001412, simply repeating yourself without new information isn't going to convince me. Nor will I simply repeat myself. There is always a gulf between design and practice, and while the POSIX spec's design might suggest no lost wakeups, in practice that's hard to achieve perfectly and efficiently on distributed architectures. It also introduces an amount of overhead for SMP architectures which I am not at all convinced is necessary or desirable.

How about this: get someone with deep domain experience in this field like Hans Boehm to yay or nay this? Whatever he supports I'll accept.

Niall

- Issue History
Date Modified Username Field Change
2012-09-20 14:18 mmihaylov New Issue
2012-09-20 14:18 mmihaylov Status New => Under Review
2012-09-20 14:18 mmihaylov Assigned To => ajosey
2012-09-20 14:18 mmihaylov Name => Mihail Mihaylov
2012-09-20 14:18 mmihaylov Section => pthread_cond_broadcast, pthread_cond_signal
2012-09-20 14:18 mmihaylov Page Number => 1043
2012-09-20 14:18 mmihaylov Line Number => 33043 - 33046
2012-09-21 14:24 torvald Issue Monitored: torvald
2012-09-21 15:37 torvald Note Added: 0001371
2012-09-21 17:46 dalias Note Added: 0001372
2012-09-21 18:29 mmihaylov Note Added: 0001373
2012-09-21 22:31 dalias Note Added: 0001374
2012-09-25 16:36 siddhesh Note Added: 0001378
2012-09-25 17:05 torvald Note Added: 0001379
2012-10-10 16:17 msbrown Note Added: 0001396
2012-10-10 19:43 mmihaylov Note Added: 0001398
2012-10-11 06:14 mmihaylov Note Edited: 0001398
2012-10-17 16:39 geoffclare Note Added: 0001403
2012-10-17 16:42 geoffclare Note Edited: 0001403
2012-10-18 07:45 Konrad_Schwarz Note Added: 0001404
2012-10-18 12:32 dalias Note Added: 0001405
2012-10-24 17:34 torvald Note Added: 0001408
2012-10-25 23:31 nialldouglas Note Added: 0001409
2012-10-26 00:02 dalias Note Added: 0001410
2012-10-28 00:37 nialldouglas Note Added: 0001411
2012-10-28 00:38 nialldouglas Note Edited: 0001411
2012-10-28 01:11 dalias Note Added: 0001412
2012-10-28 18:33 nialldouglas Note Added: 0001413

