Feature #19717
open`ConditionVariable#signal` is not fair when the wakeup is consistently spurious.
Description
For background, see this issue https://github.com/socketry/async/issues/99.
It looks like ConditionVariable#signal
is not fair, if the calling thread immediately reacquires the resource.
I've given a detailed reproduction here as it's non-trivial: https://github.com/ioquatix/ruby-condition-variable-timeout.
Because the spurious wakeup occurs, the thread is pushed to the back of the waitq, which means any other waiting thread will acquire the resource, and that thread will perpetually be at the back of the queue.
I believe the solution is to change ConditionVarialbe#signal
should only remove the thread from the waitq if it's possible to acquire the lock. Otherwise it should be left in place, so that the order is retained, this should result in fair scheduling.
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago
I spent a bit of time looking into this issue. I think I understand why it's happening now, but I don't quite know what, if anything, we should do about it.
Firstly, I fixed up the reproduction issue in a couple of ways - after these changes, both Ruby implementations, as well as the pure-C++ reproduction, show the same issue. Fixes: https://github.com/KJTsanaktsidis/ruby-condition-variable-timeout
Then, I also patched the Ruby interpreter to add a few more USDT tracepoints to diagnose the issue: https://github.com/ruby/ruby/compare/master...KJTsanaktsidis:ruby:ktsanaktsidis/add_thread_probes
So, what's happening, in short, is a combination of a few things:
- If a single thread has a mutex, signals a condition variable, unlocks the mutex, and then locks it again, it's very likely that the thread will be successful in acquiring the mutex again the second time immediately. This is because whilst the condvar signal makes another thread eligible to run, the original thread still holds the GVL and will do so until it sleeps or does blocking I/O of some kind (or finishes its timeslice, but that's not relevant here).
- Ruby condition variables are fair (as a byproduct of Ruby mutexes being fair). We maintain a list of threads that are waiting to be signaled, and wake them up from oldest to newest.
- Ruby of course doesn't understand whether or not a "resource" is available when using a condition variable. It will wake up a thread when the condvar is signaled, but it's up the user code to decide if it can actually make forward progress at that point (i.e. there are spurious wakeups, from an application perspective, although not spurious wakeups from Ruby's point of view - something really did signal the condvar).
- Because of the combination of these factors, when you have one thread performing the actions from point 1, it's going to be doing a spurious wakeup on the condvar; whatever thread which wakes up won't be able to make progress because the first thread acquired the resource again.
- With the right combination of threads, concecutive release-acquire operations on a resource, and predictable delays, this can lead to starvation. In the reproduction example above, you have three threads and two consecutive release-acquire operations. That means that half of the condvar wakeups will be spurious; the wakeups which are immediately followed by the same thread re-acquiring the resource don't help maintain forward progress. However, it does put the spuriously-woken-up thread to the back of the waitq. The resource ends up ping-ponging between two of the threads whilst the third is eternally starved out.
6.I wrote up a detailed log of which thread does what in what order here: https://gist.github.com/KJTsanaktsidis/ddfbccc7b48d90277ec81ebf16591cb8
This is not a Ruby problem per-se; the C++ example in @ioquatix's repo exhibits the same issue (after I fixed it). I believe glibc's pthread condition variables have fair queues like in Ruby (although notably glibc's mutexes are not fair). Ruby perhaps makes the issue more likely because the GVL adds more determinsim though.
The best idea I have right now to "fix" this is to have ConditionVariable#wait
take an optional kwarg that lets the caller report whether their last wakeup was spurious or not; if set, we could prepend the caller to the waitq rather than appending them. It would be much better if we could solve this problem without adding any API surface though IMO.
I feel like this must be a problem with a lot of literature around it but perhaps I don't know the right words to look for.
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago
This is mostly just semantics, but I read some sources tonight - glibc doesn't actually do anything to implement "fair" condition variables. But, in the Linux kernel, the set of waiters waiting on a particular futex address is stored in a linked list. So a thread might be able to come along and "steal" a mutex (and hence the resource) if it's unlocked ahead of another thread that's been patiently waiting; this is the sense in which it's "unfair". But, if everybody's blocked waiting (i.e. doing FUTEX_WAIT
), and then the condvar is signaled (using FUTEX_WAKE
), which is almost certainly the case in all of these tests, then the thread which was waiting the longest will actually be the one woken, I think, just because of the linked list being traversed in order to do the wakeups.
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago
have ConditionVariable#wait take an optional kwarg that lets the caller report whether their last wakeup was spurious or no
A better API might be to have the ConditionVariable#wait
accept a block, and have Ruby do the looping & checking of the condition. Something like
@condvar.wait(mutex) do
@resource.available?
end
That's probably a nicer Ruby API anyway, and means it could transparently implement something like "append to the waitq on the first wait, and prepend thereafter".
Updated by jeremyevans0 (Jeremy Evans) over 1 year ago
- Tracker changed from Bug to Feature
- Backport deleted (
3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN)
Based on the thorough analysis by @kjtsanaktsidis (KJ Tsanaktsidis), this doesn't appear to be a bug in Ruby, but rather common behavior of condition variable across implementations. The idea of ConditionVariable#wait
taking a block to address the spurious wakeup issue seems interesting to me, but that would be a feature request.
Updated by ioquatix (Samuel Williams) over 1 year ago
the thread which was waiting the longest will actually be the one woken
To me, this is the correct behaviour and what I try to implement in Async. If there is a spurious wakeup, you don't loose your position in the queue.
I like the proposed implementation. Is there any chance it still has the similar problems? If we always reque at the front, could multiple waiters reque in front of each other? i.e. it still depends on the order.
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago
I like the proposed implementation. Is there any chance it still has the similar problems? If we always reque at the front, could multiple waiters reque in front of each other? i.e. it still depends on the order.
I think it can't have similar problems as long as it's woken up by #signal
- but if it's woken up by #broadcast
you probably don't want this prepending behaviour.
I think it's possible, with a bit of shuffling around in thread_sync.c
, for the implementation to not actually remove the thread off the ConditionVariable's waitq until it's actually successfully acquired the resource; that way, if e.g. #signal
is called twice, the two threads that are woken up will maintain their relative order in the waitq.
I also noticed last night that MonitorMixin::ConditionVariable
already has #wait_until
and #wait_for
methods that take a block. We could adjust the behaviour of these without any new API at all, which might be preferable? They're implemented by calling into ::ConditionVariable#wait
via rb_funcall
, but we could hack it to call some internal-only APIs to achieve different waitq behaviour if we wanted.
(Sidebar - when does one use Monitor
in ruby instead of Mutex
? Is it just that Monitor
is re-entrant and Mutex
is not?)
Finally - I did some reading, and another magic phrase I found about this problem is "handoff semantics". We could make it so that when calling ConditionVariable#signal
whilst holding the associated mutex, the signaled thread is guaranteed to acquire the mutex next after the current thread releases it; essentially, the mutex is "handed off". Because of the GVL, we would also need to "hand off" that off too and give it to the signaled thread. Essentially, when the signalling thread unlocks the mutex, it would need to stop and let the signaled thread run instead.
The textbook downside of this approach is that it leads to increased context switching; in a real multi-CPU language, there might be another thread already running which is about to try and get the mutex, and it's best for overall throughput if it's allowed to do so rather than stopping it and switching threads. In Ruby that applies too (if the signalling thread releases & immediately re-acquires the mutex, we can avoid a context switch). I also think there is a worse problem with this approach in Ruby - the thread that did the signaling has to sleep and do nothing after unlocking the mutex, whereas had it been allowed to run for a while, it might have started a blocking IO operation or called into a C extension or some such and yielded the GVL whilst also doing something productive. So, tl;dr, I don't think we should do this "handoff" approach.
Updated by ioquatix (Samuel Williams) over 1 year ago
I think it's possible, with a bit of shuffling around in thread_sync.c, for the implementation to not actually remove the thread off the ConditionVariable's waitq until it's actually successfully acquired the resource; that way, if e.g. #signal is called twice, the two threads that are woken up will maintain their relative order in the waitq.
Yes, I think that would be a good idea if we can do it.
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago
So, tl;dr, I don't think we should do this "handoff" approach.
After reading this I realised we don't actually have to hand off the GVL when the mutex is handed off. The thread that signalled the condition variable and unlocked the mutex could keep running, and the mutex could then be "reserved" for the thread that was woken up. If the unlocking thread then attempted to lock the mutex again, only then would it be put to sleep and a switch to the thread holding the "reservation" would happen. Yes, it's more context switches (only if the unlocking thread tries to lock again), but that probably doesn't matter so much on the scale of Ruby performance. The unlocking thread can continue to run after unlocking, though.
(Sorry for my real-time musing on the bugtracker)
Updated by ioquatix (Samuel Williams) over 1 year ago
I like this idea.
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago
I finished spiking out a bit of a POC of how this kind of handoff could work - https://github.com/ruby/ruby/pull/7977
I'm sure it is chock-a-block full of other fun bugs, but it seems to make your reproduction script work; it runs to completion in 10 seconds now instead of crashing after ~500ms.
WDYT? Is this an approach that's worth pursuing?
Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 1 year ago
@ioquatix (Samuel Williams) any thboughts? :)
Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 1 year ago
@ioquatix (Samuel Williams) do we want to do anything about this?
This really did stem from a real bug in somebody's code it seems (https://github.com/socketry/async/issues/99) so I guess it would be good to do something. The "something"s we cooked up so far seem to be...
- Add a ConditionVariable#wait_for method which takes a block and returns whether or not the wakeup was actually not spurious (from the application's perspective - i.e. can the thread make forward progress now, or does it need to wait on the condvar again?). Pros: admits a fairly simple implementation, and the API kind of feels more ruby-like anyway; Cons: requires code to actually use the new API
- Make Mutex#unlock cause the calling thread to give up the GVL and schedule the thread waiting for the mutex instead. Pros: sort of simple, solves the problem for users of the current API Cons: probably not great for performance (the calling thread might have been about to do some IO anyway, so if we'd just given it a bit more time it would be sleeping doing something productive (waiting for IO) instead of just being blocked on the GVL).
- Make Mutex be "reserved" for a thread if it's blocked on Mutex#lock, so that if the thread which currently has the mutex calls #unlock and then #lock again, it will find the mutex is reserved for someone else and transfer control. Pros: solves the problem for users of the current API; Cons: Pretty complex (https://github.com/ruby/ruby/pull/8331)
What is the next step for this? Should I put this on the agenda for a dev meeting to discuss? Something else?