Feature #19842
Updated by Eregon (Benoit Daloze) about 1 year ago
This ticket proposes to introduce M:N threads to improve Threads/Ractors performance.
## Background
Ruby threads (RT for short) have been available since early Ruby versions and have the following features:
* Can be created with the simple notation `Thread.new{}`
* Can be switched to another ready Ruby thread by:
  * Time-slice.
  * I/O blocking.
  * Synchronization such as Mutex features.
  * And other blocking reasons.
* Can be interrupted by:
  * OS-delivered signals (only for the main thread).
  * `Thread#kill`.
  * `Thread#raise`.
* Can be terminated by:
  * the end of each Ruby thread.
  * the end of the main thread (and other Ruby threads are killed).
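The features listed above can be exercised with a few lines of plain Ruby. This is only a minimal sketch; the variable names (`sleeper`, `victim`, `events`) are made up for illustration.

```ruby
events = Queue.new

# A Ruby thread that blocks on a managed operation (sleep) until interrupted.
sleeper = Thread.new do
  begin
    sleep
  rescue RuntimeError => e
    events << "interrupted: #{e.message}"
  end
  :finished
end

# Wait until the thread is really sleeping, then interrupt it with Thread#raise.
Thread.pass until sleeper.status == "sleep"
sleeper.raise("wake up")
sleeper_result = sleeper.value   # joins the thread and returns the block's value

# Thread#kill terminates a thread without running the rest of its block.
victim = Thread.new { sleep }
Thread.pass until victim.status == "sleep"
victim.kill
victim.join

puts events.pop        # prints "interrupted: wake up"
puts sleeper_result    # prints "finished"
puts victim.alive?     # prints "false"
```

Any thread that is still running when the main thread finishes is killed, which is why the sketch joins both threads before the script ends.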
Ruby 1.8 and earlier versions use M:1 threads (green threads, user-level threads, ...; the term 1:N threads is more popular, but to keep this explanation consistent I use the term "M:1" here), which manage multiple Ruby threads on 1 native thread.
(Native threads are provided by C interfaces such as Pthreads. In many cases, native threads are OS threads, but in theory there are also user-level implementations, such as user-level pthread libraries. Therefore, they are referred to as native threads in this article, NT for short.)
If a Ruby thread RT1 blocks because of an I/O operation, the Ruby interpreter switches to the next ready Ruby thread RT2. The I/O operation is monitored by `select()` (or similar functionality), and when the I/O is ready, RT1 is marked as a ready thread and will be resumed soon. However, when a Ruby thread issues some other blocking operation such as `gethostbyname()`, the Ruby interpreter can not switch to any other Ruby thread until `gethostbyname()` finishes.
We name two types of blocking operations:
* Managed blocking operations
  * I/O (most of read/write)
    * managed by an I/O multiplexing API (select, poll, epoll, kqueue, IOCP, io_uring, ...)
  * Sleeping
  * Synchronization (Mutex, Queue, ...)
* Unmanaged operations
  * All other blocking operations not listed above, written in C
    * Huge number calculation like `Bignum#*`
    * DNS lookup
    * I/O (the multiplexing API can not detect whether it will block)
      * open on FIFO, close on NFS, ...
    * flock and other locking mechanisms
    * library calls which use blocking operations
      * `libfoo` has `foo_func()` and `foo_func()` waits for a DNS lookup. A Ruby extension `foo-ruby` can call `foo_func()`.
With these terms, we can say that M:1 threads can support managed blocking operations but can not support unmanaged operations (other Ruby threads can not make progress) without further tricks.
Note that even when a `select()`-like system call reports that a `fd` is ready, the I/O operation on that `fd` can still block because of contention (the data was read by another thread or process, for example).
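The contention case can be reproduced with a pipe. In this small sketch two consumers are both told that the descriptor is readable, but only the first read finds data. Using `read_nonblock` is one common way to avoid blocking here; it is shown as an illustration and is not a claim about what MRI does internally.

```ruby
r, w = IO.pipe
w.write("x")   # exactly one byte is available

# A zero-timeout select reports that r is readable.
ready, = IO.select([r], nil, nil, 0)

# The first consumer takes the only byte.
first = r.read_nonblock(1, exception: false)

# A second consumer acting on the same readiness report finds nothing left.
# A plain blocking read would hang here; the nonblocking read returns a marker.
second = r.read_nonblock(1, exception: false)

p ready == [r]   # prints true
p first          # prints "x"
p second         # prints :wait_readable
```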
M:1 threads have another disadvantage: they can not run in parallel because only one native thread is used.
From Ruby 1.9 we implemented 1:1 threads, which means that each Ruby thread has a corresponding native thread. To make the implementation easy we also introduced a GVL (Global VM Lock). Only the Ruby thread that holds the GVL can run. With the 1:1 model, we can support both managed and unmanaged blocking operations by releasing the GVL. When a Ruby thread wants to issue a blocking operation, it releases the GVL and another ready Ruby thread continues to run. We don't care whether the blocking operation is managed or unmanaged.
(We can not make some unmanaged blocking operations interruptible, for example stoppable by Ctrl-C.)
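The effect of releasing the GVL around a blocking operation can be observed from Ruby code. The sketch below uses `sleep` as a stand-in for any blocking operation; the `elapsed` helper is a made-up name.

```ruby
# Measure the wall-clock time taken by a block.
def elapsed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Four blocking operations issued one after another.
serial = elapsed { 4.times { sleep 0.1 } }

# The same four operations issued from four Ruby threads. Each thread releases
# the GVL while it blocks, so the waits overlap instead of adding up.
threaded = elapsed do
  4.times.map { Thread.new { sleep 0.1 } }.each(&:join)
end

puts format("serial: %.2fs, threaded: %.2fs", serial, threaded)
```

On an otherwise idle machine the serial run should take roughly four times as long as the threaded one.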
Advantages of 1:1 threads over M:1 threads are:
* Easy to handle blocking operations by releasing GVL.
* We can utilize parallelism with multiple native threads by releasing GVL.
Disadvantages of 1:1 threads compared to M:1 threads are:
* Overhead to make many native threads for many Ruby threads
  * We can not make a huge number of Ruby threads and Ractors on 1:1 threads.
* Thread switching overhead by GVL because inter-core communication is needed.
From Ruby 3.0 we introduced the fiber scheduler mechanism to maintain multiple fibers.
Differences from Ruby 1.8 M:1 threads are:
* No timeslice (fibers are switched only by managed blocking operations)
* Ruby users can make their own schedulers for apps with their favorite underlying mechanism
Disadvantages are similar to M:1 threads. Another disadvantage is that we need to consider Fiber's behavior.
From Ruby 3.0 we also introduced Ractors. Ractors can run in parallel because most objects are separated between them. 1 Ractor creates 1 Ruby thread, so Ractors have the same disadvantages as 1:1 threads. For example, we can not make a huge number of Ractors.
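For reference, a minimal Ractor example follows. The Ractor API is experimental and has changed between releases, so the `ractor_result` helper (a made-up name) checks which result method is available. Treat this as a sketch rather than a portable recipe.

```ruby
# Ractor is experimental; silence the warning it prints on creation.
Warning[:experimental] = false

# Hypothetical helper: newer Rubies expose Ractor#value, older ones Ractor#take.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

# Each Ractor gets its argument by copy and shares no mutable state with the
# others, so the blocks can run in parallel.
ractors = 4.times.map do |i|
  Ractor.new(i) { |n| (1..1000).sum * n }
end

results = ractors.map { |r| ractor_result(r) }
p results   # prints [0, 500500, 1001000, 1501500]
```

Under the 1:1 model each of these Ractors creates its own Ruby thread and therefore its own native thread.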
## Goal
Our goal is to make lightweight Ractors on lightweight Ruby threads. To achieve this goal we propose to implement M:N threads on MRI.
M:N threads manage M Ruby threads on N native threads, with limited N (~= the number of CPU cores, for example).
Advantages of M:N threads are:
1. We can run M ractors on N native threads simultaneously if the machine has N cores.
2. We can make a huge number of Ruby threads or Ractors because we don't need a huge number of native threads.
3. We can support unmanaged blocking operations by locking a native thread to a Ruby thread which issues an unmanaged blocking operation.
4. We can make our own Ruby thread or Ractor scheduler instead of relying on the native thread (OS) scheduler.
Disadvantages of M:N threads are:
1. The implementation is complicated and can be hard.
2. It can introduce incompatibility, especially on TLS (thread-local storage).
3. We need to maintain our own scheduler.
Without using multiple Ractors, it is similar to Ruby 1.8 M:1 threads. The difference from M:1 threads is the NT locking mechanism that supports unmanaged blocking operations. Another advantage is that it is easy to fall back to 1:1 threads by locking all of the corresponding native threads to Ruby threads.
## Proposed design
### User facing changes
If a program only has a main Ractor (i.e., most Ruby programs), the user will not face any changes by default.
On the main Ractor, all threads are 1:1 threads by default and there is no compatibility issue.
If the `RUBY_MN_THREADS=1` environment variable is given, the main Ractor enables M:N threads.
Note that the main thread locks NT by default because the initial NT is special in some cases. I'm not sure we can relax this limitation.
With multiple Ractors, N (+ alpha) native threads run M ractors. For now there is no way to disable M:N threads with multiple Ractors, because there are only a few multi-Ractor programs and so no compatibility issues.
The maximum N can be specified by `RUBY_MAX_PROC=N`. It is 8 by default, but this value should be set to the number of CPU processors (cores).
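As a usage sketch, the two environment variables named in this proposal can be passed to a child interpreter as shown below. The variable names are taken from this ticket as written; an interpreter that does not implement the proposal simply ignores them, so the script produces the same output either way.

```ruby
require "etc"
require "rbconfig"

# Settings as named in this proposal: enable M:N threads on the main Ractor
# and set the native thread limit to the number of CPU cores.
env = {
  "RUBY_MN_THREADS" => "1",
  "RUBY_MAX_PROC"   => Etc.nprocessors.to_s,
}

# A child script that creates 100 Ruby threads and sums their results.
script = "puts 100.times.map { |i| Thread.new { i * 2 } }.sum(&:value)"

output = IO.popen(env, [RbConfig.ruby, "-e", script], &:read)
puts output   # prints 9900
```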
### TLS issue
On M:N threads a Ruby thread (RT1) can migrate from a native thread (NT1) to NT2, ..., so TLS in native code can be a problem.
For example, RT1 calls a library function `foo()` and it sets TLS1 on NT1. After migrating RT1 to NT2, RT1 calls `foo()` again, but there is no TLS1 record because TLS1 is recorded only on NT1.
In this case, RT1 should run on NT1 while using the native library foo. To avoid such a problem, we need the following features:
* 1:1 threads on main Ractor by default
* functionality to lock the NT for an RT; maybe a `Thread#lock_native_thread` and `Thread#unlock_native_thread` API is needed. For example, the Go language has `runtime.LockOSThread()` and `runtime.UnlockOSThread()` for this purpose.
  * Or a C-API only for this purpose? (not fixed yet)
The same problem can occur with the Fiber scheduler (and of course Ruby 1.8 M:1 threads), but thankfully I have not heard of it being much of a problem, so I expect that TLS will not be much of an issue.
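The migration problem can be pictured with a toy model in which each native thread owns its own TLS table. Everything here (`NativeThread`, `foo`, `:foo_state`) is hypothetical and exists only to illustrate the scenario; it does not touch real native TLS.

```ruby
# Toy model: a "native thread" is just a name plus its private TLS table.
NativeThread = Struct.new(:name, :tls)

# Hypothetical library function that caches its state in the TLS of whichever
# native thread happens to execute it.
def foo(native_thread)
  native_thread.tls[:foo_state] ||= "initialized on #{native_thread.name}"
end

nt1 = NativeThread.new("NT1", {})
nt2 = NativeThread.new("NT2", {})

first  = foo(nt1)   # RT1 runs on NT1 and records TLS1 there
second = foo(nt2)   # RT1 migrated to NT2: the record made on NT1 is not visible
third  = foo(nt1)   # RT1 locked to NT1: the original record is found again

puts first    # prints "initialized on NT1"
puts second   # prints "initialized on NT2"
puts third    # prints "initialized on NT1"
```

Locking the NT for an RT corresponds to always passing `nt1` in this model.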
### Unmanaged blocking operations
From Ruby 1.9 (1:1 threads), the `nogvl(func)` API is used for most blocking operations to keep the threading system healthy. In other words, `nogvl(func)` indicates that the given function is a blocking operation. To support unmanaged blocking operations, we lock a native thread to the Ruby thread which issues the blocking operation.
If the blocking operation doesn't finish soon, other Ruby threads can not run because the RT locks the NT. In this case, a system monitoring thread named "Timer thread" (a historical name, TT for short) creates another NT to run other ready Ruby threads.
This TT's behavior is the same as the behavior of "sysmon" in the Go language.
We call a locked NT a dedicated native thread (DNT) and other NTs shared native threads (SNT). The upper bound given by `RUBY_MAX_PROC` applies to the number of SNTs. In other words, the number of DNTs is not limited (just as the number of NTs on 1:1 threads is not limited).
### Managed blocking operations
Managed blocking operations are multiplexed by `select()`-like functions on the Timer thread. Now only `epoll()` is supported.
I/O operation flow (read on fd1) on Ruby thread RT1:
1. Check the readiness of fd1 by `poll(timeout = 0)`; if it is ready, go to step 4.
2. Register fd1 to the Timer thread (TT) epoll and resume another ready Ruby thread.
3. When TT detects that fd1 is ready, it makes RT1 a ready thread.
4. When RT1 is resumed, do `read()` while locking the corresponding NT1.
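The four steps above can be modelled in plain Ruby, with fibers standing in for Ruby threads and `IO.select` standing in for the Timer thread's epoll. This is a single-native-thread toy, not the MaNy implementation; `ToyTimerThread` and `managed_read` are made-up names.

```ruby
require "fiber"   # needed for Fiber.current on older Rubies

# Stand-in for the Timer thread: remembers which fiber waits for which IO.
class ToyTimerThread
  def initialize
    @waiting = {}
  end

  def register(io, fiber)
    @waiting[io] = fiber
  end

  def empty?
    @waiting.empty?
  end

  # Step 3: block until a registered IO is ready and hand back its fibers.
  def ready_fibers
    readable, = IO.select(@waiting.keys)
    readable.map { |io| @waiting.delete(io) }
  end
end

def managed_read(io, timer_thread)
  # Step 1: zero-timeout readiness check; skip registration when ready.
  unless IO.select([io], nil, nil, 0)
    # Step 2: register with the timer thread and switch away.
    timer_thread.register(io, Fiber.current)
    Fiber.yield
  end
  # Step 4: resumed (or ready from the start), so perform the read.
  io.readpartial(1024)
end

timer_thread = ToyTimerThread.new
r, w = IO.pipe
log = []

reader = Fiber.new { log << managed_read(r, timer_thread) }
reader.resume                  # fd1 is not ready: the reader registers and yields
log << "other work"            # another ready "Ruby thread" runs meanwhile
w.write("hello")
timer_thread.ready_fibers.each(&:resume)

p log   # prints ["other work", "hello"]
```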
`sleep(n)` operation flow on Ruby thread RT1:
1. Register the timeout of RT1 to the TT epoll.
2. When TT detects the timeout of RT1 (n seconds), TT makes RT1 a ready Ruby thread.
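The sleep flow can be sketched the same way: a toy timer list plays the role of the Timer thread and wakes fibers in deadline order. `ToyTimers` is a made-up name and the model runs on a single native thread, so it only illustrates the bookkeeping.

```ruby
require "fiber"   # needed for Fiber.current on older Rubies

# Stand-in for the Timer thread's timeout handling.
class ToyTimers
  def initialize
    @timers = []   # pairs of [deadline, fiber]
  end

  # Step 1: register the timeout of the calling "Ruby thread".
  def register(seconds, fiber)
    @timers << [now + seconds, fiber]
  end

  # Step 2: when a deadline passes, make its fiber ready (here: resume it).
  def run
    until @timers.empty?
      @timers.sort_by!(&:first)
      deadline, fiber = @timers.shift
      delay = deadline - now
      sleep(delay) if delay > 0
      fiber.resume
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

timers = ToyTimers.new
order = []

# Two "Ruby threads" that sleep for different durations.
[0.2, 0.1].each do |seconds|
  Fiber.new do
    timers.register(seconds, Fiber.current)
    Fiber.yield          # switch away until the timeout fires
    order << seconds
  end.resume
end

timers.run
p order   # prints [0.1, 0.2]
```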
### Internal design
* 2 level scheduling
  * Ruby threads of a Ractor are managed by M:1 threads
  * Ruby threads of different Ractors are managed by M:N threads
* Timer thread has several duties
  1. Monitoring I/O (or other event) readiness
  2. Monitoring timeouts
  3. Producing timeslice signals
  4. Helping OS signal delivery

(On pthread environments) recent Ruby doesn't make a timer thread, but the MaNy implementation always makes the TT. This can be improved.
## Implementation
The code name is the MaNy project; the name comes from "MN threads".
https://github.com/ko1/ruby/tree/many2
The implementation is not mature yet (debugging now).
## Measurements
See RubyKaigi 2023 slides: https://atdot.net/~ko1/activities/2023_rubykaigi2023.pdf
## Discussion
* Enable/disable
  * default behavior
  * how to switch the behavior
* Should we always lock the NT for the main thread?
* Ruby/C API to lock the native threads
## Misc
This description will be improved more later.