Feature #19443
closedCache `Process.pid`
Description
It's not uncommon for database client and similar network libraries to protect themselves from Process.fork by regularly checking Process.pid
Until recently most libc would cache getpid()
so this was a cheap check to make.
However as of glibc version 2.25 the PID cache is removed and calls to getpid()
always invoke the actual system call which significantly degrades the performance of existing applications.
The reason glibc removed the cache is that some libraries were bypassing fork(2)
by issuing system calls themselves, causing stale cache issues.
That isn't a concern for Ruby as bypassing MRI's primitive for forking would render the VM unusable, so we can safely cache the PID.
An example of the issue: https://github.com/rails/rails/issues/47418
Updated by ko1 (Koichi Sasada) over 1 year ago
However as of glibc version 2.25 the PID cache is removed and calls to getpid() always invoke the actual system call which significantly degrades the performance of existing applications.
Could you show some benchmark results with/without your patch?
As I understand getpid()
system call is well tuned so I surprised that there is an impact on the app.
Updated by byroot (Jean Boussier) over 1 year ago
require 'benchmark/ips'
Benchmark.ips do |x|
x.report("Process.pid") { Process.pid }
end
On macOS where getpid()
is still cached:
ruby 3.2.0 (2022-12-25 revision a528908271) [arm64-darwin22]
Warming up --------------------------------------
Process.pid 1.879M i/100ms
Calculating -------------------------------------
Process.pid 18.682M (± 2.1%) i/s - 93.968M in 5.032405s
On the same machine, but using the docker ruby:3.2
image (glibc based)
ruby 3.2.0 (2022-12-25 revision a528908271) [aarch64-linux]
Warming up --------------------------------------
Process.pid 356.920k i/100ms
Calculating -------------------------------------
Process.pid 3.539M (± 1.3%) i/s - 17.846M in 5.042975s
My branch on macOS:
ruby 3.3.0dev (2023-02-16T18:42:31Z cache-process-pid 0cd4797132) [arm64-darwin22]
Warming up --------------------------------------
Process.pid 1.804M i/100ms
Calculating -------------------------------------
Process.pid 18.465M (± 1.3%) i/s - 93.812M in 5.081288s
I'll try to build that branch in a docker container to benchmark it on glibc, but given the implementation I expect the same performance.
Updated by byroot (Jean Boussier) over 1 year ago
Here, from my branch built in a Ubuntu jammy (22.04) based image:
ruby 3.3.0dev (2023-02-16T18:42:31Z cache-process-pid 0cd4797132) [aarch64-linux]
Warming up --------------------------------------
Process.pid 1.848M i/100ms
Calculating -------------------------------------
Process.pid 18.561M (± 1.5%) i/s - 94.245M in 5.078802s
So it's a bit over 5x faster.
Updated by akr (Akira Tanaka) over 1 year ago
I think detecting fork using PID is not a good idea.
PID can conflict because PID is recycled.
We can define Process.fork_level as follows.
% ruby -e '
class << Process
attr_accessor :fork_level
end
Process.fork_level = 0
module ForkLevel
def _fork
pid = super
Process.fork_level += 1 if pid == 0
pid
end
end
class << Process; self end.prepend ForkLevel
puts "parent_fork_level: #{Process.fork_level}"
Process.wait(fork { puts "child_fork_level: #{Process.fork_level}" })
'
parent_fork_level: 0
child_fork_level: 1
fork can be detected by comparing the result of Process.fork_level.
This doesn't use PID (and getpid system call).
So, it has no overhead by getpid and no problem with PID recycling.
Updated by byroot (Jean Boussier) over 1 year ago
PID can conflict because PID is recycled.
I don't think it's a big concern for this use case, even with PID recycling, the PID of the child can't possibly be the same than the parent.
So unless you fork several time without ever triggering the check, you can't possibly be by this.
We can define Process.fork_level as follows.
Yes, on Ruby 3.1+ we can decorate Process._fork
for that purpose. I already submitted PRs to major libraries to do that when possible, however:
- It doesn't work for
Process.daemonize
(not a big deal, but still) - There is a long tail of existing code doing this, and fixing it all may take a very long time.
Also, regardless of what Process.pid
is used for, if we can make it 5x faster with extremely little code, and as far as I can tell no downsides, why shouldn't we?
Updated by byroot (Jean Boussier) over 1 year ago
I deployed a ruby shim of this cache to half of our servers: https://github.com/Shopify/pid_cache
Average latency: -2ms
Median latency: -2ms
p75 latency: -2ms
p99 latency: -10ms
p99.9: -30ms
Updated by ko1 (Koichi Sasada) over 1 year ago
Thank you.
How to read comment #7 results?
Updated by byroot (Jean Boussier) over 1 year ago
How to read comment #7 results?
It's a flat reduction on our latency (server response time) metrics.
On average, with the pid_cache shim, our server response time is 2 milliseconds faster.
Also to note, we're still seeing quite a lot of getpid()
syscalls coming from dependencies using $$
, and from some C extensions. So hopefully https://github.com/ruby/ruby/pull/7326 would be even more effective.
Updated by Anonymous over 1 year ago
"ko1 (Koichi Sasada) via ruby-core" ruby-core@ml.ruby-lang.org wrote:
As I understand
getpid()
system call is well tuned so I surprised that there is an impact on the app.
It's not whether or not a system call is expensive or not,
it's the fact a system call needs to be made at all.
With modern CPU vulnerability mitigations, all system calls got
more expensive. Perhaps Linux vDSO mechanism can be extended
to support getpid as it does gettimeofday/clock_gettime
Anyways, caching getpid() is much appreciated.
Updated by byroot (Jean Boussier) over 1 year ago
@dalehamel (Dale Hamel) noticed via tracing that we're also calling getpid()
quite a lot in the thread scheduler.
I think in that case we can simply use GET_VM()->fork_gen
, so I prepared a second patch for that https://github.com/ruby/ruby/pull/7434
Updated by ko1 (Koichi Sasada) over 1 year ago
byroot (Jean Boussier) wrote in #note-9:
How to read comment #7 results?
It's a flat reduction on our latency (server response time) metrics.
On average, with the pid_cache shim, our server response time is 2 milliseconds faster.
Thank you.
BTW it is easy to understand how 2ms has impact or not by showing the measured values of before/after.
Anyway, I agree that is valuable improvements.
Anonymous wrote in #note-10:
With modern CPU vulnerability mitigations, all system calls got
more expensive. Perhaps Linux vDSO mechanism can be extended
to support getpid as it does gettimeofday/clock_gettime
I see, especially on virtualization techniques.
Updated by byroot (Jean Boussier) over 1 year ago
it is easy to understand how 2ms has impact or not by showing the measured values of before/after.
Yes, unfortunately, being a public company, we have all these rules about material informations and such, so I wasn't sure what I could share exactly... But I realize it makes it harder to understand, sorry :/
Updated by byroot (Jean Boussier) over 1 year ago
Relaying here what Javier Honduvilla Coto said on one of the PRs:
wondering if it would be possible/make sense to override libc's getpid with a custom implementation that does the caching in there. That way not only Process.getpid would use the faster method but also any other part of the runtime such as what @dalehamel (Dale Hamel) mentioned above or any other getpid calls from native libraries?
I don't know if it's a good idea or not, as far as I know it would be a first for ruby to override a symbol defined by libc, it doesn't do so with malloc
and free
for instance.
Updated by matz (Yukihiro Matsumoto) over 1 year ago
Caching process id sounds OK for me. Go ahead.
Matz.
Updated by byroot (Jean Boussier) over 1 year ago
- Status changed from Open to Closed
Applied in changeset git|1db8951d3a8be6a756c9d3d3b87231997b301985.
Cache Process.pid
[Feature #19443]
It's not uncommon for database client and similar network libraries
to protect themselves from Process.fork by regularly checking Process.pid
Until recently most libc would cache getpid()
so this was a cheap
check to make.
However as of glibc version 2.25 the PID cache is removed and calls to
getpid()
always invoke the actual system call which significantly degrades
the performance of existing applications.
The reason glibc removed the cache is that some libraries were bypassing
fork(2)
by issuing system calls themselves, causing stale cache issues.
That isn't a concern for Ruby as bypassing MRI's primitive for forking
would render the VM unusable, so we can safely cache the PID.
Updated by byroot (Jean Boussier) over 1 year ago
Thank you Matz! I merged the caching of Process.pid
and $$
.
The thread scheduler still call getpid()
a lot, and I'll try to eliminate that in a follow-up (even though according to @ko1 (Koichi Sasada) most of that code will be replaced before 3.3).