Feature #20057: Change behaviour of rb_register_postponed_job for Ruby 3.3 - Ruby - Ruby Issue Tracking System

Actions

Copy link

Feature #20057

closed

Change behaviour of rb_register_postponed_job for Ruby 3.3

Feature #20057: Change behaviour of rb_register_postponed_job for Ruby 3.3

Added by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago. Updated about 2 years ago.

Status:

Closed

Assignee:

Target version:

[ruby-core:115684]

Description

This ticket is to discuss some changes to rb_register_postponed_job that @ko1 (Koichi Sasada) and myself propose to make for Ruby 3.3. The motivation for this work is to fix a bug in the current implementation, which can cause the registered functions to be called with the wrong data argument (https://bugs.ruby-lang.org/issues/19991).

There's a long discussion on the associated PR (https://github.com/ruby/ruby/pull/8949) but in the end we came to the conclusion that the best way to fix this bug involved actually changing the current semantics of rb_register_postponed_job. I'm opening this issue to get feedback on this approach and to see if anybody knows of a reason why we should not release this for Ruby 3.3.

Current behaviour in Ruby 3.2¶

Currently, Ruby has two functions for interacting with postponed jobs. These jobs can be enqueued from anywhere (including signal handlers), and will be executed next time Ruby checks for RUBY_VM_CHECK_INTS().

rb_postponed_job_register(func, data): Schedules func(data) to be executed the next time RUBY_VM_CHECK_INTS is checked.
rb_postponed_job_register_once(func, data): Works like rb_postponed_job_register, except if func is already scheduled to be executed (either with this data or with different data), in which case it does nothing.

The postponed jobs are stored in a fixed sized array (of length 1024), so it's possible that enqueuing them could fail if the buffer is full. In this case, they signal this by returning 0 (otherwise, they return 1 for successful enqueue or 2 because rb_postponed_job_register_once did nothing because func was already in the queue).

Unfortunately, as I mentioned before, the implementation of these functions are subject to a race condition because func and data are not written into the postponed job buffer together atomically (they are two separate variables and CPUs tend not to have double-word atomic instructions). Again, see https://bugs.ruby-lang.org/issues/19991 for the full details.

What we have done¶

Whilst working on this issue, we had a look at all of the in-the-wild usages of these APIs on rubygems. The only real usage of these APIs is for profiling tools, and the following was true for essentially all of them:

Each gem only is registering a single callback function,
Almost all of the usages either make no use of the data argument at all, or pass some kind of never-changing global context into it.
There are only a very small handful of gems using these APIs at all

Thus, we concluded that the current behaviour of allowing scheduling and execution of arbitrary (func, data) pairs is actually not really needed. Instead, we could offer a more limited API which would meet the needs of all current users, whilst making it easy to avoid the race conditions in the current implementation.

The new API is as follows:

rb_postponed_job_preregister(func, data): This function registers func/data into a small, fixed-size table, and return a handle to this registration. Subsequent calls to this function with the same func will return the same handle, and overwrite the data with new data if it is different. The size of the table is 32 entries on most systems, which is still enough to use literally every gem on rubygems that actually uses these APIs at the same time. The intention is that libraries would call this function in their initialization routines, storing the handle for later.
rb_postponed_job_trigger(handle): This function takes the handle from rb_postponed_job_preregister and schedules it for execution the next time RUBY_VM_CHECK_INTS is called. If the handle is already scheduled, this will not cause it to be scheduled twice; each func can only be called a maximum of one time for each call to RUBY_VM_CHECK_INTS, essentially.

All of the usages of the old rb_postponed_job_register{,_once} functions in the Ruby tree have been replaced by calls to the above two functions, and these two old functions have been marked with the deprecated attribute. They have also been re-implemented in terms of the new functions; both rb_postponed_job_register and rb_postponed_job_register_once are now both equivalent to rb_postponed_job_trigger(rb_postponed_job_prereigster(func, data)). This means that:

rb_postponed_job_register now works like rb_postponed_job_register_once i.e. func can only be executed one time per RUBY_VM_CHECK_INTS, no matter how many times it is registered
They are also called with the last data to be registered, not the first (which is how rb_postponed_job_register_once previously worked)

I verified that stackprof still builds & works correctly with the new implementation of rb_postponed_job_register.

What else we tried¶

I tried a couple of things to keep the current semantics of rb_postponed_job_register{,_once} intact, without introducing new APIs.

First, I tried protecting postponed job buffer by masking signals around the critical section & using a POSIX semaphore instead of a pthread mutex: https://github.com/ruby/ruby/pull/8856. However, there was a concern that this would be too slow (since RUBY_VM_CHECK_INTS is called very often, and both the semaphore and the signal mask require calling into the kernel).
Then, I implemented a lock-free ringbuffer to store the postponed job queue: https://github.com/ruby/ruby/compare/master...KJTsanaktsidis:ruby:old_circular_ringbuffer. However, the concern with this implementation was that it was too complex.

Ruby 3.3¶

As of right now, we have merged these changes (from https://github.com/ruby/ruby/pull/8949), and @ko1 (Koichi Sasada) plans for them to go out in 3.3-rc1. The point of opening this issue is to ask: does anybody foresee any problem with our approach?

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#1 [ruby-core:115685]

Description updated (diff)

Sorry, I linked the wrong PR - the correct one is https://github.com/ruby/ruby/pull/8949. Updated the description.

Updated by Eregon (Benoit Daloze) over 2 years ago Actions
Copy link
#2 [ruby-core:115692]

Is it necessary to change the data for an existing callback?
Alternatives I can think of:

Return error code if the callback is already registered.
Allow registering multiple times the same callback.

From reading the description of this issue it's not clear to me why the 2 functions are safer or why we would need to deprecate the older API, given the older API just calls the new one.

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#3 [ruby-core:115709]

Is it necessary to change the data for an existing callback?

There was, afaict, at least one gem (gctools) which did actually want to call the same function with a different callback at various points in the program. A previous version of the patchset I proposed did actually "allow registering the callback multiple times" as you suggested, but repeated registration of the same callback with different data would fill the postponed job table, and necessitate some kind of cleanup, which brings back the original race condition. Thus, we landed on this solution instead.

From reading the description of this issue it's not clear to me why the 2 functions are safer or why we would need to deprecate the older API, given the older API just calls the new one.

From a strictly functional perspective, we don't need the newer API, we could just only make the change to the current APIs. But, the old APIs kind of imply from their signature that you could call them with arbitrary func and data as much as you like, whilst the new APIs encourage the pattern of registering a small number of func's during program initialization (where an error could actually conceivably be handled, too). I don't think there's a compelling argument to actually delete the older APIs though, I more marked them as deprecated to nudge people towards using the correct pattern.

Updated by Eregon (Benoit Daloze) over 2 years ago Actions
Copy link
#4 [ruby-core:115723]

There was, afaict, at least one gem (gctools) which did actually want to call the same function with a different callback at various points in the program. A previous version of the patchset I proposed did actually "allow registering the callback multiple times" as you suggested, but repeated registration of the same callback with different data would fill the postponed job table, and necessitate some kind of cleanup, which brings back the original race condition. Thus, we landed on this solution instead.

Could changing data be something handled by the caller, e.g., instead of passing data, pass &data and let the caller change it when it wants?
But I guess that would be thread unsafe so probably not a good solution.

From a strictly functional perspective, we don't need the newer API, we could just only make the change to the current APIs. But, the old APIs kind of imply from their signature that you could call them with arbitrary func and data as much as you like, whilst the new APIs encourage the pattern of registering a small number of func's during program initialization (where an error could actually conceivably be handled, too). I don't think there's a compelling argument to actually delete the older APIs though, I more marked them as deprecated to nudge people towards using the correct pattern.

Thanks for the clarification.

I think the proposed change is good.

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#5

Status changed from Open to Closed

Applied in changeset git|a2994c300b9620b0b226c8373f15627eead65d43.

Add changelog entry for [Feature #20057]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#6 [ruby-core:115725]

Status changed from Closed to Open

Sorry, didn't mean to close this.

Updated by ko1 (Koichi Sasada) over 2 years ago Actions
Copy link
#7 [ruby-core:115754]

rb_postponed_job_preregister(), rb_postponed_job_register/_once() run O(n) where n is registered jobs.
rb_postponed_job_trigger() is O(1).
We believe n is enough small though...

Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 2 years ago Actions
Copy link
#8 [ruby-core:116143]

Status changed from Open to Closed

This was merged for Ruby 3.3 - f8effa209adb3ce050c100ffaffe6f3cc1508185

Actions

Copy link

Also available in: PDF Atom

Project

General

Profile

Ruby

Custom queries

Feature #20057

Change behaviour of rb_register_postponed_job for Ruby 3.3

Current behaviour in Ruby 3.2¶

What we have done¶

What else we tried¶

Ruby 3.3¶

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#1 [ruby-core:115685]

Updated by Eregon (Benoit Daloze) over 2 years ago Actions
Copy link
#2 [ruby-core:115692]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#3 [ruby-core:115709]

Updated by Eregon (Benoit Daloze) over 2 years ago Actions
Copy link
#4 [ruby-core:115723]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#5

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#6 [ruby-core:115725]

Updated by ko1 (Koichi Sasada) over 2 years ago Actions
Copy link
#7 [ruby-core:115754]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 2 years ago Actions
Copy link
#8 [ruby-core:116143]

Project

General

Profile

Ruby

Custom queries

Feature #20057

Change behaviour of rb_register_postponed_job for Ruby 3.3

Current behaviour in Ruby 3.2¶

What we have done¶

What else we tried¶

Ruby 3.3¶

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago ActionsCopy link #1 [ruby-core:115685]

Updated by Eregon (Benoit Daloze) over 2 years ago ActionsCopy link #2 [ruby-core:115692]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago ActionsCopy link #3 [ruby-core:115709]

Updated by Eregon (Benoit Daloze) over 2 years ago ActionsCopy link #4 [ruby-core:115723]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago ActionsCopy link #5

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago ActionsCopy link #6 [ruby-core:115725]

Updated by ko1 (Koichi Sasada) over 2 years ago ActionsCopy link #7 [ruby-core:115754]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 2 years ago ActionsCopy link #8 [ruby-core:116143]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#1 [ruby-core:115685]

Updated by Eregon (Benoit Daloze) over 2 years ago Actions
Copy link
#2 [ruby-core:115692]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#3 [ruby-core:115709]

Updated by Eregon (Benoit Daloze) over 2 years ago Actions
Copy link
#4 [ruby-core:115723]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#5

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#6 [ruby-core:115725]

Updated by ko1 (Koichi Sasada) over 2 years ago Actions
Copy link
#7 [ruby-core:115754]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 2 years ago Actions
Copy link
#8 [ruby-core:116143]