Feature #17325
closedAdds Fiber#cancel, which forces a Fiber to break/return
Description
Calling Fiber#cancel
will force a fiber to return, skipping rescue and catch blocks but running all ensure blocks. It behaves as if a break
or return
were used to jump from the last suspension point to the top frame of the fiber. Control will be transferred to the canceled fiber so it can run its ensure blocks.
Propagation from resuming to resumed fibers¶
Any non-root living fiber can be canceled and cancellation will propagate to child (resumed) fibers. In this way, a suspended task can be canceled even if it is e.g. resuming into an enumerator, and the enumerator will be canceled as well. Transfer of control should match #17221's (much improved) transfer/resume semantics. After the cancellation propagates all the way to the bottom of the fiber resume stack, the last fiber in the chain will then be resumed. Resuming fibers will not run until they are yielded back into.
Suspension of canceled fibers¶
Canceled fibers can still transfer control with resume
, yield
, and transfer
, which may be necessary in order to release resources from ensure
blocks. For simplicity, subsequent cancels will behave similarly to calling break
or return
inside an ensure
block, and the last cancellation reason will overwrite earlier reasons.
Alternatives¶
Fiber#raise
could be used, but:
- Can only raise on resumable fibers.
- Cannot propagate cancellation down to resumed fibers.
- Exceptions are bigger and slower than
break
. -
#raise
can't (and shouldn't) be sent to resuming fibers. (It can't propagate.) - Exceptions can be caught. This might be desirable, but that should be at the discretion of the calling fiber.
Catch/Throw could be used (with an anonymous Object.new
), but:
- We would need to add
Fiber#throw
(or wrap/interceptFiber.yield
). - A hypothetical
Fiber#throw
should probably have similar semantics to#resume
and thus only be allowed on resumable fibers.- In that case, it wouldn't propagate down to resumed fibers.
-
catch
adds an extra stack frame.
We could use go-style "Context" objects that contain a "done?" queue/future.
- These would need to be explicitly passed around.
- Although their usage could be enforced via linters like rubocop, I think that placing it off to the side will give developers the impression that it is optional Some sort of cancel propagation mechanism is not optional for structured concurrency.
- It should built into any task-scheduler library, which would allow application code to use it explicitly.
- But this suffers the same problem as current Fiber wrappers: it works fine if your code uses the wrapper, but code that uses fibers without the wrapper can be incompatible and introduce bugs (e.g. fibers that are released without running their
ensure
blocks). - This make sense for a language like go which doesn't have exceptions but does have a convention of returning an "error" value. It feels out of place in ruby, IMO. Letting the fiber-task-scheduler mitigates that... for code that uses the fiber-task-scheduler.
We could add a keyword option to Fiber#raise
that gives it similar propagation semantics to this.
- IMO, the simplicity of
Fiber#raise
simply being a specialized version ofFiber#resume
is worth preserving. - The propagation changes alone are enough of a semantic difference to warrant a new method.
We could implement Fiber#cancel
by using fiber.raise(FiberCancellationError)
on the bottom fiber and catching that exception during termination of the canceled fiber.
- This would have the "benefit" that the exception could be rescued.
- I might be wrong, but I think that doing this would mostly duplicate my PR, but with some added complexity around exception construction and catching.
- It might be a good keyword option? e.g.
Fiber#cancel(with_exception: [true,#exception,#to_str])
Just let the task-fiber-scheduler library handle this.
- That's what I'm already doing now. It's mostly fine. It works in my code.
- Putting it into ruby core should lead to a small performance boost on very commonly repeated code.
- There's probably a better way to store the
cancel_reason
that doesn't require the overhead of adding anotherVALUE
torb_fiber_struct
. Maybe it can be placed directly intoerrinfo
?
- There's probably a better way to store the
- Although the common cases can be handled via a trampoline fiber or #17221, there can still be situations where your application's fiber-scheduler library might not know about fibers created by other libraries. This adds interoperability to a common scenario.
- Coroutine cancellation is IMO a core feature. It's important to have something like this for all applications and libraries to use as a baseline for interoperability.
Implementation: https://github.com/ruby/ruby/pull/3766
Updated by nevans (Nicholas Evans) about 4 years ago
This is my first big foray into the ruby C API, so I hope it makes sense what I was trying to do. Did I handle vm_fiber_break
and fiber_start
appropriately? Is there is somewhere more appropriate to save the cancel reason than a new VALUE
on rb_fiber_struct
? E.g. could I safely assign the canceled fiber's ec->errinfo
from the calling Fiber, instead of using cancel_reason
?
When this compiles, it warns about:
vm_insnhelper.h:251:1: warning: ‘vm_call_iseq_optimizable_p’ defined but not used [-Wunused-function]
251 | vm_call_iseq_optimizable_p(const struct rb_callinfo *ci, const struct rb_callcache *cc)
So I'm probably doing something wrong there, too.
Updated by nevans (Nicholas Evans) about 4 years ago
I also created a github PR at https://github.com/ruby/ruby/pull/3766.
Updated by nevans (Nicholas Evans) about 4 years ago
- File deleted (
0001-Added-Fiber-cancel-which-forces-a-Fiber-to-return.patch)
Updated by nevans (Nicholas Evans) about 4 years ago
- Description updated (diff)
updated description and github PR
Updated by nevans (Nicholas Evans) about 4 years ago
- Description updated (diff)
I've updated the description to 1) better describe the motivation and 2) match the PR's current behavior when a Fiber is canceled multiple times.
Updated by Eregon (Benoit Daloze) about 4 years ago
This sounds like Thread#kill but for Fiber.
However, Fiber is cooperative so it seems this shouldn't be needed.
What's a concrete issue with Fiber#raise? It seems redundant with it to me.
If the application swallows Exception I think that's the application problem, and I think we should not add a second way to unwind a Fiber just for that.
The performance difference is likely not very relevant, as I expect exiting a Fiber is not a common operation, and most of the cost is likely unwinding the stack & running ensure
.
Because this runs ensure
, it's still possible for that Fiber to hang, or to raise another exception in ensure
which can then be caught.
So, it's not a guarantee either.
fiber.raise(Exception)
seems enough, and this seems to be only marginally different, at the cost of another method and yet another way to terminate a Fiber.
Regarding propagation, I'm not sure what's the difference, but exceptions in Fibers are propagated:
$ ruby -e 'f=Fiber.new { raise "foo" }; p((f.resume rescue $!))'
#<RuntimeError: foo>
Updated by nevans (Nicholas Evans) about 4 years ago
Thanks for taking a look at this, Benoit. I agree it's not obvious why this is necessary with Fiber#raise, so I'll try to explain my reasoning in more detail:
Yes, a library (e.g. async
) could write a "suspend" function that wraps resume, yield, and transfer. That library could implement cancel (and raise) by tunneling resume types & values through a struct. That library can't directly control fibers that are created outside its wrapper (e.g. Enumerator
), but our "suspend" wrapper could implement a "trampoline" fiber to interoperate with most simple cases like Enumerable
. I know all of this is possible, because I've been using a suspend wrapper like this on one of my work projects for a few years now! :) Still, it's not hard to imagine incompatible fiber-creating libraries that circumvent our wrapper in unpredictable ways. Putting these into Fiber
gives all fiber-using applications and libraries a shared baseline they can rely upon.
I disagree on "not a common operation". I think that relatively short-lived fibers could become as common-place as relatively short-lived goroutines
are in go
(and I hope they do). And I think non-exceptional cancellation is a very important concept for asynchronous code to handle explicitly. Other languages with coroutines special case "cancel" (even if some of them tunnel cancelation via error types). E.g. go's Context
type uses Done()
and the go vet tool checks that CancelFuncs
are used on all control-flow paths. Kotlin coroutine's Job
class has cancel methods but not raise methods.
Propagation¶
The most important difference from Fiber#raise
isn't that it is unrescuable or uses fewer resources, but how propagation works:
- You can cancel any living fiber except the root-fiber.
- I've disallowed canceling the root to avoid accidentally shutting the thread down.
- I'm currently raising
FiberError
for terminated fibers, but I think I'd like to change that. I think canceling dead fibers should simply returnnil
orfalse
. That way we can safely call cancel on any fiber, and the only reason it would raise an exception is if the child fiber terminates with an exception during cancellation.
- Cancel propagates down through the linked list of resumed fibers.
- Execution of
ensure
still runs from bottom up. - It does not propagate an error or cancel up from the canceled fiber.
E.g. if an async task fiber is resuming into an lazy enumerator which is resuming into another fiber (and so on) and your thread scheduler wants to cancel your task's fiber but doesn't know about those other fibers, Fiber#raise
won't work. If your fiber is transferring, Fiber#raise
won't work. Fiber#cancel
will work on any living fiber while still following the rules (#17221) for transferring control.
Just as break
doesn't generally need to know or care about the implementation of intervening library code, canceling a fiber shouldn't need to know or care about what the implementation of any sub-fibers may have been resumed.
semantics of raise/rescue vs unexceptional break/return¶
I'm not against temporarily or explicitly blocking cancellation. Avoiding swallowed exceptions is not the most important feature, but it's still a useful one I think. And any small performance gain would be desirable, but not the primary driver. Aside from either of those, raise
has different semantics from break
or return
(or throw
). (As currently written) this is only for semantically unexceptional flow-control. And this isn't a matter of "application code can handle it" because applications can't control their intervening library code, nor can they control fibers created by intervening library code. It's quite common to see code like the following:
def foo
some_library_code_runs_this_block_many_stack_layers_deep do
result = etc_etc_etc
return result.retval if result.finished?
end
end
We could wrap this in an exception handler, but it would be more confusing to the casual reader than simply using return
or break
(or maybe catch
and throw
). For jumping to the ensure
block of a particular method we use return
. For block scope, break
. For fiber-scope: Fiber#cancel
.
I expect that return
statement to non-exceptionally rewind the stack without being caught by any catch
or rescue
. I don't want a library's hidden rescue Exception
to subvert my break
or return
(libraries shouldn't do this, but sometimes they do).
It's not as simple as an "application problem". Task cancellation could be triggered by application code or by library code. A task-scheduler library might call Fiber#cancel
, and the fibers being canceled might be in application code or in library code or might be suspended by resuming into fibers that are uncontrolled by or even unknown to task-scheduler. None of that should matter.
Wrapping a task with catch {|tag| ... }
would be conceptually better than exception handling... but throw tag
from an Enumerator
won't propagate up to the return fiber. (I don't want to change this behavior.)
ruby -e 'f = Fiber.new { throw :foo }; p((catch(:foo) { f.resume } rescue nil))'
#<UncaughtThrowError: uncaught throw :foo>
Examples¶
To be clear, these are toy examples and I'd want most of the following to be handled for me by a fiber-task-scheduler library (like async
). But that library itself should have a mechanism for canceling resuming tasks, even when it doesn't (or can't) know about the resumed child fibers of those tasks. Fiber#raise
(as currently written) can't do that.
def run_server
server = MyFiberyTCPServer.open
# Do stuff. e.g. accept clients, assign connections to fibers, etc.
# Those connections can create their own sub-fibers.
# The server may know nothing about those sub-fibers. It shouldn't need to.
# Those subfibers might even use an entirely different scheduler. That's okay.
# Connection fibers might be un-resumable because they are resuming. No prob.
wait_for_shutdown_signal # => transfers to some sort of fiber scheduler
ensure
# cancels all connection-handler fibers
server.connections.each do |c|
# Are those connection fibers resuming other sub-fibers tasks?
# Do we even know about those sub-tasks?
# Can we even know about them from here?
# Who cares? Those need to be canceled too!
c.cancel :closing if c.alive?
# I'd like to make dead_fiber.cancel unexceptional too
end
end
# fetching a resource may depend on fetching several other resources first
def resource_client
a = schedule_future { http.get("/a") }
b = schedule_future { http.get("/b") }
items = a.value.item_ids.map {|id| http.get("/items/#{id}") }
combine_results(b, ary)
ensure
# if any of the above raises an exception
# or if *this* fiber is canceled
# of if combine_results completed successfully before all subtasks complete
a&.cancel rescue nil # is it resuming another fiber? don't know, don't care.
b&.cancel rescue nil # is it resuming another fiber? don't know, don't care.
ary&.each do |item| item.cancel rescue nil end # ditto
end
# yes, task library code would normally provide a better pattern for this
def with_timeout(seconds)
timer = Task.schedule do
sleep seconds
ensure
task.cancel :timeout
end
task = Task.schedule do
yield # does this resume into sub-tasks? we shouldn't need to know.
ensure
timer.cancel
end
task.value
end
No guarantees¶
And yes, we can always have misbehaving code and I'm not trying to guarantee against every case. We can't guard against certain categories of bugs nor infinite loops. It's always possible someone's written:
def foo
bar
ensure
while true
begin
while true
Fiber.yield :misbehaving
end
rescue Exception
# evil code being evil
end
end
end
But that's entirely outside the scope of this. :)
We can have bugs here just like any code can have bugs. But in my experience, ensure
code is usually much shorter and simpler than other code. Shut down, clean up, release, and reset.
Updated by nevans (Nicholas Evans) about 4 years ago
- Description updated (diff)
Added some other alternatives to the description.
Updated by nevans (Nicholas Evans) about 4 years ago
- Description updated (diff)
added more to alternatives
Updated by nevans (Nicholas Evans) about 4 years ago
- Description updated (diff)
one more update to alternatives
Updated by nevans (Nicholas Evans) about 4 years ago
Like I wrote above, I think I'd like to change my PR so that terminated_fiber.cancel
returns nil
or false
instead of raising FiberError
. That way any fiber on the current thread, except for the root fiber, can be canceled without exceptions (unless an exception is raised during cancellation).
Also, this PR as written is for non-exceptional cancellation (doesn't raise in return_fiber
) that are not catchable (like break
). It might be worthwhile to add keywords options for all four possibilities:
use break or raise to cancel | on completion, raise error in return_fiber | method call | |
---|---|---|---|
break (unrescuable) | unexceptional | fiber.cancel |
this PR. the common case |
break (unrescuable) | exceptional | fiber.cancel(raise: err) |
exceptional cancellation, e.g. most timeouts |
raise (rescuable) | unexceptional | fiber.cancel(with_exception: err) |
preventable cancellation |
raise (rescuable) | exceptional | fiber.cancel(with_exception: err, raise: err) |
Fiber#raise with "resuming" propagation |
I think the first one is most useful, but the others can be useful too. The others could be in future PRs... if this (or something like it) is accepted. But if people don't agree that this version is useful, I wasn't going to implement the others. :)
Updated by nevans (Nicholas Evans) about 4 years ago
n.b. I consider this to be related, at least in spirit, to https://bugs.ruby-lang.org/issues/595. I consider this analogous to IO#open and IO#close. We ought to have a structured/standard way to "close" fibers that have been "opened". (This analogy with IO#close
is other good reason to return nil
for dead fibers.)
Updated by Eregon (Benoit Daloze) about 4 years ago
Could you give an example code where Fiber#raise wouldn't work?
Is it just about #17331 or there are more issues with Fiber#raise?
I don't quite understand why you need to cancel other Fibers, they are not executing anyway.
Is it to make sure to run their ensure
clauses?
I guess Fiber#cancel
actually waits for that Fiber to terminate (otherwise it would run Fibers concurrently, which is of course bad).
cancel
doesn't sound to me like it would wait from the name.
Maybe terminate
or so is better for that.
Fiber#kill
might be clearer about the intent, there is no magic "cancellation", it's just the equivalent of an exception.
For the run_server
example, you could wrap it in a thread, then when the main Fiber of that Thread returns, it would already automatically kill all Fibers of the Thread. Or if the shutdown ends up terminating the main thread, then it would be enough too.
On CRuby, those cases don't seem to run ensure
, but that seems a bug and TruffleRuby already runs ensure
in that case:
ruby -e 'Thread.new { Fiber.new { begin; p 1; Fiber.yield; ensure; p 2; end }.resume }.join'
Updated by Eregon (Benoit Daloze) about 4 years ago
- Related to Bug #17331: Let Fiber#raise work with transferring fibers added
Updated by nevans (Nicholas Evans) almost 2 years ago
This should be closed, in favor of a (TBD) structured concurrency system, built on Fiber.current.storage
(#19078) and/or FiberScheduler. :)
Updated by ioquatix (Samuel Williams) almost 2 years ago
I'm not against this feature, but I'm not sure if it has a valid use case.
I like the idea of using break as a semantic for making a fiber exit in a similar way to a method call, etc.
Fiber#raise
is currently used for cancellation like semantics, but it uses an exception. Maybe break
is better, but efficiently is not a strong argument here, at least, in the case where terminating a fiber is a rare occurrence.
But using a Stop
exception is probably not as good as a clean semantic for termination as a break mechanic.
Updated by nevans (Nicholas Evans) almost 2 years ago
oh, I still very much want the "break"/"return" semantic, and I still think it should be invoked via Fiber#cancel (not a special exception class with Fiber#raise).
But I think the semantics and implementation should be slightly different from this proposal (I don't have time to put together a new proposal before 3.2).
So I closed the associated PR. But we could leave the ticket open, at least until it is superceded by a better proposal.
Updated by ioquatix (Samuel Williams) over 1 year ago
- Status changed from Open to Closed
We introduced Fiber#kill
which behaves exactly as requested by the proposal.
Calling Fiber#cancel will force a fiber to return, skipping rescue and catch blocks but running all ensure blocks. It behaves as if a break or return were used to jump from the last suspension point to the top frame of the fiber. Control will be transferred to the canceled fiber so it can run its ensure blocks.