<p>Ruby Issue Tracking System (https://redmine.ruby-lang.org/)</p>
<p>Ruby master - Feature #4538: [PATCH (cleanup)] avoid unnecessary select() calls before doing I/O<br>
https://redmine.ruby-lang.org/issues/4538?journal_id=16254<br>
2011-03-31T15:37:53Z, headius (Charles Nutter), headius@headius.com</p>
<ul></ul><p>=begin<br>
Is it possible to interrupt/wakeup a thread that's doing a direct blocking IO call? I always understood that as the primary reason for doing the select logic.<br>
=end</p>
<p>Ruby master - Feature #4538: [PATCH (cleanup)] avoid unnecessary select() calls before doing I/O<br>
https://redmine.ruby-lang.org/issues/4538?journal_id=16260<br>
2011-04-01T07:29:39Z, normalperson (Eric Wong), normalperson@yhbt.net</p>
<ul></ul><p>=begin<br>
Charles Nutter <a href="mailto:headius@headius.com" class="email">headius@headius.com</a> wrote:</p>
<blockquote>
<p>Is it possible to interrupt/wakeup a thread that's doing a direct<br>
blocking IO call? I always understood that as the primary reason for<br>
doing the select logic.</p>
</blockquote>
<p>Yes for non-regular files as long as the signal handlers don't set the<br>
SA_RESTART flag. Ruby does not set SA_RESTART anywhere and can<br>
interrupt I/O on pipes/sockets at any time.</p>
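<p>A minimal Ruby-level sketch of the behavior described above, assuming MRI on a POSIX system. The method name, the choice of SIGUSR1, and the sleep intervals are illustrative assumptions. It shows only the effect visible from Ruby (the trap handler runs while the main thread is blocked reading a pipe), not the raw EINTR handling inside MRI.</p>

```ruby
# Hypothetical helper: blocks the calling thread on a pipe read while a
# helper thread first signals the process and only later writes a byte.
def signal_during_blocking_read
  reader, writer = IO.pipe
  handled = false
  previous = Signal.trap("USR1") { handled = true }

  helper = Thread.new do
    sleep 0.1
    Process.kill("USR1", Process.pid) # arrives while the read is blocked
    sleep 0.1
    writer.write("x")                 # finally lets the read complete
  end

  byte = reader.read(1)               # blocking read on a pipe
  helper.join
  [handled, byte]
ensure
  Signal.trap("USR1", previous || "DEFAULT")
  reader&.close
  writer&.close
end

p signal_during_blocking_read
```

<p>The read still returns the byte written afterwards, because MRI retries the interrupted call once the handler has run.</p>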
<p>Regular files are special. select() just returns success immediately on<br>
regular files and the IO operation will block (refusing to accept<br>
signals) while waiting for disk. NFS can be mounted to be<br>
interruptible, but you still can't rely on select()/poll() for readiness<br>
notification.</p>
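<p>A small sketch of the regular-file case, assuming a POSIX system where select() reports regular files as always ready. The method name and temp-file prefix are hypothetical.</p>

```ruby
require "tempfile"

# Returns true if IO.select reports a regular file as readable right away.
def regular_file_ready?
  Tempfile.create("select-demo") do |file|
    file.write("hello")
    file.flush
    file.rewind
    # Zero timeout: poll once and return immediately.
    readable, = IO.select([file], nil, nil, 0)
    !readable.nil? && readable.include?(file)
  end
end

puts regular_file_ready?
```

<p>The answer says nothing about whether the following read has to wait for the disk, which is why select() adds no information here.</p>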
<p>--<br>
Eric Wong<br>
=end</p>
<p>Ruby master - Feature #4538: [PATCH (cleanup)] avoid unnecessary select() calls before doing I/O<br>
https://redmine.ruby-lang.org/issues/4538?journal_id=16265<br>
2011-04-01T19:01:58Z, headius (Charles Nutter), headius@headius.com</p>
<ul></ul><p>=begin<br>
Understood. I read through a bit more code and saw that Ruby uses pthread signalling and RUBY_VM_CHECK_INTS after blocking regions to interrupt blocking operations.</p>
<p>I wonder, though, if depending on this behavior is leading Ruby more and more down the GVL path. The designers of the JVM's core IO libraries, for example, were unable to reconcile concurrent native threads with interruptible IO, due to the impossibility of knowing what state all IO-related data structures are in when the thread is interrupted. As a result, IO channels performing blocking operations are explicitly closed when the thread they block is interrupted.</p>
<p>In JRuby, we simulate interruptible IO by using select as much as possible, and blocking operations against unselectable IO channels are not interruptible. Some of the overhead of select is mitigated by JVM implementers usually using the fastest-possible mechanism to implement it (kqueue, epoll).</p>
<p>It seems that your change (and others like it) makes Ruby even more dependent on kernel-level blocking IO operations always being safely interruptible, and depending on those interruptions to only occur at the exact boundaries defined by the GVL. A future concurrent-threaded Ruby (or other impls that may become concurrent-threaded) may want to consider this, no? And are there any cross-platform concerns from eliminating select in these cases?</p>
<p>I also wonder if there's a race condition here; is it not possible that the interrupt of a thread would fire immediately after the GVL has been released but before the blocking IO operation has fired? Perhaps I'm birdwalking too deep into the vagaries of MRI's IO logic.<br>
=end</p>
<p>Ruby master - Feature #4538: [PATCH (cleanup)] avoid unnecessary select() calls before doing I/O<br>
https://redmine.ruby-lang.org/issues/4538?journal_id=16266<br>
2011-04-02T07:23:20Z, normalperson (Eric Wong), normalperson@yhbt.net</p>
<ul></ul><p>=begin<br>
Charles Nutter <a href="mailto:headius@headius.com" class="email">headius@headius.com</a> wrote:</p>
<blockquote>
<p>I wonder, though, if depending on this behavior is leading Ruby more<br>
and more down the GVL path. The designers of the JVM's core IO<br>
libraries, for example, were unable to reconcile concurrent native<br>
threads with interruptible IO, due to the impossibility of knowing<br>
what state all IO-related data structures are in when the thread is<br>
interrupted.</p>
</blockquote>
<p>I don't think so, even if threads are interrupted they're resumed after<br>
the signal handler is done (or the process is dying anyways and we don't<br>
care). If the interrupt is to raise an exception then that could get<br>
messy[1], but for the general case of signal handlers it's not an issue.</p>
<blockquote>
<p>As a result, IO channels performing blocking operations<br>
are explicitly closed when the thread they block is interrupted.</p>
</blockquote>
<p>That is terrible. I'd never touch a platform that does that.</p>
<blockquote>
<p>It seems that your change (and others like it) makes Ruby even more<br>
dependent on kernel-level blocking IO operations always being safely<br>
interruptible, and depending on those interruptions to only occur at<br>
the exact boundaries defined by the GVL. A future concurrent-threaded<br>
Ruby (or other impls that may become concurrent-threaded) may want to<br>
consider this, no? And are there any cross-platform concerns from<br>
eliminating select in these cases?</p>
</blockquote>
<p>If there are cross-platform concerns, the functions that wrap select()<br>
should be made no-op on platforms where select() is not needed (on<br>
all POSIX-like ones, I expect) and not interfere with platforms where<br>
they're not needed.</p>
<p>Regardless, there'll always be a set of IO operations that can never be<br>
interrupted. That doesn't bother me at all since the rest of the VM<br>
still runs. I'd rather just not use select()/poll() at all for<br>
"blocking" I/O calls.</p>
<blockquote>
<p>I also wonder if there's a race condition here; is it not possible<br>
that the interrupt of a thread would fire immediately after the GVL<br>
has been released but before the blocking IO operation has fired?<br>
Perhaps I'm birdwalking too deep into the vagaries of MRI's IO logic.</p>
</blockquote>
<p>So a signal handler might fire and the syscall would just continue and<br>
not fail with EINTR. No big deal, it'll just finish the syscall before<br>
checking for interrupts.</p>
<p>The real race condition is relying on select()/poll() at all for<br>
readability. select()/poll() returning success <em>never</em> guarantees an<br>
operation won't block due to spurious wakeups and shared IO across<br>
multiple threads/processes.</p>
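<p>The shared-descriptor case can be sketched in a few lines of Ruby. This is an illustration under stated assumptions: the method name is hypothetical, and the "other consumer" is simulated in the same thread so the outcome is deterministic.</p>

```ruby
# select() says "readable", another consumer drains the pipe, and the
# read that follows would have blocked had it been a blocking read.
def stale_readiness_demo
  reader, writer = IO.pipe
  writer.write("x")

  ready, = IO.select([reader], nil, nil, 0)
  select_said_ready = !ready.nil?

  # Stand-in for another thread or process sharing the descriptor.
  stolen = reader.read_nonblock(1)

  # Our own read, acting on the now stale readiness report.
  late = reader.read_nonblock(1, exception: false)

  [select_said_ready, stolen, late]
ensure
  reader&.close
  writer&.close
end

p stale_readiness_demo
```

<p>The last element is <code>:wait_readable</code>: a blocking read at that point would have slept despite the earlier success from select().</p>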
<p>[1] - which is why rb_ensure() is used in some places, such as around<br>
select() to pair rb_fd_init() with rb_fd_term()</p>
<p>--<br>
Eric Wong<br>
=end</p>
<p>Ruby master - Feature #4538: [PATCH (cleanup)] avoid unnecessary select() calls before doing I/O<br>
https://redmine.ruby-lang.org/issues/4538?journal_id=16268<br>
2011-04-03T07:23:06Z, headius (Charles Nutter), headius@headius.com</p>
<ul></ul><p>=begin<br>
On Fri, Apr 1, 2011 at 4:57 PM, Eric Wong <a href="mailto:normalperson@yhbt.net" class="email">normalperson@yhbt.net</a> wrote:</p>
<blockquote>
<p>Charles Nutter <a href="mailto:headius@headius.com" class="email">headius@headius.com</a> wrote:</p>
<blockquote>
<p>I wonder, though, if depending on this behavior is leading Ruby more<br>
and more down the GVL path. The designers of the JVM's core IO<br>
libraries, for example, were unable to reconcile concurrent native<br>
threads with interruptible IO, due to the impossibility of knowing<br>
what state all IO-related data structures are in when the thread is<br>
interrupted.</p>
</blockquote>
<p>I don't think so, even if threads are interrupted they're resumed after<br>
the signal handler is done (or the process is dying anyways and we don't<br>
care). If the interrupt is to raise an exception then that could get<br>
messy[1], but for the general case of signal handlers it's not an issue.</p>
</blockquote>
<p>I'm speaking specifically of Thread#raise and Thread#kill, which if<br>
used to interrupt a thread could potentially leave the IO channel in<br>
an unknown state (due to interrupting during a system call). On the<br>
JVM, all process-level signals are handled by a separate thread, so<br>
they are never run on user threads and that's not a concern for us.<br>
JRuby has real concurrent threads, so regardless of what blocking<br>
calls we make other threads will continue to run (i.e. we have no need<br>
for BLOCKING_REGION-style GVL logic). So ultimately it's only being<br>
able to kill or raise in an arbitrary thread that led us to make JRuby<br>
IO logic use selection to get around the effects of interrupting<br>
blocking IO calls I mentioned below.</p>
<p>Long story short, how does MRI guarantee that the underlying IO is in<br>
a reliable state when the thread accessing it can be interrupted<br>
permanently? It seems like doing most blocking at a consistent point<br>
(like a select call) is safer.</p>
<p>And I am mostly just trying to understand how it's consistently safe<br>
to interrupt a system-level IO call.</p>
<blockquote>
<blockquote>
<p>As a result, IO channels performing blocking operations<br>
are explicitly closed when the thread they block is interrupted.</p>
</blockquote>
<p>That is terrible. I'd never touch a platform that does that.</p>
</blockquote>
<p>Well, I tend not to touch platforms that expose or depend on specific<br>
platform details, like MRI does in <em>many</em> places (and now more places<br>
with your patch, I think). I like my code to work the same on all<br>
platforms.</p>
<p>That said, I admit it's inconvenient, but I understand the reasoning.<br>
You have to understand the JVM is trying to smooth over the<br>
platform-specific details of IO across lots of platforms, many of them<br>
not POSIX. If you can't guarantee to user code what the state of an IO<br>
channel will be when interrupting system-level code, it's a pretty<br>
clean option to say "don't do that, or we'll close the stream" and<br>
point users toward a safely interruptible option like select.</p>
<p>We've managed to work with that situation and mostly emulate MRI's IO<br>
behavior, so in practice it's more a nuisance than anything else.</p>
<blockquote>
<p>If there are cross-platform concerns, the functions that wrap select()<br>
should be made no-op on platforms where select() is not needed (on<br>
all POSIX-like ones, I expect) and not interfere with platforms where<br>
they're not needed.</p>
<p>Regardless, there'll always be a set of IO operations that can never be<br>
interrupted. That doesn't bother me at all since the rest of the VM<br>
still runs. I'd rather just not use select()/poll() at all for<br>
"blocking" I/O calls.</p>
</blockquote>
<p>That seems good on the surface, but it's depending on those blocking<br>
operations having consistent state after being interrupted across<br>
platforms. That seems like it would be easier to guarantee at a<br>
"select" level, but I admit I'm trying to understand if that's true.<br>
If you can't guarantee that the underlying IO channels are in a<br>
consistent state (ideally the <em>same</em> state regardless of platform)<br>
then writing to IO becomes a bunch of platform-specific checks in user<br>
code just like you'd have to write in C. The structure of Ruby's APIs<br>
has always been to provide a reasonably consistent view of<br>
system-level APIs so you don't have to do that.</p>
<blockquote>
<blockquote>
<p>I also wonder if there's a race condition here; is it not possible<br>
that the interrupt of a thread would fire immediately after the GVL<br>
has been released but before the blocking IO operation has fired?<br>
Perhaps I'm birdwalking too deep into the vagaries of MRI's IO logic.</p>
</blockquote>
<p>So a signal handler might fire and the syscall would just continue and<br>
not fail with EINTR. No big deal, it'll just finish the syscall before<br>
checking for interrupts.</p>
</blockquote>
<p>Except that you've now fired your Thread#kill or Thread#raise and the<br>
thread is never going to see it. If the contract of kill and raise is<br>
that "we'll try to kill or raise in the target thread, but no<br>
guarantees if it will do anything at all" I'm fine with that, but that<br>
hasn't been the expectation of Ruby users up to now. I'm not sure if<br>
this is actually a problem or not...MRI's cross-thread event behavior<br>
is rather involved.</p>
<blockquote>
<p>The real race condition is relying on select()/poll() at all for<br>
readability. select()/poll() returning success <em>never</em> guarantees an<br>
operation won't block due to spurious wakeups and shared IO across<br>
multiple threads/processes.</p>
</blockquote>
<p>That's certainly true, but any code using select would not just<br>
blindly proceed to a blocking call after wakeup...it would check that<br>
the IO channel is actually ready, and if not go into select again. I<br>
don't see how that makes the consistency and reliability of blocking<br>
on selection less attractive than interrupting arbitrary kernel-level<br>
calls.</p>
<p>- Charlie<br>
=end</p>
<p>Ruby master - Feature #4538: [PATCH (cleanup)] avoid unnecessary select() calls before doing I/O<br>
https://redmine.ruby-lang.org/issues/4538?journal_id=16278<br>
2011-04-05T03:23:06Z, normalperson (Eric Wong), normalperson@yhbt.net</p>
<ul></ul><p>=begin<br>
Charles Oliver Nutter <a href="mailto:headius@headius.com" class="email">headius@headius.com</a> wrote:</p>
<blockquote>
<p>On Fri, Apr 1, 2011 at 4:57 PM, Eric Wong <a href="mailto:normalperson@yhbt.net" class="email">normalperson@yhbt.net</a> wrote:</p>
<blockquote>
<p>Charles Nutter <a href="mailto:headius@headius.com" class="email">headius@headius.com</a> wrote:</p>
<blockquote>
<p>I wonder, though, if depending on this behavior is leading Ruby more<br>
and more down the GVL path. The designers of the JVM's core IO<br>
libraries, for example, were unable to reconcile concurrent native<br>
threads with interruptible IO, due to the impossibility of knowing<br>
what state all IO-related data structures are in when the thread is<br>
interrupted.</p>
</blockquote>
<p>I don't think so, even if threads are interrupted they're resumed after<br>
the signal handler is done (or the process is dying anyways and we don't<br>
care). If the interrupt is to raise an exception then that could get<br>
messy[1], but for the general case of signal handlers it's not an issue.</p>
</blockquote>
<p>I'm speaking specifically of Thread#raise and Thread#kill, which if<br>
used to interrupt a thread could potentially leave the IO channel in<br>
an unknown state (due to interrupting during a system call).</p>
</blockquote>
<blockquote>
<p>Long story short, how does MRI guarantee that the underlying IO is in<br>
a reliable state when the thread accessing it can be interrupted<br>
permanently? It seems like doing most blocking at a consistent point<br>
(like a select call) is safer.</p>
</blockquote>
<p>MRI should (already appears to) define rb_thread_blocking_region() as a<br>
cancellation point for Thread#raise and Thread#kill so any C code should<br>
tidy things up before entering/leaving a blocking region.</p>
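<p>From the Ruby side, the cancellation-point behavior can be observed roughly as follows. This is a sketch assuming MRI, where a thread blocked in a pipe read reports the status "sleep". The method name and the message string are hypothetical.</p>

```ruby
# A thread blocked in a pipe read is interrupted with Thread#raise; the
# exception surfaces at the blocking call and the ensure clause still runs.
def interrupt_blocked_reader
  reader, writer = IO.pipe
  events = []

  thread = Thread.new do
    begin
      reader.read(1)                 # blocks: nothing is ever written
      events << "read returned"
    rescue RuntimeError => e
      events << "raised: #{e.message}"
    ensure
      events << "cleanup"
    end
  end

  # Wait until the thread is actually parked in the blocking read.
  sleep 0.01 until thread.status == "sleep"
  thread.raise("wake up")
  thread.join
  events
ensure
  reader&.close
  writer&.close
end

p interrupt_blocked_reader
```

<p>Ruby code gets the same guarantee C code gets from rb_ensure(): cleanup placed in <code>ensure</code> runs even though the read never completed.</p>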
<blockquote>
<p>And I am mostly just trying to understand how it's consistently safe<br>
to interrupt a system-level IO call.</p>
</blockquote>
<p>The actual syscalls are usually very trivial and have few (if any)<br>
user-visible internal structures to worry about unless memory was<br>
malloc()-ed for them (as in the case of select() + rb_fd_init()).</p>
<p>The kernel is expected to handle all the internal structures for<br>
interruptibility (it only exposes an opaque integer descriptor to<br>
userspace).</p>
<blockquote>
<blockquote>
<blockquote>
<p>As a result, IO channels performing blocking operations<br>
are explicitly closed when the thread they block is interrupted.</p>
</blockquote>
<p>That is terrible. I'd never touch a platform that does that.</p>
</blockquote>
<p>Well, I tend not to touch platforms that expose or depend on specific<br>
platform details, like MRI does in <em>many</em> places (and now more places<br>
with your patch, I think). I like my code to work the same on all<br>
platforms.</p>
</blockquote>
<p>We'll have to agree to differ here :)</p>
<p>I choose to work on my platform (Linux) because I see more benefits to it<br>
than to the alternatives and would like to take advantage of its strengths.</p>
<blockquote>
<blockquote>
<p>If there are cross-platform concerns, the functions that wrap select()<br>
should be made no-op on platforms where select() is not needed (on<br>
all POSIX-like ones, I expect) and not interfere with platforms where<br>
they're not needed.</p>
<p>Regardless, there'll always be a set of IO operations that can never be<br>
interrupted. That doesn't bother me at all since the rest of the VM<br>
still runs. I'd rather just not use select()/poll() at all for<br>
"blocking" I/O calls.</p>
</blockquote>
<p>That seems good on the surface, but it's depending on those blocking<br>
operations having consistent state after being interrupted across<br>
platforms. That seems like it would be easier to guarantee at a<br>
"select" level, but I admit I'm trying to understand if that's true.<br>
If you can't guarantee that the underlying IO channels are in a<br>
consistent state (ideally the <em>same</em> state regardless of platform)<br>
then writing to IO becomes a bunch of platform-specific checks in user<br>
code just like you'd have to write in C. The structure of Ruby's APIs<br>
has always been to provide a reasonably consistent view of<br>
system-level APIs so you don't have to do that.</p>
</blockquote>
<p>Upon further inspection, I see MRI uses select() only in 100ms<br>
increments while checking for interrupts on win32. That may be because<br>
win32 can't interrupt syscalls like select(), but all other platforms<br>
MRI supports can...</p>
<p>I shall update my patch to only select() before I/O on win32 if somebody<br>
can confirm it is needed. On POSIX, select() never wakes up unless it<br>
receives a signal (or a descriptor is ready).</p>
<blockquote>
<blockquote>
<blockquote>
<p>I also wonder if there's a race condition here; is it not possible<br>
that the interrupt of a thread would fire immediately after the GVL<br>
has been released but before the blocking IO operation has fired?<br>
Perhaps I'm birdwalking too deep into the vagaries of MRI's IO logic.</p>
</blockquote>
<p>So a signal handler might fire and the syscall would just continue and<br>
not fail with EINTR. No big deal, it'll just finish the syscall before<br>
checking for interrupts.</p>
</blockquote>
<p>Except that you've now fired your Thread#kill or Thread#raise and the<br>
thread is never going to see it.</p>
</blockquote>
<p>I don't think "never" is correct, but seeing it too late seems to be<br>
a current problem...</p>
<blockquote>
<p>If the contract of kill and raise is<br>
that "we'll try to kill or raise in the target thread, but no<br>
guarantees if it will do anything at all" I'm fine with that, but that<br>
hasn't been the expectation of Ruby users up to now. I'm not sure if<br>
this is actually a problem or not...MRI's cross-thread event behavior<br>
is rather involved.</p>
</blockquote>
<p>Refiring the signal with pthread_kill() is probably needed if the syscall<br>
blocks for a long time:</p>
<a name="blocking_thread-interrupting_thread"></a>
<h2 >blocking_thread interrupting_thread<a href="#blocking_thread-interrupting_thread" class="wiki-anchor">¶</a></h2>
<pre>
blocking_thread                  interrupting_thread
check ints => nothing
release gvl
                                 Thread#raise
                                 set interrupt flag
                                 pthread_kill()
long syscall
(long time passes...)
check ints => finally sees Thread#raise
</pre>
<p>In this case, to deliver Thread#raise in a timely fashion, the<br>
interrupting thread will need to set a timer to refire pthread_kill() in<br>
a loop if it detects that blocking_thread hasn't reacted to the signal yet.<br>
Multiple EINTRs from pthread_kill() wouldn't be any more/less harmful<br>
than one EINTR.</p>
<p>I realize doing a short 100ms poll()/select() like win32 does is<br>
possible for timelier delivery of Thread#raise/Thread#kill, but I'd<br>
rather avoid those expensive syscalls for the general case since<br>
Thread#raise/Thread#kill are not common.</p>
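<p>For comparison, the short-interval polling approach mentioned above can be sketched at the Ruby level. Everything here is an assumption for illustration: the method name, the <code>cancelled</code> callable (a stand-in for MRI's internal interrupt check), and the default interval.</p>

```ruby
# Waits for data in short select() slices, checking a cancellation
# callable between slices. Returns :cancelled, the data read, or nil on EOF.
def read_with_polling(io, maxlen, interval: 0.1, cancelled:)
  loop do
    return :cancelled if cancelled.call
    # nil means the slice timed out with nothing readable: poll again.
    next unless IO.select([io], nil, nil, interval)
    chunk = io.read_nonblock(maxlen, exception: false)
    # Readiness can be stale, so go back to waiting on :wait_readable.
    return chunk unless chunk == :wait_readable
  end
end

reader, writer = IO.pipe
writer.write("hi")
p read_with_polling(reader, 16, cancelled: -> { false })
```

<p>Each idle slice costs one extra syscall, which is the overhead being weighed against timelier delivery of Thread#raise and Thread#kill.</p>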
<blockquote>
<blockquote>
<p>The real race condition is relying on select()/poll() at all for<br>
readability. select()/poll() returning success <em>never</em> guarantees an<br>
operation won't block due to spurious wakeups and shared IO across<br>
multiple threads/processes.</p>
</blockquote>
<p>That's certainly true, but any code using select would not just<br>
blindly proceed to a blocking call after wakeup...it would check that<br>
the IO channel is actually ready, and if not go into select again. I<br>
don't see how that makes the consistency and reliability of blocking<br>
on selection less attractive than interrupting arbitrary kernel-level<br>
calls.</p>
</blockquote>
<p>Blocking syscalls can be better for some cases (e.g. accept() under<br>
Linux) since the kernel can implement behavior to wake up exactly one<br>
waiter on a ready client, whereas with select()/poll(), all the waiters<br>
get woken. I try to take advantage of that (avoids doubling up on<br>
syscalls made) when possible.</p>
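<p>The select() half of that comparison can be demonstrated from Ruby. This is a sketch with hypothetical names, assuming a loopback TCP listener is available. Several threads wait in IO.select on one listening socket, a single client connects, and acceptance is held back until every waiter has reported waking up.</p>

```ruby
require "socket"

# One incoming connection wakes every select() waiter, but only one of
# them can actually accept it.
def wakeup_herd_demo(waiters: 3)
  server = TCPServer.new("127.0.0.1", 0)   # port 0: let the OS pick a port
  woken = Queue.new
  go = Queue.new

  threads = Array.new(waiters) do
    Thread.new do
      IO.select([server])                  # every waiter watches the listener
      woken << true
      go.pop                               # hold until all waiters have woken
      server.accept_nonblock(exception: false)
    end
  end

  client = TCPSocket.new("127.0.0.1", server.addr[1])
  woken_count = 0
  waiters.times { woken.pop; woken_count += 1 }
  waiters.times { go << true }

  results = threads.map(&:value)
  accepted = results.reject { |r| r == :wait_readable }
  accepted.each(&:close)
  { woken: woken_count, accepted: accepted.size }
ensure
  client&.close
  server&.close
end

p wakeup_herd_demo
```

<p>All waiters wake, one wins the connection, and the rest have made a wasted pair of syscalls. With threads parked directly in a blocking accept(), the kernel is free to wake just one of them.</p>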
<p>--<br>
Eric Wong<br>
=end</p>
<p>Ruby master - Feature #4538: [PATCH (cleanup)] avoid unnecessary select() calls before doing I/O<br>
https://redmine.ruby-lang.org/issues/4538?journal_id=17669<br>
2011-06-10T22:53:07Z, ko1 (Koichi Sasada)</p>
<ul></ul><p>Hi,</p>
<p>Any action on this proposal?</p>
<p>This thread is too long and difficult to understand....</p>
<p>(2011/03/30 3:22), Eric Wong wrote:</p>
<blockquote>
<p>Issue <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: [PATCH (cleanup)] avoid unnecessary select() calls before doing I/O (Closed)" href="https://redmine.ruby-lang.org/issues/4538">#4538</a> has been reported by Eric Wong.</p>
<hr>
<p>Feature <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: [PATCH (cleanup)] avoid unnecessary select() calls before doing I/O (Closed)" href="https://redmine.ruby-lang.org/issues/4538">#4538</a>: [PATCH (cleanup)] avoid unnecessary select() calls before doing I/O<br>
<a href="http://redmine.ruby-lang.org/issues/4538" class="external">http://redmine.ruby-lang.org/issues/4538</a></p>
<p>Author: Eric Wong<br>
Status: Open<br>
Priority: Low<br>
Assignee:<br>
Category: core<br>
Target version: 1.9.x</p>
<p>Please look at <a href="http://redmine.ruby-lang.org/issues/4535" class="external">http://redmine.ruby-lang.org/issues/4535</a> before<br>
this one. That one actually fixes a bug I noticed while working<br>
on this patch.</p>
<p>Ruby 1.9 no longer depends on multiplexed non-blocking I/O<br>
to do its threading and defaults to blocking file descriptors.</p>
<p>As a result, there is no need to check the fd for read/writability when<br>
there is an error check for rb_io_wait_(read|writ)able after the<br>
blocking function.</p>
<p>I also believe the code in io_binwrite() to:<br>
avoid context switch between "a" and "\n" in STDERR.puts "a".<br>
<a href="https://blade.ruby-lang.org/ruby-dev/25080">[ruby-dev:25080]</a><br>
...has always been broken under 1.9 with native threads.</p>
<p>Nothing new is broken with test-all and test-rubyspec</p>
</blockquote>
<p>--<br>
// SASADA Koichi at atdot dot net</p>
<p>Ruby master - Feature #4538: [PATCH (cleanup)] avoid unnecessary select() calls before doing I/O<br>
https://redmine.ruby-lang.org/issues/4538?journal_id=17678<br>
2011-06-10T23:05:19Z, kosaki (Motohiro KOSAKI), kosaki.motohiro@gmail.com</p>
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Assigned</i></li><li><strong>Assignee</strong> set to <i>kosaki (Motohiro KOSAKI)</i></li></ul><p>Assigned.</p>
<p>Ruby master - Feature #4538: [PATCH (cleanup)] avoid unnecessary select() calls before doing I/O<br>
https://redmine.ruby-lang.org/issues/4538?journal_id=29231<br>
2012-09-10T02:29:23Z, kosaki (Motohiro KOSAKI), kosaki.motohiro@gmail.com</p>
<ul><li><strong>Status</strong> changed from <i>Assigned</i> to <i>Closed</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>This issue was solved with changeset r36944.<br>
Eric, thank you for reporting this issue.<br>
Your contribution to Ruby is greatly appreciated.<br>
May Ruby be with you.</p>
<hr>
<ul>
<li>ext/socket/basicsocket.c (rsock_bsock_send):<br>
avoid unnecessary select() calls before doing I/O<br>
Patch by Eric Wong. [Feature <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: [PATCH (cleanup)] avoid unnecessary select() calls before doing I/O (Closed)" href="https://redmine.ruby-lang.org/issues/4538">#4538</a>] <a href="/issues/4538">[ruby-core:35586]</a></li>
<li>ext/socket/init.c (rsock_s_recvfrom): ditto.</li>
<li>ext/socket/init.c (rsock_s_accept): ditto.</li>
<li>ext/socket/udpsocket.c (udp_send): ditto.</li>
<li>io.c (io_fflush): ditto.</li>
<li>io.c (io_binwrite): ditto.</li>
<li>io.c (rb_io_syswrite): ditto.</li>
</ul>
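<p>The idea behind the change (attempt the I/O first and wait for readiness only when the call reports it would block) has a rough Ruby-level analogue. The changeset itself touches C code operating on blocking descriptors. The sketch below uses non-blocking writes instead so the effect is visible, and its method name is hypothetical.</p>

```ruby
# Writes all of data to io, calling select() only after a write reports
# that it would block. Returns how many times it had to wait.
def write_all(io, data)
  waits = 0
  until data.empty?
    written = io.write_nonblock(data, exception: false)
    if written == :wait_writable
      IO.select(nil, [io])   # wait only when the write could not proceed
      waits += 1
    else
      data = data.byteslice(written, data.bytesize)
    end
  end
  waits
end

reader, writer = IO.pipe
p write_all(writer, "hello")
p reader.read_nonblock(5)
```

<p>When the descriptor is already writable, which is the common case, no select() call is made at all.</p>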