Ruby Issue Tracking System https://redmine.ruby-lang.org/ 2010-07-06T12:39:42Z
Ruby master - Bug #3540: IO.copy_stream fails to detect client disconnect w/sendfile https://redmine.ruby-lang.org/issues/3540?journal_id=12169 2010-07-06T12:39:42Z akr (Akira Tanaka) akr@fsij.org
<ul></ul><p>
2010/7/6 Eric Wong <a href="mailto:redmine@ruby-lang.org" class="email">redmine@ruby-lang.org</a>:</p>
<blockquote>
<p>sendfile() may return with a short write upon a client disconnect. Instead of<br>
retrying and getting an error, Ruby tries to force a select() on the descriptor<br>
which fails to detect the disconnect. This causes IO.copy_stream to hang,<br>
(possibly until TCP keepalives kick in). IO.copy_stream should raise<br>
immediately.</p>
</blockquote>
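<p>The behavior the report asks for can be sketched as follows. This is only an illustration, not the reproduction script from the report: <code>copy_or_report_disconnect</code> is a hypothetical helper name, and a UNIX domain socket pair stands in for the TCP client so that the disconnect is visible on the first write.</p>

```ruby
require "socket"
require "tempfile"

# Hypothetical helper (not part of Ruby): copies `src` to `dst` and reports a
# peer disconnect as a symbol instead of letting the exception propagate.
def copy_or_report_disconnect(src, dst)
  IO.copy_stream(src, dst)            # returns the number of bytes copied
rescue Errno::EPIPE, Errno::ECONNRESET
  :peer_disconnected
end

Tempfile.create("payload") do |file|
  file.write("x" * 4096)
  file.flush
  file.rewind

  s1, s2 = UNIXSocket.pair            # assumption: stand-in for a client connection
  s2.close                            # the "client" goes away
  p copy_or_report_disconnect(file, s1)
  s1.close
end
```

<p>With the peer already closed, the copy is expected to fail promptly rather than wait for writability.</p>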
<p>Thank you for the reproducible script and fix.</p>
<p>I'll commit your fix.</p>
<p>However, I think the Linux select() behavior, which doesn't notify writability on<br>
a disconnected TCP socket, is suspicious.</p>
<p>linux% ruby -rsocket -e '<br>
serv = TCPServer.open("127.0.0.1", 8888)<br>
s1 = TCPSocket.open("127.0.0.1", 8888)<br>
s2 = serv.accept<br>
s2.close<br>
s1.write "a" rescue p $!<br>
s1.write "a" rescue p $!<br>
p IO.select(nil, [s1], nil, 0)<br>
'<br>
#<Errno::EPIPE: Broken pipe><br>
nil</p>
<p>FreeBSD and Solaris notify writability.</p>
<p>freebsd% ruby -rsocket -e '<br>
serv = TCPServer.open("127.0.0.1", 8888)<br>
s1 = TCPSocket.open("127.0.0.1", 8888)<br>
s2 = serv.accept<br>
s2.close<br>
s1.write "a" rescue p $!<br>
s1.write "a" rescue p $!<br>
p IO.select(nil, [s1], nil, 0)<br>
'<br>
#<Errno::EPIPE: Broken pipe><br>
[[], [#<TCPSocket:0x283263d8>], []]</p>
<p>solaris% ruby -rsocket -e '<br>
serv = TCPServer.open("127.0.0.1", 8888)<br>
s1 = TCPSocket.open("127.0.0.1", 8888)<br>
s2 = serv.accept<br>
s2.close<br>
s1.write "a" rescue p $!<br>
s1.write "a" rescue p $!<br>
p IO.select(nil, [s1], nil, 0)<br>
'<br>
#<Errno::EPIPE: Broken pipe><br>
[[], [#<TCPSocket:0x80e6e6c>], []]</p>
<p>I think select should notify writability when write would not block.<br>
Clearly, write doesn't block on a disconnected socket.</p>
<p>Linux also notifies writability for a UNIX domain socket pair.</p>
<p>linux% ruby -rsocket -e '<br>
s1, s2 = UNIXSocket.pair<br>
s2.close<br>
s1.write "a" rescue p $!<br>
p IO.select(nil, [s1], nil, 0)<br>
'<br>
#<Errno::EPIPE: Broken pipe><br>
[[], [#<UNIXSocket:fd 3>], []]</p>
<p>I tested Linux 2.6.26.</p>
<p>--<br>
Tanaka Akira</p>
Ruby master - Bug #3540: IO.copy_stream fails to detect client disconnect w/sendfile https://redmine.ruby-lang.org/issues/3540?journal_id=12171 2010-07-06T14:34:05Z normalperson (Eric Wong) normalperson@yhbt.net
<ul></ul><p>
Tanaka Akira <a href="mailto:akr@fsij.org" class="email">akr@fsij.org</a> wrote:</p>
<blockquote>
<p>2010/7/6 Eric Wong <a href="mailto:redmine@ruby-lang.org" class="email">redmine@ruby-lang.org</a>:</p>
<blockquote>
<p>sendfile() may return with a short write upon a client disconnect. Instead of<br>
retrying and getting an error, Ruby tries to force a select() on the descriptor<br>
which fails to detect the disconnect. This causes IO.copy_stream to hang,<br>
(possibly until TCP keepalives kick in). IO.copy_stream should raise<br>
immediately.</p>
</blockquote>
<p>Thank you for the reproducible script and fix.</p>
<p>I'll commit your fix.</p>
</blockquote>
<p>Thank you for looking into this.</p>
<blockquote>
<p>However, I think the Linux select() behavior, which doesn't notify writability on<br>
a disconnected TCP socket, is suspicious.</p>
</blockquote>
<blockquote>
<p>FreeBSD and Solaris notify writability.</p>
</blockquote>
<blockquote>
<p>I think select should notify writability when write would not block.<br>
Clearly, write doesn't block on a disconnected socket.</p>
<p>Linux also notifies writability for a UNIX domain socket pair.</p>
</blockquote>
<p>UNIX domain sockets are easy to do notification for since they're always<br>
on the same host. TCP might be harder to detect (and thus the Linux<br>
folks choose not to bother at all) because the client is on a different<br>
machine and it might lose a physical connection.</p>
<p>How does FreeBSD or Solaris behave if a client is on a different machine<br>
and has the network cable pulled out? In the case of physically<br>
disconnected network cable, the client TCP stack has no way to notify<br>
the server of a disconnect. "kill -9" or even normal OS shutdown would<br>
give the TCP stack a chance to properly shutdown the connection.</p>
<p>There are a few more instances of "errno = EAGAIN" assignments in io.c<br>
that look suspicious to me. My proposed fixes are below, but I'm<br>
having trouble reproducing the badness I was seeing with IO.copy_stream<br>
in these code paths:</p>
<pre><code>diff --git a/io.c b/io.c
index 5129a14..108af7e 100644
--- a/io.c
+++ b/io.c
@@ -649,7 +649,7 @@ io_fflush(rb_io_t *fptr)
     if (0 <= r) {
         fptr->wbuf_off += (int)r;
         fptr->wbuf_len -= (int)r;
-        errno = EAGAIN;
+        goto retry;
     }
     if (rb_io_wait_writable(fptr->fd)) {
         rb_io_check_closed(fptr);
@@ -877,7 +877,8 @@ io_binwrite(VALUE str, rb_io_t *fptr, int nosync)
     if (0 <= r) {
         offset += r;
         n -= r;
-        errno = EAGAIN;
+        if (offset < RSTRING_LEN(str))
+            goto retry;
     }
     if (rb_io_wait_writable(fptr->fd)) {
         rb_io_check_closed(fptr);
</code></pre>
<p>--<br>
Eric Wong</p>
Ruby master - Bug #3540: IO.copy_stream fails to detect client disconnect w/sendfile https://redmine.ruby-lang.org/issues/3540?journal_id=12172 2010-07-06T15:32:37Z akr (Akira Tanaka) akr@fsij.org
<ul></ul><p>
2010/7/6 Eric Wong <a href="mailto:normalperson@yhbt.net" class="email">normalperson@yhbt.net</a>:</p>
<blockquote>
<p>UNIX domain sockets are easy to do notification for since they're always<br>
on the same host. TCP might be harder to detect (and thus the Linux<br>
folks choose not to bother at all) because the client is on a different<br>
machine and it might lose a physical connection.</p>
</blockquote>
<p>If the kernel cannot detect the disconnect, how does the kernel raise EPIPE?</p>
<blockquote>
<p>How does FreeBSD or Solaris behave if a client is on a different machine<br>
and has the network cable pulled out? In the case of physically<br>
disconnected network cable, the client TCP stack has no way to notify<br>
the server of a disconnect. "kill -9" or even normal OS shutdown would<br>
give the TCP stack a chance to properly shutdown the connection.</p>
</blockquote>
<p>I am not talking about that kind of physical disconnection.</p>
<p>I described the situation in which the kernel knows the connection is<br>
disconnected.</p>
<p>The connection is disconnected by an RST packet.<br>
The RST packet is generated when a normal packet is sent to a closed port.</p>
<p>% ruby -rsocket -e '<br>
def netstat<br>
s = `netstat -n`<br>
s.each_line {|line| puts line if /State\s*$|127.0.0.1:8888/ =~ line }<br>
puts<br>
end<br>
serv = TCPServer.open("127.0.0.1", 8888)<br>
s1 = TCPSocket.open("127.0.0.1", 8888)<br>
s2 = serv.accept<br>
netstat<br>
s2.close<br>
netstat<br>
s1.write "a" rescue p $!<br>
netstat<br>
s1.write "a" rescue p $!<br>
p IO.select(nil, [s1], nil, 0)<br>
'<br>
Proto Recv-Q Send-Q Local Address Foreign Address<br>
State<br>
tcp 0 0 127.0.0.1:8888 127.0.0.1:34516<br>
ESTABLISHED<br>
tcp 0 0 127.0.0.1:34516 127.0.0.1:8888<br>
ESTABLISHED</p>
<p>Proto Recv-Q Send-Q Local Address Foreign Address<br>
State<br>
tcp 0 0 127.0.0.1:8888 127.0.0.1:34516<br>
FIN_WAIT2<br>
tcp 1 0 127.0.0.1:34516 127.0.0.1:8888<br>
CLOSE_WAIT</p>
<p>Proto Recv-Q Send-Q Local Address Foreign Address<br>
State</p>
<p>#<Errno::EPIPE: Broken pipe><br>
nil</p>
<p>At the first netstat call, the TCP states of<br>
s1 (the local address is 127.0.0.1:34516) and<br>
s2 (the local address is 127.0.0.1:8888) are both ESTABLISHED.</p>
<p>s2.close sends a FIN packet to s1.<br>
s1 receives it and sends an ACK packet to s2.<br>
This changes s2 to FIN_WAIT_2 and s1 to CLOSE_WAIT.</p>
<p>The first s1.write "a" sends a normal data packet to s2.<br>
Since the write system call doesn't wait for the result of the packet,<br>
the system call itself succeeds.<br>
But s2 has been closed and cannot accept any data.<br>
So s2 sends back an RST packet to s1, and the state of s2 changes to CLOSED.<br>
Then s1 receives the RST packet, which changes the state of s1 to CLOSED.</p>
<p>The second s1.write "a" fails with EPIPE.<br>
This is because the kernel knows s1 is CLOSED.</p>
<p>Now the kernel knows write() for s1 doesn't block.<br>
(It causes an error immediately)<br>
So FreeBSD and Solaris notify it with select().<br>
But Linux doesn't.<br>
I think it is a problem of Linux.</p>
<p>--<br>
Tanaka Akira</p>
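<p>The sequence described in this message can be sketched as a small loop that keeps writing until the kernel reports the reset. This is only an illustration under assumptions: <code>first_write_error_after_peer_close</code> is a hypothetical helper, port 0 is used so the kernel picks a free port, and the short sleep is there only to give the RST time to arrive. Which attempt fails, and whether the error is EPIPE or ECONNRESET, may depend on the platform.</p>

```ruby
require "socket"

# Hypothetical helper (not part of Ruby): connects to a local TCP server,
# lets the accepted side close, then writes one byte at a time until a write
# fails. Returns [attempt_number, error_class], or nil if no write failed.
def first_write_error_after_peer_close(max_attempts: 50)
  serv = TCPServer.new("127.0.0.1", 0)            # port 0: kernel picks a free port
  s1 = TCPSocket.new("127.0.0.1", serv.addr[1])   # addr[1] is the chosen port
  s2 = serv.accept
  s2.close                                        # peer closes; s1 receives a FIN

  max_attempts.times do |i|
    begin
      s1.write("a")                               # a write after the FIN draws an RST
    rescue Errno::EPIPE, Errno::ECONNRESET => e
      return [i + 1, e.class]                     # the kernel now knows s1 is dead
    end
    sleep 0.01                                    # give the RST time to arrive
  end
  nil
ensure
  s1&.close
  serv&.close
end

p first_write_error_after_peer_close
```

<p>The point of the loop is that the error is discovered by writing again, not by waiting for select() to report writability.</p>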
Ruby master - Bug #3540: IO.copy_stream fails to detect client disconnect w/sendfile https://redmine.ruby-lang.org/issues/3540?journal_id=12175 2010-07-06T17:12:19Z normalperson (Eric Wong) normalperson@yhbt.net
<ul></ul><p>
Tanaka Akira <a href="mailto:akr@fsij.org" class="email">akr@fsij.org</a> wrote:</p>
<blockquote>
<p>2010/7/6 Eric Wong <a href="mailto:normalperson@yhbt.net" class="email">normalperson@yhbt.net</a>:</p>
<blockquote>
<p>UNIX domain sockets are easy to do notification for since they're always<br>
on the same host. TCP might be harder to detect (and thus the Linux<br>
folks choose not to bother at all) because the client is on a different<br>
machine and it might lose a physical connection.</p>
</blockquote>
<p>If the kernel cannot detect the disconnect, how does the kernel raise EPIPE?</p>
<blockquote>
<p>How does FreeBSD or Solaris behave if a client is on a different machine<br>
and has the network cable pulled out? In the case of physically<br>
disconnected network cable, the client TCP stack has no way to notify<br>
the server of a disconnect. "kill -9" or even normal OS shutdown would<br>
give the TCP stack a chance to properly shutdown the connection.</p>
</blockquote>
<p>I am not talking about that kind of physical disconnection.</p>
<p>I described the situation in which the kernel knows the connection is<br>
disconnected.</p>
<p>The connection is disconnected by an RST packet.<br>
The RST packet is generated when a normal packet is sent to a closed port.</p>
<p>% ruby -rsocket -e '<br>
def netstat<br>
s = `netstat -n`<br>
s.each_line {|line| puts line if /State\s*$|127.0.0.1:8888/ =~ line }<br>
puts<br>
end<br>
serv = TCPServer.open("127.0.0.1", 8888)<br>
s1 = TCPSocket.open("127.0.0.1", 8888)<br>
s2 = serv.accept<br>
netstat<br>
s2.close<br>
netstat<br>
s1.write "a" rescue p $!<br>
netstat<br>
s1.write "a" rescue p $!<br>
p IO.select(nil, [s1], nil, 0)<br>
'<br>
Proto Recv-Q Send-Q Local Address Foreign Address<br>
State<br>
tcp 0 0 127.0.0.1:8888 127.0.0.1:34516<br>
ESTABLISHED<br>
tcp 0 0 127.0.0.1:34516 127.0.0.1:8888<br>
ESTABLISHED</p>
<p>Proto Recv-Q Send-Q Local Address Foreign Address<br>
State<br>
tcp 0 0 127.0.0.1:8888 127.0.0.1:34516<br>
FIN_WAIT2<br>
tcp 1 0 127.0.0.1:34516 127.0.0.1:8888<br>
CLOSE_WAIT</p>
<p>Proto Recv-Q Send-Q Local Address Foreign Address<br>
State</p>
<p>#<Errno::EPIPE: Broken pipe><br>
nil</p>
<p>At the first netstat call, the TCP states of<br>
s1 (the local address is 127.0.0.1:34516) and<br>
s2 (the local address is 127.0.0.1:8888) are both ESTABLISHED.</p>
<p>s2.close sends a FIN packet to s1.<br>
s1 receives it and sends an ACK packet to s2.<br>
This changes s2 to FIN_WAIT_2 and s1 to CLOSE_WAIT.</p>
<p>The first s1.write "a" sends a normal data packet to s2.<br>
Since the write system call doesn't wait for the result of the packet,<br>
the system call itself succeeds.<br>
But s2 has been closed and cannot accept any data.<br>
So s2 sends back an RST packet to s1, and the state of s2 changes to CLOSED.<br>
Then s1 receives the RST packet, which changes the state of s1 to CLOSED.</p>
<p>The second s1.write "a" fails with EPIPE.<br>
This is because the kernel knows s1 is CLOSED.</p>
<p>Now the kernel knows write() for s1 doesn't block.<br>
(It causes an error immediately)<br>
So FreeBSD and Solaris notify it with select().<br>
But Linux doesn't.<br>
I think it is a problem of Linux.</p>
</blockquote>
<p>Ah ok, thanks for the clarification. I missed the second write failing<br>
with EPIPE entirely :x</p>
<p>I think my second patch to remove "errno = EAGAIN" assignments might be<br>
needed for some corner cases, too, because we need a second write() to<br>
detect EPIPE under Linux.</p>
<p>--<br>
Eric Wong</p>
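<p>The idea behind removing the "errno = EAGAIN" assignments can be sketched in Ruby. This is an analogy under assumptions, not the io.c code: <code>write_all</code> is a hypothetical helper that retries immediately after a short write and waits for writability only when the kernel actually reports that the write would block.</p>

```ruby
# Hypothetical helper (not part of Ruby): writes all of `data` to `io`.
# After a short write it retries right away, so an error such as EPIPE is
# raised by the next write instead of being hidden behind a select() wait.
def write_all(io, data)
  offset = 0
  while offset < data.bytesize
    chunk = data.byteslice(offset, data.bytesize - offset)
    result = io.write_nonblock(chunk, exception: false)
    if result == :wait_writable
      IO.select(nil, [io])      # wait only when the write would really block
    else
      offset += result          # short or full write: loop and retry immediately
    end
  end
  offset
end

r, w = IO.pipe
p write_all(w, "hello")
w.close
p r.read
r.close
```

<p>If the reading side has already gone away, the retried write raises the error directly, which is the behavior the patch is after.</p>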
Ruby master - Bug #3540: IO.copy_stream fails to detect client disconnect w/sendfile https://redmine.ruby-lang.org/issues/3540?journal_id=12188 2010-07-06T23:07:06Z akr (Akira Tanaka) akr@fsij.org
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>
This issue was solved with changeset r28557.<br>
Eric, thank you for reporting this issue.<br>
Your contribution to Ruby is greatly appreciated.<br>
May Ruby be with you.</p>
Ruby master - Bug #3540: IO.copy_stream fails to detect client disconnect w/sendfile https://redmine.ruby-lang.org/issues/3540?journal_id=12282 2010-07-12T08:49:01Z normalperson (Eric Wong) normalperson@yhbt.net
<ul></ul><p>
Akira Tanaka <a href="mailto:redmine@ruby-lang.org" class="email">redmine@ruby-lang.org</a> wrote:</p>
<blockquote>
<p>Issue <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: IO.copy_stream fails to detect client disconnect w/sendfile (Closed)" href="https://redmine.ruby-lang.org/issues/3540">#3540</a> has been updated by Akira Tanaka.</p>
<p>Status changed from Open to Closed<br>
% Done changed from 0 to 100</p>
<p>This issue was solved with changeset r28557.</p>
</blockquote>
<p>Can we get this backported to 1.9.2? I noticed it wasn't in rc2.<br>
Malicious clients can exploit this bug and DoS servers this way.</p>
<p>Thanks.</p>
<p>--<br>
Eric Wong</p>