PTY or IO.select timing issue results in no EOF
I have observed that when running a shell through PTY, the slave will sometimes fail to produce an EOF after an exit command. As a result, polling via IO.select can time out. A full example is attached. This is a simplified example illustrating the problematic loop:
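# PTY.spawn ...
master.write "exit 8\n"
str = ''
loop do
  # The 3-second select timeout guards against waiting forever.
  raise "timeout waiting for slave EOF" unless IO.select([slave], nil, nil, 3)
  break if slave.eof?
  str << slave.read(1)
end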
After 'exit' is written to master, the loop normally reads all of slave into str. The select ensures the loop can time out, but under normal circumstances it will not (3 seconds is plenty of time to exit a shell). The bug is that it occasionally does time out having never seen an EOF, meaning either the select is not detecting EOF on the slave or an EOF is not being written to the slave.
The bizarre thing is that I can confirm after the timeout that the pty process does exit with the correct status (8) regardless of whether the loop exits normally with an EOF or by timeout.
I'm not sure if this is an issue with the PTY implementation, an issue with the shell, or with the OS. I have observed the bug repeatedly using 1.9.2 on OS X 10.6.8, Ubuntu 11.04, and SLES 10, and with shells bash, ksh, csh, zsh (although mostly with bash). I suspect the PTY implementation plays some role because the bug does not appear to occur on 1.8.7 and 1.8.6. However the frequency of the bug varies so much across OS and shell, I know it could very well be an issue outside of ruby.
To reproduce, run the pty_fail.rb script for 10k (or more) iterations. On OS X it usually crops up within 10k. On Ubuntu 11.04 it is very, very rare, ~100k may be needed. Ex:
ruby pty_no_eof_example.rb 10000 /bin/bash
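Since the attached script is not shown inline, here is a minimal sketch of what one iteration of such a reproduction might look like. The helper name `run_once` and its return shape are hypothetical, not taken from the attachment. It also assumes that on Linux a closed slave side can surface as `Errno::EIO` rather than a clean EOF, so that case is treated as end of input.

```ruby
require 'pty'

# Hypothetical helper: spawn a shell on a PTY, ask it to exit with status 8,
# and wait for end of input. Returns [:eof or :timeout, exit status].
def run_once(shell, timeout)
  reader, writer, pid = PTY.spawn(shell)
  writer.write "exit 8\n"
  result = :eof
  begin
    loop do
      # Give up if nothing becomes readable within `timeout` seconds.
      unless IO.select([reader], nil, nil, timeout)
        result = :timeout
        break
      end
      break if reader.eof?
      reader.read(1)
    end
  rescue Errno::EIO
    # Assumption: Linux signals a closed slave side with EIO; treat it as EOF.
  end
  # The report says the shell still exits after a timeout, so wait for it.
  _, status = Process.wait2(pid)
  [result, status.exitstatus]
ensure
  reader.close if reader && !reader.closed?
  writer.close if writer && !writer.closed?
end
```

Calling `run_once('/bin/bash', 3)` in a loop and counting the `:timeout` results would give a rough failure rate per shell and OS, while the second element confirms whether the exit status was still 8.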
Updated by akr (Akira Tanaka) over 7 years ago
- Status changed from Assigned to Feedback
I tried to reproduce the problem on Debian GNU/Linux and FreeBSD. (I don't have Mac OS X.)
It is possible but very rare.
However, the problem occurs more frequently if I run another heavy task on the host.
So I guess the problem is simply that 3 seconds is not enough.
It is possible that the OS runs the child process very slowly if the host is very busy.
Is there any evidence that this is actually a problem in Ruby?
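One way to gather such evidence might be to replace the fixed 3-second timeout with a generous limit and record how long end of input actually takes. The sketch below is only an illustration: `seconds_until_eof` is a hypothetical helper, and it assumes `Errno::EIO` on the PTY means the slave side was closed. If every slow iteration eventually finishes, the host is probably just busy; if some never finish even with a long limit, the EOF is more likely being lost.

```ruby
require 'pty'

# Hypothetical helper: seconds from writing "exit" until the PTY reports end
# of input, or nil if `limit` seconds pass without seeing it.
def seconds_until_eof(shell, limit)
  reader, writer, pid = PTY.spawn(shell)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  writer.write "exit 8\n"
  begin
    loop do
      remaining = limit - (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started)
      return nil if remaining <= 0
      return nil unless IO.select([reader], nil, nil, remaining)
      break if reader.eof?
      reader.read(1)
    end
  rescue Errno::EIO
    # Assumption: EIO here means the slave side was closed (seen on Linux).
  end
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
ensure
  reader.close if reader && !reader.closed?
  writer.close if writer && !writer.closed?
  # Reap the child in the background so no zombie is left behind.
  Process.detach(pid) if pid
end
```

Collecting the returned durations over many iterations, with and without a competing heavy task on the host, would show whether the delays form a long tail beyond 3 seconds or whether some runs return `nil` regardless of the limit.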