Bug #21530
Open
Is IO#eof? supposed to always block and read?
Description
I'm not sure whether or not this is expected behavior, but it seems like eof? blocks when called on $stdin.
For example:
if (str = $stdin.gets)
  $stderr.puts "read #{str}"
end
if $stdin.eof? # this call waits for input
  $stderr.puts "stdin is eof"
end
I think this is kind of odd behavior because if you input a string but do not input a newline, then hit ^D twice, $stdin should be at EOF, but eof? will block and wait for input. If you hit ^D a third time, $stdin will be EOF, but if you input a different character it will not be EOF.
Compare this C program:
#include <stdio.h>
#include <stdlib.h>
#define BUF_SIZE 4096
int main(int argc, char *argv[]) {
    char buf[BUF_SIZE];
    if (fgets(buf, BUF_SIZE, stdin)) {
        fprintf(stderr, "read %s\n", buf);
    }
    if (feof(stdin)) { // Does not block
        fprintf(stderr, "stdin is EOF\n");
    }
}
If you hit ^D twice with this C program, feof will return true for stdin. I would have expected the Ruby program and the C program to behave similarly, but they don't. Is this expected? The documentation indeed says that eof? will read, but shouldn't the IO be at EOF after the second ^D?
Thank you.
Updated by nobu (Nobuyoshi Nakada) about 6 hours ago
It has been changed intentionally, AFAIR, to allow reading from the tty twice.
Updated by mame (Yusuke Endoh) about 2 hours ago
The short answer is: Ruby handles EOF in the Pascal style, not the C style.
In C, the FILE structure has an EOF flag. When a read(2) syscall returns 0, the EOF flag in the FILE structure is set. In the example provided, if you forcefully interrupt the input for fgets by pressing ^D twice, the EOF flag is set, and a subsequent call to feof returns true.
On the other hand, in Pascal and Ruby, the IO object itself does not have an EOF flag. Therefore, even if IO#gets is forcefully interrupted with a double ^D, the IO object does not remember this state, and a subsequent call to IO#eof? will attempt to read again, thus blocking.
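To make the difference concrete, here is a minimal sketch of what a C-style "sticky" EOF flag could look like if it were layered on top of a Ruby IO. StickyEofReader and eof_seen? are hypothetical names invented for this illustration and are not part of Ruby. The sketch also assumes the default line separator, so a line without a trailing newline is taken to mean that the input ended mid-line, as in the double ^D case.

```ruby
require "stringio"

# Hypothetical wrapper (not part of Ruby's API): remembers whether a read
# has ever run into the end of input, similar in spirit to the EOF flag
# kept in C's FILE structure.
class StickyEofReader
  def initialize(io)
    @io = io
    @eof_seen = false
  end

  # Like IO#gets, but records when the underlying read hit end of input:
  # either gets returned nil, or it returned a line with no trailing
  # newline (assumes the default "\n" separator).
  def gets
    line = @io.gets
    @eof_seen = true if line.nil? || !line.end_with?("\n")
    line
  end

  # Never reads, so it never blocks. It answers "was EOF encountered
  # in the past?", not "is the stream at EOF right now?".
  def eof_seen?
    @eof_seen
  end
end

reader = StickyEofReader.new(StringIO.new("foo"))
reader.gets       # returns "foo" (the input ended mid-line)
reader.eof_seen?  # true, and no further read was attempted
```

With such a wrapper around $stdin, the check after gets would not wait for more input, at the cost of carrying exactly the kind of hidden state that the stateless design avoids.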
This is a trade-off, and neither approach is definitively "correct," but Ruby's stateless approach has some advantages:
- Simple and robust: There is no hidden state in an IO, which is good in itself. It avoids common C bugs related to incorrect feof() checks.
- Flexible: It works consistently for streams that can grow over time, like sockets or files being appended to (similar to tail -f).
What @nobu (Nobuyoshi Nakada) mentioned is the second point. For example, you can continuously read from standard input or a growing file:
$ ruby -e 'p [1, $stdin.read]; p [2, $stdin.read]'
foo^D^D[1, "foo"]
bar^D^D[2, "bar"]
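The same behavior can be observed without a terminal by using a file that grows between reads. The following is a small sketch, assuming a regular file on a POSIX-like system; the helper name eof_before_and_after_append is made up for this example.

```ruby
require "tempfile"

# Hypothetical helper for illustration: reports what IO#eof? returns
# before and after another handle appends to the same file, plus the
# data that the second read picks up.
def eof_before_and_after_append
  Tempfile.create("eof_demo") do |writer|
    writer.write("foo\n")
    writer.flush

    File.open(writer.path, "r") do |reader|
      reader.read            # consume everything currently in the file
      before = reader.eof?   # a fresh read attempt finds no data

      writer.write("bar\n")  # the file grows, as with tail -f
      writer.flush

      after = reader.eof?    # eof? reads again and now finds new data
      rest = reader.read     # the appended data is readable as usual
      [before, after, rest]
    end
  end
end

p eof_before_and_after_append
```

Because the reader keeps no EOF flag, the second eof? call reflects the current state of the file rather than the fact that an earlier read reached the end.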
FYI, a more detailed answer is written in the Japanese book "API design case study" by @akr (Akira Tanaka) who designed Ruby's IO. You may want to read it :-)
https://gihyo.jp/book/2016/978-4-7741-7802-8
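1.02 The feof function and the IO#eof? method: was EOF encountered in the past, or is the stream at EOF right now?
- End of file in C and Pascal
- An end-of-file notion that is easy for users to understand
- Summary
1.04 Removing the EOF flag: behavior that changes depending on mode is not good
- The EOF flag in stdio
- The EOF flag in Ruby
- An attempt to reimplement the EOF flag
- Summary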