Bug #21530

open

Is IO#eof? supposed to always block and read?

Added by tenderlovemaking (Aaron Patterson) about 12 hours ago. Updated about 2 hours ago.

Status:
Open
Assignee:
-
Target version:
-
[ruby-core:122910]

Description

I'm not sure whether or not this is expected behavior, but it seems like eof? blocks when called on $stdin.

For example:

if (str = $stdin.gets)
  $stderr.puts "read #{str}"
end

if $stdin.eof? # this call waits for input
  $stderr.puts "stdin is eof"
end

I think this is kind of odd behavior because if you input a string but do not input a newline, then hit ^D twice, $stdin should be at EOF, but eof? will block and wait for input. If you hit ^D a third time, $stdin will be EOF, but if you input a different character it will not be EOF.

Compare this C program:

#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 4096

int main(int argc, char *argv[]) {
    char buf[BUF_SIZE];
    if (fgets(buf, BUF_SIZE, stdin)) {
        fprintf(stderr, "read %s\n", buf);
    }

    if (feof(stdin)) { // Does not block
        fprintf(stderr, "stdin is EOF\n");
    }
}

If you hit ^D twice with this C program, feof will return true for stdin. I would have expected the Ruby program and the C program to behave similarly, but they don't. Is this expected? The documentation indeed says that eof? will read, but shouldn't the IO be at EOF after the second ^D?

Thank you.

Updated by nobu (Nobuyoshi Nakada) about 6 hours ago

AFAIR, this was changed intentionally, to allow reading from the tty twice.

Updated by mame (Yusuke Endoh) about 2 hours ago

The short answer is: Ruby handles EOF in the Pascal style, not the C style.

In C, the FILE structure has an EOF flag. When a read(2) syscall returns 0, the EOF flag in the FILE structure is set. In the example provided, if you forcefully interrupt the input for fgets by pressing ^D twice, the EOF flag is set, and a subsequent call to feof returns true.

On the other hand, in Pascal and Ruby, the IO object itself does not have an EOF flag. Therefore, even if IO#gets is forcefully interrupted with a double ^D, the IO object does not remember this state, and a subsequent call to IO#eof? will attempt to read again, thus blocking.
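The blocking can be reproduced without a terminal. The following is only a minimal sketch, assuming a pipe behaves like the tty for this purpose; the 0.2 second delay and the variable names are illustrative and not part of the original report. The reader has nothing buffered, so IO#eof? has to wait until the other end writes or closes:

```ruby
# Sketch: IO#eof? has to attempt a read to answer, so on an empty pipe
# it blocks until the other end writes something or closes.
r, w = IO.pipe

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Hypothetical slow writer: closes its end after a short delay.
closer = Thread.new do
  sleep 0.2
  w.close
end

at_eof = r.eof? # blocks here until the writer closes its end
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

closer.join
r.close

$stderr.puts "eof? returned #{at_eof} after waiting #{elapsed.round(2)}s"
```

If the thread wrote data instead of closing, eof? would return false as soon as the data arrived.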

This is a trade-off, and neither approach is definitively "correct," but Ruby's stateless approach has some advantages:

  • Simple and robust: There is no hidden state in an IO, which is good in itself. It avoids common C bugs related to incorrect feof() checks.
  • Flexible: It works consistently for streams that can grow over time, like sockets or files being appended to (similar to tail -f).

What @nobu (Nobuyoshi Nakada) described is the second point. For example, you can continue reading from standard input, or from a growing file, after hitting EOF:

$ ruby -e 'p [1, $stdin.read]; p [2, $stdin.read]'
foo^D^D[1, "foo"]
bar^D^D[2, "bar"]
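The growing-file case can be sketched in the same way. This is an illustration rather than anything from the original discussion: the temporary file and the variable names are assumptions. A reader that has already reached the end of the file reports eof? as true, and then false again once another handle appends data, because no EOF state is remembered:

```ruby
require "tempfile"

# Sketch: a reader at end of file sees new data once the file grows,
# because the IO object keeps no sticky EOF flag.
result = Tempfile.create("growing") do |writer|
  writer.sync = true # push each write straight to the file
  writer.write("foo")

  File.open(writer.path) do |reader|
    first      = reader.read  # consume everything currently in the file
    eof_before = reader.eof?  # nothing left to read right now
    writer.write("bar")       # the file grows, as with tail -f
    eof_after  = reader.eof?  # eof? reads again and finds the new data
    second     = reader.read
    [first, eof_before, eof_after, second]
  end
end

p result
```

With a sticky EOF flag in the C style, the second read would need an explicit reset (such as clearerr) before it could see the appended data.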

FYI, a more detailed answer is written in the Japanese book "API design case study" by @akr (Akira Tanaka) who designed Ruby's IO. You may want to read it :-)

https://gihyo.jp/book/2016/978-4-7741-7802-8

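1.02 The feof function and the IO#eof? method: did we encounter EOF in the past, or are we at EOF right now?

  • End of file in C and Pascal
  • An end of file that is easy for users to understand
  • Summary

1.04 Removing the EOF flag: behavior that changes depending on a mode is not good

  • The EOF flag in stdio
  • The EOF flag in Ruby
  • An attempt to reimplement the EOF flag
  • Summary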