Bug #21530

closed

Is IO#eof? supposed to always block and read?

Added by tenderlovemaking (Aaron Patterson) 21 days ago. Updated 20 days ago.

Status: Rejected
Assignee: -
Target version: -
[ruby-core:122910]

Description

I'm not sure whether or not this is expected behavior, but it seems like eof? blocks when called on $stdin.

For example:

if (str = $stdin.gets)
  $stderr.puts "read #{str}"
end

if $stdin.eof? # this call waits for input
  $stderr.puts "stdin is eof"
end

This seems odd to me. If you type a string without a trailing newline and then hit ^D twice, $stdin should be at EOF, yet eof? blocks and waits for input. If you hit ^D a third time, $stdin will be at EOF, but if you type any other character instead, it will not be.
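
The blocking can also be reproduced without a terminal. The following is only a sketch, assuming that an empty pipe whose write end is still open is a fair stand-in for a tty waiting for input; the method name eof_blocks_on_open_pipe is made up for illustration.

```ruby
# Returns [blocked_while_open, eof_after_close] for eof? on an empty pipe.
def eof_blocks_on_open_pipe
  reader, writer = IO.pipe
  checker = Thread.new { reader.eof? }  # eof? has to read to find out, so it waits

  blocked = checker.join(0.2).nil?      # nil means the thread has not finished yet
  writer.close                          # only now can the pending read report EOF
  [blocked, checker.value]
ensure
  reader&.close
end

p eof_blocks_on_open_pipe  # => [true, true]
```

As long as the write end stays open, eof? cannot know whether more data will arrive, so it waits, just as it does on $stdin.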

Compare this C program:

#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 4096

int main(int argc, char *argv[]) {
    char buf[BUF_SIZE];
    if (fgets(buf, BUF_SIZE, stdin)) {
        fprintf(stderr, "read %s\n", buf);
    }

    if (feof(stdin)) { // Does not block
        fprintf(stderr, "stdin is EOF\n");
    }
}

If you hit ^D twice with this C program, feof will return true for stdin. I would have expected the Ruby program and the C program to behave similarly, but they don't. Is this expected? The documentation indeed says that eof? will read, but shouldn't the IO be at EOF after the second ^D?

Thank you.

Updated by nobu (Nobuyoshi Nakada) 21 days ago

It was changed intentionally, AFAIR, to allow reading from the tty twice.

Updated by mame (Yusuke Endoh) 21 days ago

The short answer is: Ruby handles EOF in the Pascal style, not the C style.

In C, the FILE structure has an EOF flag. When a read(2) syscall returns 0, the EOF flag in the FILE structure is set. In the example provided, if you forcefully interrupt the input for fgets by pressing ^D twice, the EOF flag is set, and a subsequent call to feof returns true.

On the other hand, in Pascal and Ruby, the IO object itself does not have an EOF flag. Therefore, even if IO#gets is forcefully interrupted with a double ^D, the IO object does not remember this state, and a subsequent call to IO#eof? will attempt to read again, thus blocking.
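
To make the contrast concrete, here is a rough sketch that emulates a C-style sticky flag on top of a Ruby IO. StickyEofReader and eof_seen? are hypothetical names invented for this illustration and are not part of Ruby; a StringIO stands in for a stream that receives more data later.

```ruby
require "stringio"

# Hypothetical wrapper (not part of Ruby) that emulates a stdio-style EOF flag.
class StickyEofReader
  def initialize(io)
    @io = io
    @eof_seen = false
  end

  # Reads one line. With the default separator, a nil result or a line without
  # a trailing newline means the underlying read ran into end of input.
  def gets
    line = @io.gets
    @eof_seen = true if line.nil? || !line.end_with?("\n")
    line
  end

  # Like feof: reports what a past read observed and never reads itself.
  def eof_seen?
    @eof_seen
  end
end

source = StringIO.new(String.new("foo"))  # input without a trailing newline
reader = StickyEofReader.new(source)
reader.gets                # => "foo"
source.string << "bar\n"   # more input arrives after EOF was observed
p reader.eof_seen?         # => true  (the flag remembers the past)
p source.eof?              # => false (Ruby checks the source itself)
```

The wrapper answers "did a read hit EOF in the past?", while the plain eof? call answers "is the stream at EOF right now?", which is the distinction described above.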

This is a trade-off, and neither approach is definitively "correct," but Ruby's stateless approach has some advantages:

  • Simple and robust: There is no hidden state in an IO, which is good in itself. It also avoids common C bugs caused by incorrect feof() checks.
  • Flexible: It works consistently for streams that can grow over time, like sockets or files being appended to (similar to tail -f).

What @nobu (Nobuyoshi Nakada) described is the second point. For example, you can continuously read from standard input or a growing file:

$ ruby -e 'p [1, $stdin.read]; p [2, $stdin.read]'
foo^D^D[1, "foo"]
bar^D^D[2, "bar"]
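
The growing-file case can be sketched in a non-interactive form as well. This is a minimal illustration, assuming a temporary file that is appended to through a second handle; the method name eof_after_append is hypothetical.

```ruby
require "tempfile"

# Returns [eof_before_append, eof_after_append, newly_read_data].
def eof_after_append
  Tempfile.create("eof-demo") do |writer|
    writer.write("foo")
    writer.flush

    File.open(writer.path, "r") do |reader|
      reader.read                  # consume "foo"; the reader is now at end of file
      at_eof_before = reader.eof?  # true: a fresh read attempt finds nothing

      writer.write("bar")          # the file grows, as with tail -f
      writer.flush

      at_eof_after = reader.eof?   # false: eof? reads again and finds "bar"
      [at_eof_before, at_eof_after, reader.read]
    end
  end
end

p eof_after_append  # => [true, false, "bar"]
```

Because the reader keeps no EOF flag, the same handle can keep reading after the file grows, with no need to reopen it or clear any state.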

FYI, a more detailed answer is given in the Japanese book "API design case study" by @akr (Akira Tanaka), who designed Ruby's IO. You may want to read it :-)

https://gihyo.jp/book/2016/978-4-7741-7802-8

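1.02 The feof function and the IO#eof? method: did we encounter EOF in the past, or are we at EOF right now?

  • End of file in C and Pascal
  • An end of file that is easy for users to understand
  • Summary

1.04 Removing the EOF flag: behavior that changes with the mode is not good

  • The EOF flag in stdio
  • The EOF flag in Ruby
  • An attempt to reimplement the EOF flag
  • Summary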

Updated by tenderlovemaking (Aaron Patterson) 20 days ago

  • Status changed from Open to Rejected

mame (Yusuke Endoh) wrote in #note-2:

The short answer is: Ruby handles EOF in the Pascal style, not the C style.

In C, the FILE structure has an EOF flag. When a read(2) syscall returns 0, the EOF flag in the FILE structure is set. In the example provided, if you forcefully interrupt the input for fgets by pressing ^D twice, the EOF flag is set, and a subsequent call to feof returns true.

On the other hand, in Pascal and Ruby, the IO object itself does not have an EOF flag. Therefore, even if IO#gets is forcefully interrupted with a double ^D, the IO object does not remember this state, and a subsequent call to IO#eof? will attempt to read again, thus blocking.

This is a trade-off, and neither approach is definitively "correct," but Ruby's stateless approach has some advantages:

  • Simple and robust: There is no hidden state in an IO, which is good in itself. It also avoids common C bugs caused by incorrect feof() checks.
  • Flexible: It works consistently for streams that can grow over time, like sockets or files being appended to (similar to tail -f).

What @nobu (Nobuyoshi Nakada) described is the second point. For example, you can continuously read from standard input or a growing file:

$ ruby -e 'p [1, $stdin.read]; p [2, $stdin.read]'
foo^D^D[1, "foo"]
bar^D^D[2, "bar"]

Excellent. It makes sense. Thank you for the explanation and background information.

FYI, a more detailed answer is given in the Japanese book "API design case study" by @akr (Akira Tanaka), who designed Ruby's IO. You may want to read it :-)

https://gihyo.jp/book/2016/978-4-7741-7802-8

Great! I bought a copy and I'll read it! Thank you!
