Bug #21530: Is IO#eof? supposed to always block and read? - Ruby - Ruby Issue Tracking System

Bug #21530

closed

Is IO#eof? supposed to always block and read?

Added by tenderlovemaking (Aaron Patterson) 2 months ago. Updated 2 months ago.

Status: Rejected

Backport: 3.2: UNKNOWN, 3.3: UNKNOWN, 3.4: UNKNOWN

[ruby-core:122910]

Description

I'm not sure whether or not this is expected behavior, but it seems like eof? blocks when called on $stdin.

For example:

if (str = $stdin.gets)
  $stderr.puts "read #{str}"
end

if $stdin.eof? # this call waits for input
  $stderr.puts "stdin is eof"
end

This behavior seems odd to me. If you type a string without a trailing newline and then press ^D twice, I would expect $stdin to be at EOF, but eof? blocks and waits for more input. If you press ^D a third time, $stdin is at EOF, but if you type any other character instead, it is not.
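
If the goal is only to avoid blocking, one option is to ask IO.select whether the stream is readable before calling eof?. The sketch below is a hypothetical helper, `eof_status`, which is not part of Ruby's API. It assumes a Unix-like system where IO.select works on pipes and terminals. IO.select also treats data already held in Ruby's internal read buffer as readable, so calling it after an earlier eof? is safe.

```ruby
# Hypothetical helper, not part of Ruby's API. Reports the state of +io+
# without blocking:
#   :pending - nothing can be read yet, so io.eof? would block here
#   :data    - at least one byte is available
#   :eof     - the stream is at end of file
def eof_status(io, timeout = 0)
  # IO.select returns nil if +io+ does not become readable within +timeout+
  # seconds. It also treats bytes already in Ruby's read buffer as readable.
  return :pending unless IO.select([io], nil, nil, timeout)

  # +io+ is readable, so eof? returns promptly: read(2) either returns 0
  # (end of file) or returns bytes that eof? keeps in the buffer.
  io.eof? ? :eof : :data
end

r, w = IO.pipe
w.sync = true
p eof_status(r)   # :pending (writer still open, no data yet)
w.write("x")
p eof_status(r)   # :data
p r.read(1)       # "x" (eof? did not consume the byte)
w.close
p eof_status(r)   # :eof
r.close
```

This does not reproduce the C behavior on a terminal. After foo^D^D, both ^D presses have already been consumed by gets, so there is typically nothing pending and the helper reports :pending rather than :eof.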

Compare this C program:

#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 4096

int main(int argc, char *argv[]) {
    char buf[BUF_SIZE];
    if (fgets(buf, BUF_SIZE, stdin)) {
        fprintf(stderr, "read %s\n", buf);
    }

    if (feof(stdin)) { // Does not block
        fprintf(stderr, "stdin is EOF\n");
    }
}

If you press ^D twice with this C program, feof returns true for stdin. I would have expected the Ruby program and the C program to behave similarly, but they don't. Is this expected? The documentation does say that eof? will read, but shouldn't the IO be at EOF after the second ^D?

Thank you.

#1 [ruby-core:122912]

Updated by nobu (Nobuyoshi Nakada) 2 months ago

It was changed intentionally, AFAIR, to allow reading from the tty twice.

#2 [ruby-core:122914]

Updated by mame (Yusuke Endoh) 2 months ago

The short answer is: Ruby handles EOF in the Pascal style, not the C style.

In C, the FILE structure has an EOF flag. When a read(2) syscall returns 0, the EOF flag in the FILE structure is set. In the example provided, if you forcefully interrupt the input for fgets by pressing ^D twice, the EOF flag is set, and a subsequent call to feof returns true.
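
The C behavior described here can be imitated in Ruby by remembering the outcome of past reads. The class below is a hypothetical wrapper, `StickyEofReader`, which is neither part of Ruby nor how Ruby's IO works. Its `feof?` answers only from a stored flag, so, like C's feof(), it never blocks and reports whether a past read hit end of file rather than whether the stream is at end of file now. It assumes the default line separator, so a line returned without a trailing newline means gets stopped at end of file (the foo^D^D case from the report).

```ruby
# Hypothetical wrapper (not part of Ruby) that imitates C stdio's EOF flag.
class StickyEofReader
  def initialize(io)
    @io = io
    @eof_seen = false   # plays the role of the EOF indicator in a C FILE
  end

  # Delegates to IO#gets and remembers whether the read reached EOF.
  # With the default separator, gets only returns a line without "\n"
  # when it hit end of file, like fgets after foo^D^D.
  def gets
    line = @io.gets
    @eof_seen = true if line.nil? || !line.end_with?("\n")
    line
  end

  # Like feof(): never touches the underlying IO, so it never blocks.
  # It reports a past event, not the current state of the stream.
  def feof?
    @eof_seen
  end

  # Like clearerr(): forget the remembered EOF.
  def clearerr
    @eof_seen = false
  end
end

r, w = IO.pipe
w.write("foo")      # a line with no trailing newline, then EOF on close
w.close
reader = StickyEofReader.new(r)
p reader.gets       # "foo"
p reader.feof?      # true, answered from the flag without reading
r.close
```

The flag is exactly the hidden state the Ruby design avoids: once set, it stays set until `clearerr`, even if the terminal or file later has more data.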

On the other hand, in Pascal and Ruby, the IO object itself does not have an EOF flag. Therefore, even if IO#gets is forcefully interrupted with a double ^D, the IO object does not remember this state, and a subsequent call to IO#eof? will attempt to read again, thus blocking.

This is a trade-off, and neither approach is definitively "correct", but Ruby's stateless approach has some advantages:

Simple and robust: There is no hidden state in an IO, which is good in itself. It avoids common C bugs related to incorrect feof() checks.
Flexible: It works consistently for streams that can grow over time, like sockets or files being appended to (similar to tail -f).
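
The "incorrect feof() checks" are typically loops like `while (!feof(fp)) { fgets(...); ... }`, which run one iteration too many because C sets the flag only after a read has already failed. Since IO#eof? looks ahead instead, the equivalent Ruby loop stops at the right time. A minimal sketch, assuming a Unix-like system and using a pipe as the input; `read_lines` is a hypothetical name:

```ruby
# Reads every line from +io+ by testing eof? before each gets.
# In C, the equivalent while (!feof(fp)) loop runs once too often, because
# the EOF flag is set only after a read fails. IO#eof? instead looks ahead
# (blocking if necessary), so gets below never returns nil here.
def read_lines(io)
  lines = []
  until io.eof?
    lines << io.gets
  end
  lines
end

r, w = IO.pipe
w.write("one\ntwo\n")
w.close
p read_lines(r)   # ["one\n", "two\n"]
r.close
```

Everyday Ruby code would usually write `io.each_line` or `while (line = io.gets)`; the explicit eof? check is only there to mirror the C pattern.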

What @nobu (Nobuyoshi Nakada) described is the second advantage, flexibility. For example, you can keep reading from standard input or from a growing file:

$ ruby -e 'p [1, $stdin.read]; p [2, $stdin.read]'
foo^D^D[1, "foo"]
bar^D^D[2, "bar"]
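
The same holds for a regular file that grows after the reader has reached its end. A minimal sketch, assuming a Unix-like system and using Tempfile from the standard library (the method name, file name prefix, and contents are arbitrary): the first read returns everything written so far, eof? then reports true, and a later read still picks up the appended data because no EOF state was recorded.

```ruby
require "tempfile"

# Hypothetical demo: returns what a reader sees when the file grows
# between two reads.
def read_growing_file
  Tempfile.create("growing") do |file|
    file.write("foo")
    file.flush

    File.open(file.path, "r") do |reader|
      first = reader.read       # reads to the current end: "foo"
      at_end = reader.eof?      # true: nothing more to read right now

      file.write("bar")         # the file grows
      file.flush

      # No EOF flag was remembered, so reading again returns the new data.
      second = reader.read      # "bar"
      [first, at_end, second]
    end
  end
end

p read_growing_file   # ["foo", true, "bar"]
```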

FYI, a more detailed answer is written in the Japanese book "API design case study" by @akr (Akira Tanaka), who designed Ruby's IO. You may want to read it :-)

https://gihyo.jp/book/2016/978-4-7741-7802-8

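1.02 The feof function and the IO#eof? method: have we encountered EOF in the past, or are we at EOF right now?

End of file in C and Pascal

End of file that is intuitive to users

Summary

1.04 Removing the EOF flag: behavior that changes depending on the mode is bad

The EOF flag in stdio

The EOF flag in Ruby

An attempt to reimplement the EOF flag

Summary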

#3 [ruby-core:122916]

Updated by tenderlovemaking (Aaron Patterson) 2 months ago

Status changed from Open to Rejected

mame (Yusuke Endoh) wrote in #note-2:

> The short answer is: Ruby handles EOF in the Pascal style, not the C style.
> [...]

Excellent. It makes sense. Thank you for the explanation and background information.

> FYI, a more detailed answer is written in the Japanese book "API design case study" by @akr (Akira Tanaka), who designed Ruby's IO. You may want to read it :-)

Great! I bought a copy and I'll read it! Thank you!
