Bug #4097

closed

Unexpected result of STDIN.read on Windows

Added by phasis68 (Heesob Park) over 13 years ago. Updated over 12 years ago.

Status: Third Party's Issue
Target version:
ruby -v: ruby 1.9.3dev (2010-11-28 trunk 29965) [i386-mswin32_90]
Backport:
[ruby-core:33460]

Description

=begin
On Ruby 1.9.x, when the input contains non-ASCII characters, STDIN.read(n) returns a string with some garbage attached.

C:\work>ruby -ve 'a=STDIN.read(10);p a;p a.length'
ruby 1.9.3dev (2010-11-28 trunk 29965) [i386-mswin32_90]
가나다라abcd
"\xB0\xA1\xB3\xAA\xB4\xD9\xB6\xF3ab\x00\x00\xB8t"
14

On the other hand, Ruby 1.8.6 works fine.

C:\work>ruby -ve 'a=STDIN.read(10);p a;p a.length'
ruby 1.8.6 (2010-02-04 patchlevel 398) [i386-mingw32]
가나다라abcd
"\260\241\263\252\264\331\266\363ab"
10
=end
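
For reference, the byte counts in the report can be checked offline. The following is a minimal sketch, assuming the console used code page 949 and that these four syllables have the same byte sequences in Ruby's EUC-KR converter (an assumption, not something stated in the report). The variable names are illustrative only.

```ruby
# Source file is assumed to be UTF-8.
# The first 10 bytes of the typed line, as a Korean console would deliver them
# (assumption: code page 949, which matches EUC-KR for these characters).
expected = "가나다라abcd".encode("EUC-KR").force_encoding("BINARY")[0, 10]

# The 14 bytes that STDIN.read(10) returned in the report, copied byte by byte.
reported = [0xB0, 0xA1, 0xB3, 0xAA, 0xB4, 0xD9, 0xB6, 0xF3,
            0x61, 0x62, 0x00, 0x00, 0xB8, 0x74].pack("C*")

prefix_ok = reported[0, 10] == expected   # do the first 10 bytes match?
garbage   = reported[10..-1]              # whatever follows the requested 10 bytes

puts expected.bytes.map { |b| "%02X" % b }.join(" ")
puts "first 10 bytes match: #{prefix_ok}"
puts "extra bytes: #{garbage.length}"
```

The first 10 bytes are what 1.8.6 returned; the remaining bytes are the unexpected tail.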

Actions #1

Updated by usa (Usaku NAKAMURA) over 13 years ago

=begin
Hello,

In message "[ruby-core:33460] [Ruby 1.9-Bug#4097][Open] Unexpected result of STDIN.read on Windows"
on Nov.29,2010 18:26:13, wrote:

On Ruby 1.9.x, in case of non-ASCII input, STDIN.read(n) returns some garbage attached string.

What version of Windows do you use?
I guess you use Korean version of 32bit XP, don't you?

Tarui-san tested many cases on Japanese version of 32bit XP,
and has found that this seems to be a bug of Windows itself...

Regards,

U.Nakamura

=end

Actions #2

Updated by luislavena (Luis Lavena) over 13 years ago

=begin
On Mon, Nov 29, 2010 at 8:44 AM, U.Nakamura wrote:

Hello,

In message "[ruby-core:33460] [Ruby 1.9-Bug#4097][Open] Unexpected result of STDIN.read on Windows"
   on Nov.29,2010 18:26:13, wrote:

On Ruby 1.9.x, in case of non-ASCII input, STDIN.read(n) returns some garbage attached string.

What version of Windows do you use?
I guess you use Korean version of 32bit XP, don't you?

Tarui-san tested many cases on Japanese version of 32bit XP,
and has found that this seems to be a bug of Windows itself...

Perhaps it is associated with the codepage used to input those characters?

I noticed that accented characters do not work for builtin cmd.exe
operations under chcp 437 or 850, for example, but they work fine under
1252.

Unicode characters seem to work too under chcp 65001, but not with Ruby.

--
Luis Lavena
AREA 17

Perfection in design is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.
Antoine de Saint-Exupéry

=end

Actions #3

Updated by phasis68 (Heesob Park) over 13 years ago

=begin
Hi,

2010/11/29 U.Nakamura :

Hello,

In message "[ruby-core:33460] [Ruby 1.9-Bug#4097][Open] Unexpected result of STDIN.read on Windows"
   on Nov.29,2010 18:26:13, wrote:

On Ruby 1.9.x, in case of non-ASCII input, STDIN.read(n) returns some garbage attached string.

What version of Windows do you use?
I guess you use Korean version of 32bit XP, don't you?

Yes, you are right.

Tarui-san tested many cases on Japanese version of 32bit XP,
and has found that this seems to be a bug of Windows itself...

I can see this bug on 32bit XP and 2003.
On Windows 7, this bug does not appear.

Regards,
Park Heesob

=end

Actions #4

Updated by tarui (Masaya Tarui) over 13 years ago

=begin
Hello,

Windows XP seems to have a bug in its read functions when the console input is multibyte.
I found another issue that comes from the same Windows bug. :-(

Does anybody have a good idea for a workaround?

ruby -ve 'a=STDIN.read(6);p [a,a.length];a=STDIN.read(2);p [a,a.length];'
ruby 1.9.3dev (2010-11-30 trunk 29978) [i386-mswin32_100]
あいうえおaiueo
["\x82\xA0\x82\xA2\x82\xA4", 6]
["iu", 2]

On Ruby 1.9.x, in case of non-ASCII input, STDIN.read(n) returns some garbage attached string.

C:\work>ruby -ve 'a=STDIN.read(10);p a;p a.length'
ruby 1.9.3dev (2010-11-28 trunk 29965) [i386-mswin32_90]
가나다라abcd
"\xB0\xA1\xB3\xAA\xB4\xD9\xB6\xF3ab\x00\x00\xB8t"
14

Regards,
Masaya TARUI

=end

Actions #5

Updated by phasis68 (Heesob Park) over 13 years ago

=begin
Hi,

I found that ReadFile on a console reads data per character, not per byte.

Here is a workaround patch.

--- win32.c 2010-11-30 12:02:33.000000000 +0900
+++ win32.c.new 2010-11-30 12:01:46.000000000 +0900
@@ -5091,6 +5091,34 @@
         pol = &ol;
     }
 
+    if (is_console(_osfhnd(fd)) && len!=16384) {
+        int len2=0;
+        while(len2<len) {
+            if (!ReadFile((HANDLE)_osfhnd(fd), buf, 1, &read, pol)) {
+                err = GetLastError();
+                if (err != ERROR_IO_PENDING) {
+                    if (pol) CloseHandle(ol.hEvent);
+                    if (err == ERROR_ACCESS_DENIED)
+                        errno = EBADF;
+                    else if (err == ERROR_BROKEN_PIPE || err == ERROR_HANDLE_EOF) {
+                        MTHREAD_ONLY(LeaveCriticalSection(&_pioinfo(fd)->lock));
+                        return 0;
+                    }
+                    else
+                        errno = map_errno(err);
+
+                    MTHREAD_ONLY(LeaveCriticalSection(&_pioinfo(fd)->lock));
+                    return -1;
+                }
+            }
+            len2 += read;
+            buf = (char *)buf + read;
+        }
+        ret += len;
+        if (size > 0)
+            goto retry;
+    } else {
     if (!ReadFile((HANDLE)_osfhnd(fd), buf, len, &read, pol)) {
         err = GetLastError();
         if (err != ERROR_IO_PENDING) {
@@ -5154,6 +5182,7 @@
         if (size > 0)
             goto retry;
     }
+    }
 
     MTHREAD_ONLY(LeaveCriticalSection(&_pioinfo(fd)->lock));

Regards,
Park Heesob

=end
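
The per-character observation above can be compared with the あいうえおaiueo transcript from comment #4 using a toy model. This is only a sketch: `CharCountingConsole` is a hypothetical class, not a Windows or Ruby API, and it assumes a console that consumes n characters per read(n) while handing back only n bytes. The thread does not establish that this is what Windows does internally.

```ruby
# Source file is assumed to be UTF-8.
# Hypothetical model of the suspected behavior: read(n) removes n characters
# from the pending input but returns only the first n bytes of them.
class CharCountingConsole
  def initialize(text, encoding)
    @chars = text.encode(encoding).chars.to_a   # pending input, one entry per character
  end

  def read(n)
    taken = @chars.shift(n)                      # n characters leave the queue
    bytes = taken.join.force_encoding("BINARY")  # all bytes of those characters
    bytes[0, n]                                  # only n bytes reach the caller
  end
end

# Assumption: the Japanese console delivers Windows-31J (code page 932) bytes.
console = CharCountingConsole.new("あいうえおaiueo", "Windows-31J")
first  = console.read(6)
second = console.read(2)
p [first, first.length]
p [second, second.length]
```

Under this model the second read starts at "iu", because え, お and a were already consumed by the first read even though their bytes were never returned.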

Actions #6

Updated by usa (Usaku NAKAMURA) over 13 years ago

  • Status changed from Open to Assigned
  • Assignee set to tarui (Masaya Tarui)

Updated by nahi (Hiroshi Nakamura) almost 13 years ago

  • Target version changed from 2.0.0 to 1.9.3

Updated by kosaki (Motohiro KOSAKI) almost 13 years ago

Tarui-san, ping?

Updated by tarui (Masaya Tarui) over 12 years ago

  • Status changed from Assigned to Third Party's Issue

Sorry for the delayed response.

As of r29980 and r30280, STDIN.read(n) on multibyte console input may return a String of n+1 bytes.
The MS runtime's read never splits a multibyte character.

It is also difficult to push the last byte back with STDIN.ungetc, because we are wrapping the C-level read function.

I think that

  1. it is a Windows bug,
  2. we do not have an API-based workaround, and
  3. a workaround can be applied in the application.

So I am changing the status to Third Party's Issue.
However, patches are always welcome.

Thanks,
Masaya TARUI
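
One possible shape for an application-level workaround is sketched below. It assumes only the n+1 behavior described above: the application keeps any surplus byte in its own buffer instead of relying on ungetc. `ByteExactReader` and `OverReadingStub` are hypothetical names introduced for illustration. The sketch has not been verified on an affected Windows version and does not address input that the console has already dropped.

```ruby
# Source file is assumed to be UTF-8.
# Hypothetical wrapper: returns exactly n bytes per call and keeps any
# surplus bytes returned by the underlying IO for the next call.
class ByteExactReader
  def initialize(io)
    @io = io
    @pending = String.new                  # binary buffer for surplus bytes
  end

  def read(n)
    while @pending.bytesize < n
      chunk = @io.read(n - @pending.bytesize)
      break if chunk.nil? || chunk.empty?  # EOF: return what we have
      @pending << chunk.dup.force_encoding("BINARY")
    end
    @pending.slice!(0, n)                  # at most n bytes; "" at EOF
  end
end

# Stand-in for the affected STDIN: always hands back one byte more than asked.
class OverReadingStub
  def initialize(bytes)
    @data = bytes.dup.force_encoding("BINARY")
  end

  def read(n)
    return nil if @data.empty?
    @data.slice!(0, n + 1)
  end
end

stub   = OverReadingStub.new("あいうえおaiueo".encode("Windows-31J"))
reader = ByteExactReader.new(stub)
chunks = [reader.read(5), reader.read(1), reader.read(4), reader.read(5)]
chunks.each { |c| p [c, c.bytesize] }
```

Unlike IO#read, this sketch returns an empty string rather than nil at end of input; an application would pick whichever convention suits it.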
