Bug #7646

closed

String#each_line raises invalid byte sequence

Added by yoshidam (Yoshida Masato) over 9 years ago. Updated over 9 years ago.

Status: Closed
Priority: Normal
Target version:
ruby -v: ruby 2.0.0dev (2013-01-02 trunk 38676) [i686-linux]
Backport:
[ruby-dev:46827]

Description

=begin
When a separator is passed to String#each_line, an "invalid byte sequence" error is raised for non-ASCII characters.

$ ruby -ve '"\n\u0100".each_line("\n") {|l| p l }'
ruby 2.0.0dev (2013-01-02 trunk 38676) [i686-linux]
"\n"
-e:1:in `each_line': invalid byte sequence in UTF-8 (ArgumentError)
        from -e:1:in `<main>'
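
The one-liner above can be turned into a small regression check. This is only a sketch: the helper name `lines_with_separator` is hypothetical, and the expected result (`"\n"` followed by the single character U+0100) is what String#each_line should return once the bug is fixed.

```ruby
# Regression check for the report: split on an explicit "\n" separator
# when a non-ASCII character follows the newline.
# `lines_with_separator` is a hypothetical helper, not part of Ruby.
def lines_with_separator(str, sep)
  # Without a block, each_line returns an Enumerator; to_a collects the lines.
  str.each_line(sep).to_a
end

lines = lines_with_separator("\n\u0100", "\n")
p lines
```

On an affected build the call raises ArgumentError instead of returning two lines.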

This appears to be a bug introduced by the changes around r38616.

 
--- string.c.org 2012-12-27 21:57:07.000000000 +0900
+++ string.c 2013-01-02 23:36:47.000000000 +0900
@@ -6199,14 +6199,14 @@
         if (c == newline &&
             (rslen <= 1 ||
              (pend - p >= rslen && memcmp(RSTRING_PTR(rs), p, rslen) == 0))) {
-            p += (rslen ? rslen : n);
-            line = rb_str_subseq(str, s - ptr, p - s);
+            const char *pp = p + (rslen ? rslen : n);
+            line = rb_str_subseq(str, s - ptr, pp - s);
             if (wantarray)
                 rb_ary_push(ary, line);
             else
                 rb_yield(line);
             str_mod_check(str, ptr, len);
-            s = p;
+            s = pp;
         }
         p += n;
     }

=end
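
The effect of the patch can be illustrated with a simplified byte-offset model in Ruby. This is a sketch under stated assumptions, not the code in string.c: it assumes UTF-8 input and a single-byte "\n" separator (the `rslen == 1` case), and the names `char_len_at` and `split_lines` are hypothetical. In the old code the scan position is advanced past the separator and then advanced again by `n` at the end of the loop body, so it can land on a continuation byte. The patched code computes the end of the line in a temporary (`pp` in the diff) and advances the scan position only once.

```ruby
# Simplified model of the loop touched by the patch (hypothetical helpers).

# Byte length of the UTF-8 character starting at offset i, or nil when i
# points at a continuation byte (the "invalid byte sequence" case).
# Simplified: lead bytes are not validated further.
def char_len_at(bytes, i)
  b = bytes[i]
  return 1 if b < 0x80
  return nil if b < 0xC0 # continuation byte, not a character start
  return 2 if b < 0xE0
  return 3 if b < 0xF0
  4
end

def split_lines(str, buggy:)
  bytes = str.bytes
  lines = []
  start = pos = 0
  while pos < bytes.size
    n = char_len_at(bytes, pos)
    raise ArgumentError, "invalid byte sequence at offset #{pos}" if n.nil?
    if bytes[pos] == 0x0A
      if buggy
        pos += 1             # old code: moves the scan position itself
        line_end = pos
      else
        line_end = pos + 1   # patched code: temporary, like pp in the diff
      end
      lines << str.byteslice(start, line_end - start)
      start = line_end
    end
    pos += n                 # executed in both versions
  end
  lines << str.byteslice(start, bytes.size - start) if start < bytes.size
  lines
end
```

With the input from the report, `split_lines("\n\u0100", buggy: false)` returns the two lines, while `buggy: true` raises ArgumentError because the scan position ends up on the second byte of U+0100.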
