Bug #7646

String#each_line raises "invalid byte sequence"

Added by yoshidam (Yoshida Masato) over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Target version:
ruby -v: ruby 2.0.0dev (2013-01-02 trunk 38676) [i686-linux]
Backport:
[ruby-dev:46827]

Description

=begin
When a separator is passed to String#each_line, an "invalid byte sequence" error is raised if the string contains non-ASCII characters.

$ ruby -ve '"\n\u0100".each_line("\n") {|l| p l }'
ruby 2.0.0dev (2013-01-02 trunk 38676) [i686-linux]
"\n"
-e:1:in `each_line': invalid byte sequence in UTF-8 (ArgumentError)
        from -e:1:in `<main>'
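
For comparison, here is a minimal sketch of the result one would expect from the same call on a build that does not have this bug. The variable names are only illustrative.

```ruby
# Expected behavior on a Ruby build without this bug: the input splits
# into the separator line and the remaining non-ASCII character.
str = "\n\u0100"
lines = str.each_line("\n").to_a

p lines.size                         # number of lines yielded
p lines.all?(&:valid_encoding?)      # every yielded line should be valid UTF-8
```

On an affected build, the `each_line` call itself raises ArgumentError before the second line is produced.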

This appears to be a bug introduced by a change around r38616.

 
--- string.c.org        2012-12-27 21:57:07.000000000 +0900
+++ string.c    2013-01-02 23:36:47.000000000 +0900
@@ -6199,14 +6199,14 @@
        if (c == newline &&
            (rslen <= 1 ||
             (pend - p >= rslen && memcmp(RSTRING_PTR(rs), p, rslen) == 0))) {
-           p += (rslen ? rslen : n);
-           line = rb_str_subseq(str, s - ptr, p - s);
+           const char *pp = p + (rslen ? rslen : n);
+           line = rb_str_subseq(str, s - ptr, pp - s);
            if (wantarray)
                rb_ary_push(ary, line);
            else
                rb_yield(line);
            str_mod_check(str, ptr, len);
-           s = p;
+           s = pp;
        }
        p += n;
     }
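
The effect of the patch can be illustrated with a rough Ruby model of the scanning loop. This is only a sketch and not the actual string.c code: `scan_lines` and its `fixed:` flag are hypothetical names, and it assumes a non-empty separator (no paragraph mode). In the old code the scan position is advanced past the separator and then advanced again by `n`, so it can land in the middle of a multibyte character. The patched code keeps the end of the line in a separate variable (`pp`).

```ruby
# Hypothetical model of the loop touched by the patch (not the real C code).
# Offsets are byte offsets, mirroring the pointers s, p and pend.
def scan_lines(str, rs, fixed:)
  lines = []
  s = 0                          # start of the current line ("s")
  pos = 0                        # scan position ("p")
  pend = str.bytesize
  rslen = rs.bytesize
  while pos < pend
    # Decode the character at pos; fail if pos is inside a multibyte sequence.
    ch = str.byteslice(pos, pend - pos).each_char.first
    raise ArgumentError, "invalid byte sequence at byte #{pos}" unless ch.valid_encoding?
    n = ch.bytesize
    if str.byteslice(pos, rslen) == rs
      line_end = pos + rslen     # "pp" in the patch
      lines << str.byteslice(s, line_end - s)
      s = line_end
      pos = line_end unless fixed  # old code moved the scan position itself here
    end
    pos += n                     # and then advanced it again by n
  end
  lines << str.byteslice(s, pend - s) if s < pend
  lines
end

p scan_lines("\n\u0100", "\n", fixed: true).size
begin
  scan_lines("\n\u0100", "\n", fixed: false)
rescue ArgumentError => e
  puts e.message
end
```

With `fixed: false` the model fails at byte offset 2, the second byte of U+0100, which matches the error in the report above.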

=end
