Bug #8940

closed

printing UTF-32 crashes ruby

Added by Hanmac (Hans Mackowiak) over 10 years ago. Updated about 10 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.1.0dev (2013-09-23) [x86_64-darwin12.5.0]
[ruby-core:57318]

Description

Using

p "äöü".encode("UTF-32")

causes a segmentation fault:

-- C level backtrace information -------------------------------------------
0 libruby.2.1.0.dylib 0x00000001023f6679 rb_vm_bugreport + 137
1 libruby.2.1.0.dylib 0x00000001022bab1b report_bug + 283
2 libruby.2.1.0.dylib 0x00000001022ba9f4 rb_bug + 180
3 libruby.2.1.0.dylib 0x000000010237cc80 sigsegv + 144
4 libsystem_c.dylib 0x00007fff91d7d90a _sigtramp + 26
5 ??? 0x0000000000000000 0x0 + 0
6 libruby.2.1.0.dylib 0x00000001022b0045 rb_enc_precise_mbclen + 21
7 libruby.2.1.0.dylib 0x0000000102391cc8 rb_str_inspect + 968
8 libruby.2.1.0.dylib 0x00000001023f1e74 vm_call0_body + 2116
9 libruby.2.1.0.dylib 0x00000001023f1264 rb_call0 + 404
10 libruby.2.1.0.dylib 0x00000001023e7f15 rb_funcall + 261
11 libruby.2.1.0.dylib 0x0000000102312777 rb_inspect + 23
12 libruby.2.1.0.dylib 0x00000001022e663b rb_p + 11
13 libruby.2.1.0.dylib 0x00000001022f5b29 rb_f_p_internal + 57
14 libruby.2.1.0.dylib 0x00000001022c0b56 rb_ensure + 118
15 libruby.2.1.0.dylib 0x00000001022e9c9f rb_f_p + 31
16 libruby.2.1.0.dylib 0x00000001023f4baf vm_call_cfunc + 1007
17 libruby.2.1.0.dylib 0x00000001023f4528 vm_call_method + 840
18 libruby.2.1.0.dylib 0x00000001023deca7 vm_exec_core + 11591
19 libruby.2.1.0.dylib 0x00000001023eb4cd vm_exec + 109
20 libruby.2.1.0.dylib 0x00000001023ec2d8 rb_iseq_eval_main + 392
21 libruby.2.1.0.dylib 0x00000001022bfd69 ruby_exec_internal + 121
22 libruby.2.1.0.dylib 0x00000001022bfcae ruby_run_node + 78
23 ruby 0x0000000102274eef main + 79


Related issues 1 (0 open, 1 closed)

Related to Ruby master - Bug #9415: Strings#codepoints doesn't respect BOM on UTF-{16,32} pseudo encodings (Closed, naruse (Yui NARUSE), 01/15/2014)

Updated by nobu (Nobuyoshi Nakada) over 10 years ago

It is probably related to the fact that UTF-32 is a pseudo encoding.
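Whether an encoding is one of these pseudo ("dummy") encodings can be checked with `Encoding#dummy?`. A small sketch, assuming a CRuby of this era or later; it only lists the encodings discussed in this report:

```ruby
# Encodings discussed in this report. The bare "UTF-16"/"UTF-32" names are
# pseudo (dummy) encodings; the BE/LE variants are concrete ones.
names = %w[UTF-16 UTF-32 UTF-16BE UTF-16LE UTF-32BE UTF-32LE]

# Map each name to whether Ruby flags it as a dummy encoding.
flags = names.map { |name| [name, Encoding.find(name).dummy?] }.to_h
flags.each { |name, dummy| puts format("%-8s dummy? %s", name, dummy) }
```

A dummy encoding names an encoding family but does not by itself tell Ruby how to split bytes into characters, which is where character-walking code such as `inspect` can go wrong.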

Updated by Hanmac (Hans Mackowiak) over 10 years ago

Hm, it may be...
Funny thing:

This sequence works:
"äöü".encode("UTF-32BE") #=> "\u00E4\u00F6\u00FC"
"äöü".encode("UTF-32") #=> "\uFEFF\u00E4\u00F6\u00FC"
"äöü".encode("UTF-32LE") #=> "\u00E4\u00F6\u00FC" # << IMO this looks wrong; or isn't there a difference between BE and LE?
This sequence does not:
"äöü".encode("UTF-32LE") #=> "\u00E4\u00F6\u00FC"
"äöü".encode("UTF-32") # crash

PS: It also happens for UTF-16.
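On the BE/LE question above: `p` and `inspect` print code points, so both variants show the same escapes, while the stored bytes differ. A minimal sketch, assuming a UTF-8 source file (the string is written with `\u` escapes so it does not depend on the source encoding):

```ruby
s  = "\u00E4\u00F6\u00FC" # "äöü"
be = s.encode("UTF-32BE")
le = s.encode("UTF-32LE")

# inspect shows code points, so both variants print identical escapes...
puts be.inspect == le.inspect # => true
# ...but each 4-byte code unit is stored in the opposite byte order.
p be.bytes # => [0, 0, 0, 228, 0, 0, 0, 246, 0, 0, 0, 252]
p le.bytes # => [228, 0, 0, 0, 246, 0, 0, 0, 252, 0, 0, 0]
```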


Updated by nobu (Nobuyoshi Nakada) over 10 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r43023.
Hans, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.


encdb.c, utf_16_32.h: Unicode with BOM

  • enc/encdb.c, enc/utf_16_32.h (ENC_DUMMY_UNICODE): Unicode with BOM
    must be based on big endian variants, so that actual encodings would
    work. [ruby-core:57318] [Bug #8940]
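A minimal sketch of the byte layout involved, assuming a current CRuby: transcoding to the pseudo encoding UTF-32 emits a big-endian BOM (`00 00 FE FF`) followed by exactly the bytes of the UTF-32BE encoding. This is consistent with basing the BOM-carrying pseudo encoding on its big-endian variant.

```ruby
s     = "\u00E4\u00F6\u00FC" # "äöü"
bom32 = s.encode("UTF-32")   # pseudo encoding: the transcoder prepends a BOM
be32  = s.encode("UTF-32BE")

p bom32.encoding                     # the pseudo (dummy) encoding itself
p bom32.bytes.first(4)               # => [0, 0, 254, 255], a big-endian BOM
p bom32.byteslice(4..-1).b == be32.b # => true: the rest is plain UTF-32BE
```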

Updated by nobu (Nobuyoshi Nakada) over 10 years ago

  • Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN to 1.9.3: REQUIRED, 2.0.0: REQUIRED

Updated by naruse (Yui NARUSE) over 10 years ago

  • Status changed from Closed to Assigned
  • Priority changed from 6 to Normal

r43033, r43034, and r43035 also look related.

Note that although the Unicode spec says an encoding without explicit endianness should be treated as big endian, in practice it is often little endian.
Therefore, don't guess the encoding if there is no BOM.


Updated by Hanmac (Hans Mackowiak) about 10 years ago

The bug is still present in 2.2 trunk with UTF-16 and #inspect:

s = "\xFF\xFE\"\x00i\x00d\x00\"\x00|\x00\"\x00s\x00y\x00s\x00t\x00e\x00m\x00_\x00c\x00o\x00d\x00e\x00\"\x00|\x00\"\x00a\x00s\x00s\x00e\x00m\x00b\x00l\x00y\x00_\x00c\x00o\x00d\x00e\x00\"\x00|\x00\"\x00d\x00e\x00s\x00c\x00r\x00i\x00p\x00t\x00i\x00o\x00n\x00\"\x00|\x00\"\x00c\x00r\x00e\x00a\x00t\x00e\x00d\x00_\x00a\x00t\x00\"\x00|\x00\"\x00u\x00p\x00d\x00a\x00t\x00e\x00d\x00_\x00a\x00t\x00\"\x00\r\x00\n"
s.force_encoding("UTF-16")
/usr/local/lib/ruby/2.2.0/irb/inspector.rb:122: [BUG] Segmentation fault at 0x00000000000000
ruby 2.2.0dev (2014-01-12 trunk 44563) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0020 p:---- s:0072 e:000071 CFUNC :inspect
c:0019 p:0010 s:0069 e:000068 BLOCK /usr/local/lib/ruby/2.2.0/irb/inspector.rb:122 [FINISH]
c:0018 p:---- s:0066 e:000065 CFUNC :call
c:0017 p:0011 s:0062 e:000061 METHOD /usr/local/lib/ruby/2.2.0/irb/inspector.rb:115
c:0016 p:0012 s:0058 e:000057 METHOD /usr/local/lib/ruby/2.2.0/irb/context.rb:386
c:0015 p:0015 s:0055 e:000052 METHOD /usr/local/lib/ruby/2.2.0/irb.rb:662
c:0014 p:0035 s:0050 e:000049 BLOCK /usr/local/lib/ruby/2.2.0/irb.rb:493
c:0013 p:0040 s:0042 e:000041 METHOD /usr/local/lib/ruby/2.2.0/irb.rb:624
c:0012 p:0009 s:0037 e:000036 BLOCK /usr/local/lib/ruby/2.2.0/irb.rb:489
c:0011 p:0118 s:0033 e:000032 BLOCK /usr/local/lib/ruby/2.2.0/irb/ruby-lex.rb:247 [FINISH]
c:0010 p:---- s:0030 e:000029 CFUNC :loop
c:0009 p:0007 s:0027 e:000026 BLOCK /usr/local/lib/ruby/2.2.0/irb/ruby-lex.rb:233 [FINISH]
c:0008 p:---- s:0025 e:000024 CFUNC :catch
c:0007 p:0015 s:0021 e:000020 METHOD /usr/local/lib/ruby/2.2.0/irb/ruby-lex.rb:232
c:0006 p:0030 s:0018 E:001858 METHOD /usr/local/lib/ruby/2.2.0/irb.rb:488
c:0005 p:0008 s:0015 e:000014 BLOCK /usr/local/lib/ruby/2.2.0/irb.rb:397 [FINISH]
c:0004 p:---- s:0013 e:000012 CFUNC :catch
c:0003 p:0143 s:0009 E:000c58 METHOD /usr/local/lib/ruby/2.2.0/irb.rb:396
c:0002 p:0021 s:0004 E:001608 EVAL /usr/local/bin/irb:15 [FINISH]
c:0001 p:0000 s:0002 E:001358 TOP [FINISH]
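As a workaround sketch for affected versions (assumption: the data is known to start with a BOM), the bytes can be decoded without ever being tagged with the pseudo encoding UTF-16: strip the BOM and force the concrete UTF-16LE encoding it announces. The sample is a shortened excerpt of the reported line (`"id"` plus the line ending), with a `\x00` appended so it is a complete UTF-16LE sequence. The reported string itself ends in a lone `\n` byte, so its length is odd, and a strict `encode` would likely reject that trailing byte.

```ruby
# Shortened excerpt of the reported line: a UTF-16LE BOM (FF FE) followed by
# "id"\r\n as UTF-16LE code units.
raw = "\xFF\xFE\"\x00i\x00d\x00\"\x00\r\x00\n\x00".b

bom = "\xFF\xFE".b
raise ArgumentError, "no UTF-16LE BOM found" unless raw.start_with?(bom)

# Tag the payload with the concrete encoding the BOM announces (never the
# pseudo encoding UTF-16), then transcode to UTF-8 for display.
payload = raw.byteslice(bom.bytesize..-1).force_encoding(Encoding::UTF_16LE)
text = payload.encode(Encoding::UTF_8)

p text # => "\"id\"\r\n"
```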



Updated by bideshstr (bidesh mondal) about 10 years ago

By Mistake


Updated by bideshstr (bidesh mondal) about 10 years ago

By Mistake


Updated by nobu (Nobuyoshi Nakada) about 10 years ago

  • Status changed from Assigned to Closed

Applied in changeset r44605.


string.c: use actual encodings

  • string.c (get_actual_encoding): get the actual encoding according to
    the BOM, if one exists.
  • string.c (rb_str_inspect): use the corresponding actual encoding
    instead of the pseudo encodings UTF-{16,32}. [ruby-core:59757] [Bug #8940]
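The idea behind `get_actual_encoding` can be sketched at the Ruby level. This is not the C code from r44605: `BOMS`, `actual_encoding`, and `safe_inspect` are hypothetical names, and returning `nil` (and falling back to raw bytes) when no BOM is present is an assumption that follows naruse's advice not to guess the byte order. Because the lookup is keyed on the declared pseudo encoding, the UTF-32LE BOM (`FF FE 00 00`) is never confused with the UTF-16LE one (`FF FE`).

```ruby
# Hypothetical sketch: map the pseudo encodings UTF-16/UTF-32 to a concrete
# variant by looking at the BOM.
BOMS = {
  Encoding::UTF_16 => [["\xFE\xFF".b, Encoding::UTF_16BE],
                       ["\xFF\xFE".b, Encoding::UTF_16LE]],
  Encoding::UTF_32 => [["\x00\x00\xFE\xFF".b, Encoding::UTF_32BE],
                       ["\xFF\xFE\x00\x00".b, Encoding::UTF_32LE]]
}.freeze

# Returns the concrete encoding for str, or nil when a pseudo-encoded string
# has no BOM (assumption: do not guess the byte order).
def actual_encoding(str)
  candidates = BOMS[str.encoding]
  return str.encoding unless candidates # not a pseudo encoding: keep it

  bytes = str.b
  candidates.each { |bom, enc| return enc if bytes.start_with?(bom) }
  nil
end

# Inspect through the actual encoding; fall back to raw bytes if unknown.
def safe_inspect(str)
  enc = actual_encoding(str)
  enc ? str.dup.force_encoding(enc).inspect : str.b.inspect
end

s = ("\xFF\xFE".b + "a\x00".b).force_encoding(Encoding::UTF_16)
p actual_encoding(s) # => #<Encoding:UTF-16LE>
puts safe_inspect(s)
```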

Updated by nobu (Nobuyoshi Nakada) about 10 years ago

  • Related to Bug #9415: Strings#codepoints doesn't respect BOM on UTF-{16,32} pseudo encodings added