Bug #13292
Closed: Invalid encodings in UTF-32
Description
Ruby is very strict about valid UTF-8 encodings, which is great.
Strings that encode surrogate codepoints (U+D800..U+DFFF) or codepoints above U+10FFFF are not valid.
However, UTF-32 can represent such values, and Ruby treats the resulting strings as valid:
Example 1 (too large value)
a = [0, 0, 17, 0].pack("C*").force_encoding("UTF-32LE") # => "\u{110000}"
a.valid_encoding? # => true
Example 2 (surrogate)
b = [0, 216, 0, 0].pack("C*").force_encoding("UTF-32LE") # => "\uD800"
b.valid_encoding? # => true
The behaviour should be changed so that String#valid_encoding? reports false for such strings.
For reference: http://unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf (page 71)
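As a rough illustration of the rule requested above, the following Ruby sketch checks UTF-32LE bytes by hand. Each 4-byte little-endian unit must be at most U+10FFFF and must lie outside the surrogate range U+D800..U+DFFF. The method name `utf32le_valid_bytes?` is hypothetical and is not part of Ruby's API. It only models the result that `String#valid_encoding?` is expected to give.

```ruby
# Hypothetical helper (not part of Ruby's API): checks UTF-32LE bytes
# against the Unicode rules (no surrogates, nothing above U+10FFFF).
def utf32le_valid_bytes?(str)
  bytes = str.b                                   # raw bytes as ASCII-8BIT
  return false unless (bytes.bytesize % 4).zero?  # truncated final unit
  bytes.unpack("V*").all? do |cp|                 # "V" = 32-bit unsigned, LE
    cp <= 0x10FFFF && !(0xD800..0xDFFF).cover?(cp)
  end
end

a = [0, 0, 17, 0].pack("C*")   # U+110000, above the Unicode range
b = [0, 216, 0, 0].pack("C*")  # U+D800, a high surrogate
c = [65, 0, 0, 0].pack("C*")   # U+0041 "A"

p utf32le_valid_bytes?(a) # => false
p utf32le_valid_bytes?(b) # => false
p utf32le_valid_bytes?(c) # => true
```

The length check rejects a trailing partial unit, which a byte-level `unpack("V*")` would otherwise silently ignore.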
Updated by nobu (Nobuyoshi Nakada) about 7 years ago
- Status changed from Open to Closed
Applied in changeset r57816.
fix UTF-32 valid_encoding?

- enc/utf_32be.c (utf32be_mbc_enc_len): check arguments precisely.
  [ruby-core:79966] [Bug #13292]
- enc/utf_32le.c (utf32le_mbc_enc_len): ditto.
- regenc.h (UNICODE_VALID_CODEPOINT_P): predicate for valid Unicode codepoints.
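The changelog says the UTF-32 character-length functions now check their input precisely and rely on a predicate for valid codepoints. The Ruby sketch below models that idea under stated assumptions. The hypothetical `unicode_valid_codepoint?` stands in for `UNICODE_VALID_CODEPOINT_P`. The hypothetical `utf32_char_len` returns 4 when a valid character starts at a byte offset and `nil` when the input is truncated or the codepoint is invalid. This is not a transcription of the C code in enc/utf_32be.c or enc/utf_32le.c, whose actual return conventions differ.

```ruby
# Hypothetical stand-in for regenc.h's UNICODE_VALID_CODEPOINT_P:
# true for U+0000..U+10FFFF, excluding the surrogates U+D800..U+DFFF.
def unicode_valid_codepoint?(cp)
  cp <= 0x10FFFF && !(0xD800..0xDFFF).cover?(cp)
end

# Hypothetical model of a UTF-32 mbc_enc_len check. Returns 4 when a valid
# character starts at byte offset +pos+, or nil when fewer than 4 bytes
# remain or the codepoint is invalid. +endian+ is :le or :be.
def utf32_char_len(bytes, pos, endian)
  unit = bytes.byteslice(pos, 4)
  return nil if unit.nil? || unit.bytesize < 4    # not enough bytes left
  cp = unit.unpack1(endian == :le ? "V" : "N")    # "V" = LE, "N" = BE
  unicode_valid_codepoint?(cp) ? 4 : nil
end

le = [0, 216, 0, 0].pack("C*")         # U+D800 in UTF-32LE
be = [0, 0, 0xD8, 0].pack("C*")        # U+D800 in UTF-32BE
ok = [0, 0x10, 0xFF, 0xFF].pack("C*")  # U+10FFFF in UTF-32BE

p utf32_char_len(le, 0, :le)  # => nil
p utf32_char_len(be, 0, :be)  # => nil
p utf32_char_len(ok, 0, :be)  # => 4
p utf32_char_len(ok, 2, :be)  # => nil (only 2 bytes remain)
```

Sharing one predicate between the big-endian and little-endian paths keeps the two encodings' notion of validity identical; only the byte order used to assemble the codepoint differs.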
Updated by naruse (Yui NARUSE) about 7 years ago
- Backport changed from 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN to 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: DONE
ruby_2_4 r57935 merged revision(s) 57816,57817.
Updated by nagachika (Tomoyuki Chikanaga) about 7 years ago
- Backport changed from 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: DONE to 2.2: REQUIRED, 2.3: REQUIRED, 2.4: DONE
Updated by usa (Usaku NAKAMURA) about 7 years ago
- Backport changed from 2.2: REQUIRED, 2.3: REQUIRED, 2.4: DONE to 2.2: DONE, 2.3: REQUIRED, 2.4: DONE
ruby_2_2 r58103 merged revision(s) 57816,57817.
Updated by nagachika (Tomoyuki Chikanaga) about 7 years ago
- Backport changed from 2.2: DONE, 2.3: REQUIRED, 2.4: DONE to 2.2: DONE, 2.3: DONE, 2.4: DONE
ruby_2_3 r58183 merged revision(s) 57816,57817.