Bug #15718

YAML raises error when dumping strings with UTF32 encoding

Added by marcandre (Marc-Andre Lafortune) 7 months ago. Updated 7 months ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:91903]

Description

ruby -r yaml -e "p YAML.dump( ''.force_encoding('UTF-32LE') )"

Traceback (most recent call last):
    4: from -e:1:in `<main>'
    3: from /Users/work/.rvm/rubies/ruby-2.6.1/lib/ruby/2.6.0/psych.rb:513:in `dump'
    2: from /Users/work/.rvm/rubies/ruby-2.6.1/lib/ruby/2.6.0/psych/visitors/yaml_tree.rb:118:in `push'
    1: from /Users/work/.rvm/rubies/ruby-2.6.1/lib/ruby/2.6.0/psych/visitors/yaml_tree.rb:136:in `accept'
/Users/work/.rvm/rubies/ruby-2.6.1/lib/ruby/2.6.0/psych/visitors/yaml_tree.rb:298:in `visit_String': incompatible encoding regexp match (US-ASCII regexp with UTF-32LE string) (Encoding::CompatibilityError)

Surprisingly, this works in Ruby 2.4.x, but not in 2.2, 2.3, 2.5, or 2.6!
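The last frame of the traceback is a regexp match inside visit_String, so the same class of error can be shown without YAML at all. The following is a minimal sketch, assuming a non-empty UTF-32LE string and an ASCII-only regexp literal; the pattern is illustrative and is not the one Psych uses.

```ruby
# A string in an encoding that is not ASCII-compatible.
utf32 = "abc".encode("UTF-32LE")

error = nil
begin
  # An ASCII-only regexp literal cannot be matched against UTF-32LE text.
  utf32.match?(/\A[a-z]+\z/)
rescue Encoding::CompatibilityError => e
  error = e
end

puts error.class                          # Encoding::CompatibilityError
puts utf32.encoding.ascii_compatible?     # false
```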


Files

yamldumputf32encodingerror.patch (2.55 KB), rubenochiavone (Ruben Chiavone), 03/21/2019 02:11 PM

History

Updated by nobu (Nobuyoshi Nakada) 7 months ago

It may be related to a code range bug.
After adding o.valid_encoding? to Psych::Visitors::YAMLTree#visit_String, the error is raised in Ruby 2.4 too.

Updated by rubenochiavone (Ruben Chiavone) 7 months ago

Since it relates to a mismatch between the regexp encoding and the YAML text encoding, a possible fix is to attempt the match only when the encodings match or when the text encoding is ascii_compatible?. WDYT?

Still, I'm not sure why it works on other versions.

Anyhow, I'm adding a patch that reproduces and (hopefully) fixes this issue.
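For illustration, the guard proposed in this comment could look roughly like the sketch below. The helper name guarded_match? and the pattern are hypothetical; this is not the code from the attached patch, only the idea of skipping the match when the string's encoding is not ASCII-compatible.

```ruby
# Hypothetical helper (not Psych's code, not the attached patch):
# only run the regexp when the string's encoding is ASCII-compatible.
def guarded_match?(string, pattern)
  return false unless string.encoding.ascii_compatible?
  string.match?(pattern)
end

utf32 = "abc".encode("UTF-32LE")
utf8  = "abc"

# No exception for UTF-32LE, because the match is skipped entirely.
utf32_result = guarded_match?(utf32, /\A[a-z]+\z/)
utf8_result  = guarded_match?(utf8, /\A[a-z]+\z/)

puts utf32_result   # false (check skipped)
puts utf8_result    # true
```

Note that the guard avoids the exception by not inspecting the string at all, so any decision that depends on the match is silently skipped for UTF-32 input.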

Updated by marcandre (Marc-Andre Lafortune) 7 months ago

rubenochiavone (Ruben Chiavone) wrote:

Since it relates to a mismatch between the regexp encoding and the YAML text encoding, a possible fix is to attempt the match only when the encodings match or when the text encoding is ascii_compatible?. WDYT?

What about:

YAML.dump("Hello\nWorld".encode('UTF-32LE'))

or other strings like "123" that need special formatting?
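To make the concern concrete, here is a small sketch using plain UTF-8 strings, where dumping already works. It assumes only the standard yaml library and shows why a string like "123" cannot be emitted as a plain scalar: if the checks were skipped for UTF-32 input, the same ambiguity could arise.

```ruby
require "yaml"

# In UTF-8, Psych emits these strings in a form that survives a round trip.
quoted    = YAML.dump("123")
multiline = YAML.dump("Hello\nWorld")

puts YAML.load(quoted).class          # String
puts YAML.load(multiline).inspect     # "Hello\nWorld"

# An unquoted 123 is read back as a number, not a string.
puts YAML.load("--- 123\n").class     # Integer
```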

Updated by rubenochiavone (Ruben Chiavone) 7 months ago

I see. There is other regexp-based code similar to what Psych::Visitors::YAMLTree#visit_String does. I'm not sure that testing the encoding before matching, as I initially proposed, is the way to go. What else would you suggest as a fix? Maybe convert the text to US_ASCII, or skip non-US_ASCII text altogether?
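One option along the lines of "convert it first" is to transcode on the caller's side before dumping. The sketch below uses UTF-8 rather than US-ASCII as the target, which is a swapped-in assumption so that non-ASCII characters survive. The helper name dump_as_utf8 is hypothetical and this is a workaround, not a proposed change to Psych.

```ruby
require "yaml"

# Hypothetical caller-side workaround: transcode to UTF-8 before dumping.
# String#encode raises if the bytes are not valid in the source encoding.
def dump_as_utf8(string)
  YAML.dump(string.encode(Encoding::UTF_8))
end

original = "Hello\nWorld".encode("UTF-32LE")
yaml     = dump_as_utf8(original)
restored = YAML.load(yaml)

puts restored == "Hello\nWorld"   # true
```

The trade-off is that the original encoding is not recorded in the YAML, so the loaded string does not come back as UTF-32LE.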
