Bug #15718
closedYAML raises error when dumping strings with UTF32 encoding
Description
ruby -r yaml -e "p YAML.dump( ''.force_encoding('UTF-32LE') )"
Traceback (most recent call last):
4: from -e:1:in `<main>'
3: from /Users/work/.rvm/rubies/ruby-2.6.1/lib/ruby/2.6.0/psych.rb:513:in `dump'
2: from /Users/work/.rvm/rubies/ruby-2.6.1/lib/ruby/2.6.0/psych/visitors/yaml_tree.rb:118:in `push'
1: from /Users/work/.rvm/rubies/ruby-2.6.1/lib/ruby/2.6.0/psych/visitors/yaml_tree.rb:136:in `accept'
/Users/work/.rvm/rubies/ruby-2.6.1/lib/ruby/2.6.0/psych/visitors/yaml_tree.rb:298:in `visit_String': incompatible encoding regexp match (US-ASCII regexp with UTF-32LE string) (Encoding::CompatibilityError)
Surprisingly, this works in Ruby 2.4.x, but not in 2.2, 2.3, 2.5 nor 2.6!
Files
Updated by nobu (Nobuyoshi Nakada) about 5 years ago
It may be related to a code range bug.
By adding o.valid_encoding?
to Psych::Visitors::YAMLTree#visit_String
, the error raises in ruby 24 too.
Updated by rubenochiavone (Ruben Chiavone) about 5 years ago
Since it relates to mismatch of regex and YAML text encoding a possible fix is to only attempt to match the text when encoding matches or when text encoding is ascii_compatible?
. WDYT?
Still I'm not sure why on other versions it works.
Anyhow, I'm adding a patch that reproduces and fixes this issues (hopefully).
Updated by marcandre (Marc-Andre Lafortune) about 5 years ago
rubenochiavone (Ruben Chiavone) wrote:
Since it relates to mismatch of regex and YAML text encoding a possible fix is to only attempt to match the text when encoding matches or when text encoding is
ascii_compatible?
. WDYT?
What about:
YAML.dump("Hello\nWorld".encode('UTF-32LE'))
or other strings like "123"
that need special formatting?
Updated by rubenochiavone (Ruben Chiavone) about 5 years ago
I see. There are other regexp based code similar to what Psych::Visitors::YAMLTree.visit_String
does. Not sure if testing for encoding before matching as I initially proposed is the way to go. What else do you suggest that could be a fix? Maybe convert it to US_ASCII
or skip non-US_ASCII
text altogether?
Updated by jeremyevans0 (Jeremy Evans) about 3 years ago
- Status changed from Open to Feedback
- Assignee set to tenderlovemaking (Aaron Patterson)
I looked into this and ruby YAML uses libyaml, which is a YAML 1.1 implementation. YAML 1.1 does not support UTF-32 encoding; that isn't supported by the YAML spec until YAML 1.2. So I think it is reasonable for YAML.dump
to raise Encoding::CompatibilityError
for UTF-32 data, and I don't consider this a bug. Assigning to @tenderlovemaking (Aaron Patterson) to make a decision on whether YAML.dump should handle this.
YAML 1.2 is not backwards compatible with YAML 1.1, so I don't think it would be reasonable to switch the YAML library from libyaml to a different library that supports YAML 1.2. I'm not aware of an existing Ruby library that implements YAML 1.2.
Updated by marcandre (Marc-Andre Lafortune) about 3 years ago
Thanks for the investigation @jeremyevans0 (Jeremy Evans) 👍
I'm definitely ok to close this. I don't even recall how I stumbled upon this 😅
Updated by jeremyevans0 (Jeremy Evans) about 3 years ago
- Status changed from Feedback to Closed