Since it relates to a mismatch between the regex encoding and the YAML text encoding, a possible fix is to attempt the match only when the encodings match or when the text encoding is ascii_compatible?. WDYT?
Still, I'm not sure why it works on other versions.
Anyhow, I'm adding a patch that reproduces and (hopefully) fixes this issue.
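To make the proposed guard concrete, here is a minimal sketch of the check described above. The helper names `safe_to_match?` and `match_if_safe` are hypothetical and are not part of Psych or of the attached patch; the sketch also assumes an ASCII-only pattern.

```ruby
# Hypothetical helpers, not Psych code: only run the regexp when the
# string's encoding equals the regexp's encoding or is ASCII compatible.
def safe_to_match?(string, pattern)
  string.encoding == pattern.encoding || string.encoding.ascii_compatible?
end

# Returns a MatchData (or nil when nothing matches), and nil without
# attempting the match when the encodings are incompatible.
def match_if_safe(string, pattern)
  return nil unless safe_to_match?(string, pattern)
  pattern.match(string)
end

newline = /\n/                          # ASCII-only stand-in pattern
utf8    = "Hello\nWorld"                # UTF-8 is ASCII compatible
utf32   = utf8.encode('UTF-32LE')       # UTF-32LE is not ASCII compatible

utf8_result  = match_if_safe(utf8, newline)   # match is attempted
utf32_result = match_if_safe(utf32, newline)  # match is skipped
```

Note that skipping the match avoids the exception but also skips whatever decision the regexp was meant to drive, so this alone may not produce correct output.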
What about:
YAML.dump("Hello\nWorld".encode('UTF-32LE'))
or other strings like "123" that need special formatting?
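A short illustration of why such strings matter, as a sketch rather than a description of the patch: the dumper has to inspect the text to decide how to emit it, so simply skipping the inspection for some encodings would lose that formatting. The variable names are illustrative only.

```ruby
require 'yaml'

# A string that looks like an integer must be emitted in a form that
# reloads as a String, not as the Integer 123.
numeric_yaml     = YAML.dump("123")
numeric_restored = YAML.load(numeric_yaml)

# A plain scalar 123 in YAML is read back as an Integer.
plain_scalar = YAML.load("123")

# A multi-line string also needs a style that preserves the newline.
multiline_yaml     = YAML.dump("Hello\nWorld")
multiline_restored = YAML.load(multiline_yaml)
```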
I see. There is other regexp-based code similar to what Psych::Visitors::YAMLTree.visit_String does. I'm not sure that testing the encoding before matching, as I initially proposed, is the way to go. What else would you suggest as a fix? Maybe convert the text to US_ASCII, or skip non-US_ASCII text altogether?
Assignee set to tenderlovemaking (Aaron Patterson)
I looked into this. Ruby's YAML support uses libyaml, which is a YAML 1.1 implementation. YAML 1.1 does not support the UTF-32 encoding; the YAML spec did not add it until YAML 1.2. So I think it is reasonable for YAML.dump to raise Encoding::CompatibilityError for UTF-32 data, and I don't consider this a bug. Assigning to @tenderlovemaking (Aaron Patterson) to make a decision on whether YAML.dump should handle this.
YAML 1.2 is not backwards compatible with YAML 1.1, so I don't think it would be reasonable to switch the YAML library from libyaml to a different library that supports YAML 1.2. I'm not aware of an existing Ruby library that implements YAML 1.2.
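If YAML.dump keeps rejecting UTF-32 input, one caller-side option is to transcode to UTF-8 before dumping. This is only a sketch of a workaround under that assumption, not something proposed in this thread or implemented in Psych.

```ruby
require 'yaml'

utf32 = "Hello\nWorld".encode('UTF-32LE')

# Transcode to an encoding that YAML 1.1 supports before dumping.
utf8 = utf32.encode('UTF-8')

yaml     = YAML.dump(utf8)
restored = YAML.load(yaml)

# If the caller needs UTF-32 again, transcode after loading.
restored_utf32 = restored.encode('UTF-32LE')
```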