Bug #4758

yaml file not human readable when saving utf-8

Added by lazaridis.com (Lazaridis Ilias) almost 8 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Normal
Target version: -
ruby -v: -
Backport:
[ruby-core:36374]

Description

On a fresh Ruby installation, I've stored some data in a YAML file.

The data ends up there as "\x9B\xA6\xA1\xA0\xA3\xE3", so I'm not able to edit it by hand.

I'm filing this as a "Bug" because YAML is meant to be human-readable.

=== Workaround ===

In some discussions, the following workaround came up:

require "psych" # must be required before yaml
require "yaml"

But this is not always achievable, e.g. when yaml is used by a library etc.
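
For code you control, one way around the load-order problem is to call Psych directly instead of going through `YAML`. A minimal sketch, assuming the `psych` library is available; the sample data and the `human_readable` check are made up for illustration and are not part of the original report:

```ruby
# encoding: utf-8
require "psych"

# Hypothetical sample data; any valid UTF-8 string would do.
data = { "greeting" => "こんにちは" }

# Calling Psych directly does not depend on which engine `YAML` points to.
yaml_text = Psych.dump(data)

# Crude readability check: no backslash-x byte escapes in the output.
human_readable = !yaml_text.include?("\\x")

puts yaml_text
puts "human readable: #{human_readable}"
```

This does not help when a third-party library calls `YAML.dump` itself, which is the case described above.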

=== Insider Context ===

Backward compatibility can be achieved easily by:

YAML::ENGINE.yamler = "syck"
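
A library that wants to know which engine is active could probably check it along these lines. This is only a sketch: `active_yaml_engine` is a hypothetical helper name, and the fallback to "psych" when `YAML::ENGINE` is not defined is an assumption, not something stated in this thread.

```ruby
require "yaml"

# YAML::ENGINE only exists on Rubies that ship both engines;
# fall back to "psych" (an assumption) when the constant is missing.
def active_yaml_engine
  defined?(YAML::ENGINE) ? YAML::ENGINE.yamler : "psych"
end

puts "YAML engine: #{active_yaml_engine}"
```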

=== Newcomer Context ===

Ruby should work correctly with UTF-8 data "out of the box", and thus "psych" should become the default.

As said, if you view this issue strictly, it's a defect/bug.

(I've personally lost some hours with this issue)


Files

noname (500 Bytes) noname tenderlovemaking (Aaron Patterson), 05/24/2011 03:23 AM

History

Updated by naruse (Yui NARUSE) almost 8 years ago

  • Status changed from Open to Assigned
  • Assignee set to tenderlovemaking (Aaron Patterson)

Updated by drbrain (Eric Hodel) almost 8 years ago

This is the YAML spec, it is not a bug of ruby. See: http://www.yaml.org/spec/1.2/spec.html

Updated by tenderlovemaking (Aaron Patterson) almost 8 years ago

  • ruby -v changed from ruby 1.9.2p180 (2011-02-18) [i386-mingw32] to -

On Tue, May 24, 2011 at 02:51:16AM +0900, Eric Hodel wrote:

Issue #4758 has been updated by Eric Hodel.

This is the YAML spec, it is not a bug of ruby. See: http://www.yaml.org/spec/1.2/spec.html

Yes, it is YAML spec. However, if it's a valid UTF-8 string, I think it
should be output as that UTF-8 string.

For example:

# encoding: utf-8

require 'yaml'
require 'psych'

p Psych.dump({ :hello => 'こんにちは！'})
p YAML.dump({ :hello => 'こんにちは！'})

The results are:

"---\n:hello: こんにちは！\n"
"--- \n:hello: \"\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF\xEF\xBC\x81\"\n"

Which seems like unexpected behavior of syck to me.

To fix this, I will make Psych default for 1.9.3.
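
To illustrate why both sides have a point: an escaped scalar and a plain UTF-8 scalar can be equally valid YAML and load to the same value, yet only one is editable by hand. A rough sketch using Psych and YAML's `\u` escapes (not the `\x` byte escapes from the syck output above); the key name and strings are made up for the example:

```ruby
# encoding: utf-8
require "psych"

# Same value written two ways. The \u sequences are YAML double-quoted
# escapes; the single-quoted Ruby string keeps the backslashes literal.
escaped = 'greeting: "\u3053\u3093\u306B\u3061\u306F"'
plain   = "greeting: こんにちは"

# Both documents should load to the same Ruby hash.
same = Psych.load(escaped) == Psych.load(plain)
puts "same value: #{same}"
```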

--
Aaron Patterson
http://tenderlovemaking.com/

Updated by lazaridis.com (Lazaridis Ilias) almost 8 years ago

Aaron Patterson wrote:
[...]

Yes, it is YAML spec. However, if it's a valid UTF-8 string, I think it
should be output as that UTF-8 string.
[...]

Yes, you're right, it should:

The YAML specs have "easily readable by humans" as the top priority design goal:

1.1. Goals
The design goals for YAML are, in decreasing priority:
1. YAML is easily readable by humans.

http://www.yaml.org/spec/1.2/spec.html

.

Updated by tenderlovemaking (Aaron Patterson) almost 8 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 0 to 100

I've fixed this in r31715.
