Bug #3407

closed

Kernel.open Ignores 'BOM|' Prefix of :encoding Value

Added by runpaint (Run Paint Run Run) over 14 years ago. Updated over 13 years ago.

Status:
Closed
Assignee:
-
Target version:
ruby -v:
ruby 1.9.3dev (2010-06-01 trunk 28120) [i686-linux]
Backport:
[ruby-core:30641]

Description

=begin
As reported in [ruby-core:30603]:

open('/tmp/bom', mode: ?w){|f| f << "\xEF\xBB\xBFfoo"}
[*open('/tmp/bom', encoding: 'BOM|utf-8').read.bytes]
=> [239, 187, 191, 102, 111, 111]
[*open('/tmp/bom', mode: 'r:BOM|utf-8').read.bytes]
=> [102, 111, 111]
[*open('/tmp/bom', 'r:BOM|utf-8').read.bytes]
=> [102, 111, 111]
=end
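For anyone wanting to re-check the report, here is a minimal sketch that reads the same bytes through the three spellings above. It uses File.open and a temporary directory instead of Kernel.open and /tmp/bom; the helper name bom_read_variants and the constant BOM_RESULTS are made up for illustration. On a Ruby that includes the fix, all three variants should presumably return the bytes of "foo" without the BOM.

```ruby
require 'tmpdir'

# Hypothetical helper: reads the file through the three spellings from the
# report and returns the resulting byte arrays keyed by spelling.
def bom_read_variants(path)
  {
    encoding_option: File.open(path, encoding: 'BOM|utf-8') { |f| f.read.bytes },
    mode_option:     File.open(path, mode: 'r:BOM|utf-8') { |f| f.read.bytes },
    mode_string:     File.open(path, 'r:BOM|utf-8') { |f| f.read.bytes }
  }
end

BOM_RESULTS = Dir.mktmpdir do |dir|
  path = File.join(dir, 'bom')
  # UTF-8 BOM (EF BB BF) followed by "foo", written as raw bytes.
  File.binwrite(path, "\xEF\xBB\xBFfoo".b)
  bom_read_variants(path)
end

BOM_RESULTS.each { |spelling, bytes| puts "#{spelling}: #{bytes.inspect}" }
```

If the :encoding variant still shows 239, 187, 191 at the front, the interpreter predates the fix.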

Actions #1

Updated by nobu (Nobuyoshi Nakada) over 14 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r28199.
Run Paint, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

Actions #2

Updated by runpaint (Run Paint Run Run) over 14 years ago

Much obliged. Is the following intended?

File.read('/tmp/bom', external_encoding: 'BOM|UTF-8') 
#=> ArgumentError: unknown encoding name - BOM|UTF-8

(I also noticed that io_encname_bom_p() appears to allow all 'UTF-' encodings to be prefixed with 'BOM|', yet io_strip_bom() doesn't strip the UTF-7 BOM. If I'm correct, an encoding of 'BOM|UTF-7' should probably be forbidden rather than silently discarded.)

Actions #3

Updated by naruse (Yui NARUSE) over 14 years ago

File.read('/tmp/bom', external_encoding: 'BOM|UTF-8') 
#=> ArgumentError: unknown encoding name - BOM|UTF-8

Use IO.read('/tmp/bom', encoding: 'BOM|UTF-8').
The 'BOM|' prefix is not part of an encoding name; it is recognized only in the encoding part of a mode (mode_enc).
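A small sketch of the difference, assuming a file that starts with a UTF-8 BOM. The helper name read_both_ways is hypothetical, and the :external_encoding outcome is captured rather than asserted, since it is only known to raise ArgumentError on the version discussed in this ticket.

```ruby
require 'tmpdir'

# Hypothetical helper: returns [result of :encoding, result of :external_encoding].
# The second element is the ArgumentError if one is raised.
def read_both_ways(path)
  via_encoding = IO.read(path, encoding: 'BOM|UTF-8')
  via_external =
    begin
      IO.read(path, external_encoding: 'BOM|UTF-8')
    rescue ArgumentError => e
      e
    end
  [via_encoding, via_external]
end

READ_BOTH = Dir.mktmpdir do |dir|
  path = File.join(dir, 'bom')
  File.binwrite(path, "\xEF\xBB\xBFfoo".b)
  read_both_ways(path)
end

puts READ_BOTH[0].inspect
puts READ_BOTH[1].inspect
```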

Actions #4

Updated by runpaint (Run Paint Run Run) over 14 years ago

I suppose so. It just seems to add more complexity to an already confusing process. The format of a mode string is:

  • 'a' or 'r' or 'w'
  • Optionally followed by '+'
  • Optionally followed by either 'b' or 't'
  • Optionally followed by a colon, an optional 'BOM|' (if the external encoding is Unicode, and ignoring the UTF-7 case), followed by an encoding name.
  • Optionally followed by another colon, then either another encoding name or hyphen.
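The list above can be written down as a regular expression. This is only a sketch of the format as described in this comment, not of what io.c actually accepts; the names MODE_STRING and parse_mode are invented for illustration, and the sketch does not check whether the encoding names are real or whether 'BOM|' is paired with a Unicode encoding.

```ruby
# Hypothetical pattern following the five bullet points, one per line.
MODE_STRING = /\A
  (?<access>[arw])                  # 'a' or 'r' or 'w'
  (?<plus>\+)?                      # optional '+'
  (?<bt>[bt])?                      # optional 'b' or 't'
  (?:
    :(?<bom>BOM\|)?(?<ext>[^:]+)    # colon, optional 'BOM|', encoding name
    (?: :(?<int>[^:]+) )?           # colon, another encoding name or '-'
  )?
\z/x

# Returns a hash of the parsed parts, or nil if the string does not fit.
def parse_mode(mode)
  m = MODE_STRING.match(mode)
  return nil unless m

  {
    access:   m[:access],
    plus:     !m[:plus].nil?,
    bt:       m[:bt],
    bom:      !m[:bom].nil?,
    external: m[:ext],
    internal: m[:int]
  }
end

p parse_mode('r:BOM|utf-8')
p parse_mode('w+b:utf-16le:utf-8')
p parse_mode('rz')
```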

Then, the :encoding argument can take the value after the first colon in the mode string. The :internal_encoding argument can take the value after the second colon in the mode string. However, the :external_encoding argument takes the value between the two colons, but cannot have a 'BOM|' prefix. (Further, the rdoc (io.c:6363) claims that, for :external_encoding, '-' is a synonym for Encoding.default_external, but passing this value raises an ArgumentError.) It's a lot to explain. The fewer special cases, the better, IMHO.
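To make the correspondence concrete, here is a rough sketch that opens the same file three ways and reports the resulting external and internal encodings. The helper name encodings_for and the constant ENCODING_PAIRS are hypothetical; the 'BOM|' and '-' special cases are deliberately left out.

```ruby
require 'tmpdir'

# Hypothetical helper: opens the file with the given arguments and returns
# [external_encoding, internal_encoding] of the resulting IO.
def encodings_for(path, *args, **opts)
  File.open(path, *args, **opts) { |f| [f.external_encoding, f.internal_encoding] }
end

ENCODING_PAIRS = Dir.mktmpdir do |dir|
  path = File.join(dir, 'enc')
  File.write(path, '')
  {
    # Everything in the mode string.
    mode_string: encodings_for(path, 'r:utf-8:utf-16le'),
    # :encoding takes the part after the first colon, including the second colon.
    encoding:    encodings_for(path, 'r', encoding: 'utf-8:utf-16le'),
    # The two halves passed separately.
    separate:    encodings_for(path, 'r', external_encoding: 'utf-8',
                                          internal_encoding: 'utf-16le')
  }
end

ENCODING_PAIRS.each { |how, pair| puts "#{how}: #{pair.inspect}" }
```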
