
Bug #3407: Kernel.open Ignores 'BOM|' Prefix of :encoding Value

Added by runpaint (Run Paint Run Run) over 15 years ago. Updated over 14 years ago.

Status: Closed
Assignee: -
Target version:
ruby -v: ruby 1.9.3dev (2010-06-01 trunk 28120) [i686-linux]
Backport:
[ruby-core:30641]

Description

As reported in [ruby-core:30603]:

open('/tmp/bom', mode: ?w){|f| f << "\xEF\xBB\xBFfoo"}
[*open('/tmp/bom', encoding: 'BOM|utf-8').read.bytes]
=> [239, 187, 191, 102, 111, 111]
[*open('/tmp/bom', mode: 'r:BOM|utf-8').read.bytes]
=> [102, 111, 111]
[*open('/tmp/bom', 'r:BOM|utf-8').read.bytes]
=> [102, 111, 111]
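For checking the same behaviour on a current Ruby, where the fix from r28199 is in place, the irb session above can be turned into a standalone script. This is a minimal sketch, assuming Ruby 2.1 or later and a temporary file from the standard tempfile library instead of /tmp/bom; the helper name read_bytes is hypothetical and not part of Ruby.

```ruby
require 'tempfile'

# UTF-8 byte-order mark followed by the ASCII text "foo", as raw bytes.
BOM_FOO = "\xEF\xBB\xBFfoo".b

# Hypothetical helper: open +path+ with the given keyword options
# and return the bytes of its whole content.
def read_bytes(path, **opts)
  File.open(path, **opts) { |f| f.read.bytes }
end

Tempfile.create('bom') do |tmp|
  tmp.binmode
  tmp.write(BOM_FOO)
  tmp.flush

  # Both spellings pass the same mode_enc value, so both should strip the BOM.
  p read_bytes(tmp.path, encoding: 'BOM|UTF-8')   # [102, 111, 111]
  p read_bytes(tmp.path, mode: 'r:BOM|UTF-8')     # [102, 111, 111]
  # Without the BOM| prefix the three BOM bytes come back as data.
  p read_bytes(tmp.path, encoding: 'UTF-8')       # [239, 187, 191, 102, 111, 111]
end
```

On an interpreter that still has the bug, the first call would return the BOM bytes as well, matching the first result in the report.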

Updated by nobu (Nobuyoshi Nakada) over 15 years ago #1

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r28199.
Run Paint, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

Updated by runpaint (Run Paint Run Run) over 15 years ago #2

Much obliged. Is the following intended?

File.read('/tmp/bom', external_encoding: 'BOM|UTF-8') 
#=> ArgumentError: unknown encoding name - BOM|UTF-8

(I also noticed that io_encname_bom_p() appears to allow any 'UTF-' encoding to be prefixed with 'BOM|', yet io_strip_bom() doesn't strip the UTF-7 BOM. If I'm correct, an encoding of 'BOM|UTF-7' should probably be rejected rather than having its 'BOM|' prefix silently ignored.)
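The BOM check mentioned here is done in C inside io.c. As a rough illustration only, the Ruby sketch below matches leading bytes against the fixed BOM signatures of UTF-8, UTF-16 and UTF-32; detect_bom and BOM_SIGNATURES are hypothetical names, and this is not a port of io_strip_bom(). UTF-7 is left out because its BOM is not one fixed byte sequence (it is "+/v" followed by one of "8", "9", "+" or "/"), which fits the observation that UTF-7 is never stripped.

```ruby
# Known BOM byte sequences. UTF-32LE must be tested before UTF-16LE,
# because FF FE 00 00 also begins with the UTF-16LE mark FF FE.
BOM_SIGNATURES = [
  ["\x00\x00\xFE\xFF".b, Encoding::UTF_32BE],
  ["\xFF\xFE\x00\x00".b, Encoding::UTF_32LE],
  ["\xEF\xBB\xBF".b,     Encoding::UTF_8],
  ["\xFE\xFF".b,         Encoding::UTF_16BE],
  ["\xFF\xFE".b,         Encoding::UTF_16LE],
].freeze

# Hypothetical helper: returns [encoding, bom_length] when +bytes+ starts
# with one of the signatures above, or nil when no BOM is recognised.
def detect_bom(bytes)
  data = bytes.b # compare raw bytes, whatever the string's encoding
  BOM_SIGNATURES.each do |sig, enc|
    return [enc, sig.bytesize] if data.start_with?(sig)
  end
  nil
end
```

A caller would then drop the first bom_length bytes and tag the rest with the returned encoding.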

Updated by naruse (Yui NARUSE) over 15 years ago #3

> File.read('/tmp/bom', external_encoding: 'BOM|UTF-8')
> #=> ArgumentError: unknown encoding name - BOM|UTF-8

Use IO.read('/tmp/bom', encoding: 'BOM|UTF-8').
'BOM|' is not part of an encoding name; it belongs to mode_enc, the encoding part of a mode string.
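This distinction can be seen directly: a lookup that expects a bare encoding name, such as Encoding.find, rejects 'BOM|UTF-8', while the :encoding option accepts it. The split_mode_enc helper below is hypothetical and shows one way to separate the prefix from the name; it is an illustration under the assumption of Ruby 2.4 or later (for String#match?), not Ruby's own parsing code, and it handles only the external part (no ":internal" suffix).

```ruby
# A plain encoding lookup does not understand the BOM| prefix.
begin
  Encoding.find('BOM|UTF-8')
rescue ArgumentError => e
  puts e.message # unknown encoding name - BOM|UTF-8
end

# Hypothetical helper: split a value such as "BOM|UTF-8" into a BOM flag
# and the Encoding named after the (case-insensitive) prefix.
def split_mode_enc(spec)
  bom = spec.match?(/\ABOM\|/i)
  name = bom ? spec.sub(/\ABOM\|/i, '') : spec
  [bom, Encoding.find(name)]
end

p split_mode_enc('BOM|UTF-8') # [true, #<Encoding:UTF-8>]
p split_mode_enc('UTF-8')     # [false, #<Encoding:UTF-8>]
```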

Updated by runpaint (Run Paint Run Run) over 15 years ago #4

I suppose so. It just seems to add more complexity to an already confusing process. The format of a mode string is:

  • 'a' or 'r' or 'w'
  • Optionally followed by '+'
  • Optionally followed by either 'b' or 't'
  • Optionally followed by a colon, an optional 'BOM|' (if the external encoding is Unicode, and ignoring the UTF-7 case), followed by an encoding name.
  • Optionally followed by another colon, then either another encoding name or a hyphen.
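The layout listed above can be written down as a regular expression. The sketch below is hypothetical (MODE_PATTERN and parse_mode are not Ruby APIs) and follows the bullets literally rather than Ruby's own parser in io.c, so it rejects some strings that Ruby accepts, for example 'rb+', where '+' comes after 'b'. The Unicode check uses the 'UTF-' name prefix and, like the bullets, ignores the UTF-7 case.

```ruby
# Hypothetical pattern for the mode-string layout described in the bullets.
MODE_PATTERN = /\A
  (?<access>[arw])                       # 'a', 'r' or 'w'
  (?<plus>\+)?                           # optional '+'
  (?<bin_text>[bt])?                     # optional 'b' or 't'
  (?:
    :(?<bom>(?i:BOM\|))?(?<ext>[^:|]+)   # ':', optional 'BOM|', external name
    (?::(?<int>[^:]+))?                  # optional ':', internal name or '-'
  )?
\z/x

# Hypothetical parser: returns a Hash of the parts, or nil if +mode+
# does not follow the layout above.
def parse_mode(mode)
  m = MODE_PATTERN.match(mode) or return nil
  # 'BOM|' is only allowed in front of a Unicode external encoding.
  return nil if m[:bom] && !m[:ext].upcase.start_with?('UTF-')
  {
    access: m[:access],
    plus: !m[:plus].nil?,
    bin_text: m[:bin_text],
    bom: !m[:bom].nil?,
    external: m[:ext],
    internal: m[:int]
  }
end

p parse_mode('r:BOM|utf-8')
p parse_mode('w+b:UTF-8:-')
```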

Then, the :encoding argument can take the value after the first colon in the mode string, and the :internal_encoding argument can take the value after the second colon. However, the :external_encoding argument, which takes the value between the two colons, cannot have a 'BOM|' prefix. (Further, the rdoc (io.c:6363) claims that '-' is a synonym for Encoding.default_external when given as :external_encoding, but that value raises an ArgumentError.) It's a lot to explain. The fewer special cases, the better, IMHO.
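The correspondence described in this comment, between the encoding part of a mode string and the three keyword options, can be spelled out as a small string function. This is a hypothetical helper (encoding_options is not a Ruby API) that follows the description above and does not call into IO.

```ruby
# Hypothetical helper: map a mode string to the keyword options that
# correspond to its encoding part, as described in the comment above.
def encoding_options(mode)
  _access, enc_part = mode.split(':', 2)
  return {} if enc_part.nil? # no encoding part at all

  ext, int = enc_part.split(':', 2)
  options = { encoding: enc_part } # everything after the first colon
  # :external_encoding cannot carry the BOM| prefix, so it is dropped here.
  options[:external_encoding] = ext.sub(/\ABOM\|/i, '')
  options[:internal_encoding] = int if int # after the second colon
  options
end

p encoding_options('r:BOM|UTF-8:UTF-16LE')
```

The extra sub call is the special case being objected to: only :encoding can carry the 'BOM|' flag, so the split-out options lose it.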
