Bug #3407
Closed
Kernel.open Ignores 'BOM|' Prefix of :encoding Value
Description
=begin
As reported in [ruby-core:30603]:
open('/tmp/bom', mode: ?w){|f| f << "\xEF\xBB\xBFfoo"}
[*open('/tmp/bom', encoding: 'BOM|utf-8').read.bytes]
=> [239, 187, 191, 102, 111, 111]
[*open('/tmp/bom', mode: 'r:BOM|utf-8').read.bytes]
=> [102, 111, 111]
[*open('/tmp/bom', 'r:BOM|utf-8').read.bytes]
=> [102, 111, 111]
=end
Updated by nobu (Nobuyoshi Nakada) over 14 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r28199.
Run Paint, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
Updated by runpaint (Run Paint Run Run) over 14 years ago
Much obliged. Is the following intended?
File.read('/tmp/bom', external_encoding: 'BOM|UTF-8')
#=> ArgumentError: unknown encoding name - BOM|UTF-8
(I also noticed that io_encname_bom_p() appears to allow all 'UTF-' encodings to be prefixed with 'BOM|', yet io_strip_bom() doesn't strip the UTF-7 BOM. If I'm correct, an encoding of 'BOM|UTF-7' should probably be forbidden rather than silently discarded.)
Updated by naruse (Yui NARUSE) over 14 years ago
File.read('/tmp/bom', external_encoding: 'BOM|UTF-8') #=> ArgumentError: unknown encoding name - BOM|UTF-8
Use IO.read('/tmp/bom', encoding: 'BOM|UTF-8').
The 'BOM|' prefix is not part of an encoding name; it is part of mode_enc.
Updated by runpaint (Run Paint Run Run) over 14 years ago
I suppose so. It just seems to add more complexity to an already confusing process. The format of a mode string is:
- 'a' or 'r' or 'w'
- Optionally followed by '+'
- Optionally followed by either 'b' or 't'
- Optionally followed by a colon, an optional 'BOM|' (if the external encoding is Unicode, and ignoring the UTF-7 case), followed by an encoding name.
- Optionally followed by another colon, then either another encoding name or hyphen.
Then, the :encoding argument can take the value after the first colon in the mode string. The :internal_encoding argument can take the value after the second colon in the mode string. However, the :external_encoding argument takes the value between the two colons, but cannot have a 'BOM|' prefix. (Further, the rdoc (io.c:6363) claims that, w.r.t. :external_encoding, '-' is a synonym for Encoding.default_external, but this value raises an ArgumentError). It's a lot to explain. The fewer special cases, the better, IMHO.
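To make the mode-string format described in this comment concrete, here is a rough sketch that splits a mode string into its parts with a regular expression. MODE_STRING and parse_mode are hypothetical names, not anything in io.c, and the pattern follows only the informal description here; the real parser may accept other forms (different flag order, for instance).

```ruby
# Hypothetical pattern mirroring the informal grammar in the comment above.
MODE_STRING = /\A
  (?<access>[arw])            # 'a', 'r' or 'w'
  (?<plus>\+)?                # optional '+'
  (?<data>[bt])?              # optional 'b' or 't'
  (?: :(?<bom>BOM\|)?(?<external>[^:]+)   # colon, optional 'BOM|', encoding name
      (?: :(?<internal>[^:]+) )?          # second colon, encoding name or '-'
  )?
\z/x

# Hypothetical helper: returns a Hash of parts, or nil if the string does not
# fit the described format.
def parse_mode(mode)
  m = MODE_STRING.match(mode) or return nil
  {
    access:   m[:access],
    update:   !m[:plus].nil?,
    data:     m[:data],
    bom:      !m[:bom].nil?,
    external: m[:external],
    internal: m[:internal],
  }
end

p parse_mode('r:BOM|utf-8')
p parse_mode('w+b:UTF-16LE:UTF-8')
```

In these terms, :encoding corresponds to everything after the first colon, :internal_encoding to the `internal` part, and :external_encoding to the `external` part without the `bom` flag, which is the asymmetry the comment points out.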