Bug #178 [ruby-core:17310]
File.open on sprintf-formatted string fails with encoding conversion error on OS X
Status: Closed
Start: 06/18/2008
Priority: Normal
Due date:
Assigned to: Yui NARUSE
% Done: 100%
Category: -
Target version: -
ruby -v:
Description
String#% and File.open are interacting strangely on OS X, so files opened with a sprintf formatted string raise an ArgumentError:
$ ruby19 -vwe 'File.new("foo" % [])'
ruby 1.9.0 (2008-06-18 revision 15873) [i686-darwin9.3.0]
-e:1:in `initialize': transcoding not supported (from US-ASCII to UTF8-MAC) (ArgumentError)
from -e:1:in `new'
from -e:1:in `<main>'
Using just "foo" as the filename works fine:
$ ruby19 -we 'File.new("foo")'
As does String#<<:
$ ruby19 -we 'File.new("foo" << "")'
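One way to narrow down a report like this is to print the encoding label each of the three filename expressions carries. The sketch below is illustrative only: the hash keys and the variable names `candidates` and `encoding_report` are made up for this example, and the labels printed depend on the Ruby version and source encoding, so they will not necessarily match what the 2008 build reported.

```ruby
# Compare the encoding label carried by each filename expression in the report.
# Assumption: labels vary with Ruby version and source encoding, so none is hard-coded.
candidates = {
  'literal'   => "foo",
  'String#%'  => "foo" % [],
  'String#<<' => "foo".dup << ""   # dup avoids mutating a possibly frozen literal
}

# Map each expression to the Encoding object attached to its result.
encoding_report = candidates.transform_values(&:encoding)

encoding_report.each do |label, encoding|
  puts format("%-10s %s", label, encoding.name)
end
```

If one expression reports a different label from the others, that points at the method producing the string rather than at File.new itself.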
History
06/18/2008 01:41 PM - Anonymous
I'm not sure why UTF8-MAC was introduced. UTF8-MAC indeed isn't currently supported for transcoding. I don't even know what UTF8-MAC is. It is defined as a replica of UTF-8 in enc/utf_8.c. It is not defined at http://www.iana.org/assignments/character-sets.

It may be an attempt to refer to the fact that UTF-8 is usually used in decomposed form (NFD) on the Mac. But that would not be relevant for opening a file, because Mac OS accepts any kind of normalization and converts to NFD by itself (similar to a file system that accepts both upper- and lower-case, but internally uses only one case).

Also, the issue of normalization is orthogonal to which encoding form is used for Unicode, and therefore adding it to an encoding is something that we should consider much more carefully. Overall, UTF-8 should be UTF-8; it's a bad idea to create variants.

Regards, Martin.

#-#-# Martin J. Dürst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
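The composed/decomposed distinction mentioned here can be seen directly in a later Ruby. The sketch below assumes Ruby 2.2 or newer, where String#unicode_normalize is available (it did not exist when this report was filed). It uses the standard Unicode NFD form, not Apple's variant, and the variable names are only illustrative.

```ruby
# Same text, same encoding label (UTF-8), different normalization forms.
composed   = "\u00E9"                          # U+00E9, precomposed e with acute
decomposed = composed.unicode_normalize(:nfd)  # "e" followed by U+0301

puts composed.encoding == decomposed.encoding        # both are labeled UTF-8
puts composed.bytes.inspect                          # two bytes
puts decomposed.bytes.inspect                        # three bytes
puts composed == decomposed                          # String#== compares bytes
puts decomposed.unicode_normalize(:nfc) == composed  # normalizing restores equality
```

Both strings are valid UTF-8, which illustrates the point that normalization is a property of the content rather than of the encoding form.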
06/18/2008 03:37 PM - Yui NARUSE
- Status changed from Open to Closed
- Assigned to set to Yui NARUSE
- % Done changed from 0 to 100
This problem has the same cause as Bug#179, and it was fixed in r17403.
Thanks,

> It may be that it is an attempt to refer to the fact that UTF-8
> usually is used in decomposed form (NFD) on the Mac. But that
> would not be relevant for opening a file, because the Mac OS
> accepts any kind of normalization, and converts to NFD by itself
> (similar to a file system that accepts both upper- and lower-case,
> but internally uses only one case).

Yeah, that's true when you write to the filesystem, but when you read from the filesystem you may want to know whether the filenames are composed or decomposed.
06/18/2008 05:33 PM - Anonymous
At 15:36 08/06/18, Yui NARUSE wrote:
>This problem is from the same bug of Bug#179,
>and it was fixed at r17403.
>Thanks,

Great, thanks!

>> It may be that it is an attempt to refer to the fact that UTF-8
>> usually is used in decomposed form (NFD) on the Mac. But that
>> would not be relevant for opening a file, because the Mac OS
>> accepts any kind of normalization, and converts to NFD by itself
>> (similar to a file system that accepts both upper- and lower-case,
>> but internally uses only one case).
>
>Yeah, that's true when you write to filesystem,
>but when you read from filesystem you may want to know
>whether they are composed or decomposed.

That may indeed be the case. But this really only applies to filenames (and maybe similar names of resources) on the Mac. For such a small subset of data, I think it's overkill if as a consequence, processing together with other data is blocked (as we saw in the bug report).

As far as I understand, it doesn't apply to file contents or other data on the Mac. Also, as soon as you concatenate two strings, there is no guarantee that NFD is kept (unless of course you implement separate string concatenation for this specific encoding). In general, the best thing to do if you want to know is to check, and the best thing if you want to be sure is to check, and then to change if necessary. But we still have to implement this (maybe for -3?).

[Also, if the meaning of UTF8-MAC is really NFD, it might be better to actually call it that way.]

Regards, Martin.

#-#-# Martin J. Dürst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
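The concatenation point can be illustrated with standard Unicode NFD. This is a sketch, assuming Ruby 2.2 or newer for String#unicode_normalize; the helper `nfd?` and the variable names are hypothetical, and it says nothing about Apple's variant of NFD. Two strings that are each in NFD are joined, and the result is no longer in NFD because the combining marks end up out of canonical order.

```ruby
# Hypothetical helper: a string is in NFD if normalizing to NFD leaves it unchanged.
def nfd?(string)
  string == string.unicode_normalize(:nfd)
end

left  = "a\u0307"  # "a" + COMBINING DOT ABOVE (canonical combining class 230)
right = "\u0323"   # COMBINING DOT BELOW (canonical combining class 220)

joined   = left + right                      # marks are now in the order 230, 220
repaired = joined.unicode_normalize(:nfd)    # canonical reordering puts 220 first

puts nfd?(left)      # each part is normalized on its own
puts nfd?(right)
puts nfd?(joined)    # the concatenation is not
puts nfd?(repaired)  # check, then change if necessary
```

This follows the "check, and then change if necessary" approach described above rather than a special concatenation for one encoding.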
06/19/2008 01:45 AM - Yui NARUSE
Martin Duerst wrote:
>>> It may be that it is an attempt to refer to the fact that UTF-8
>>> usually is used in decomposed form (NFD) on the Mac. But that
>>> would not be relevant for opening a file, because the Mac OS
>>> accepts any kind of normalization, and converts to NFD by itself
>>> (similar to a file system that accepts both upper- and lower-case,
>>> but internally uses only one case).
>> Yeah, that's true when you write to filesystem,
>> but when you read from filesystem you may want to know
>> whether they are composed or decomposed.
>
> That may indeed be the case. But this really only applies to
> filenames (and maybe similar names of resources) on the Mac.
> For such a small subset of data, I think it's overkill if
> as a consequence, processing together with other data is
> blocked (as we saw in the bug report).

This bug has a different cause.

> As far as I understand, it doesn't apply to file contents or
> other data on the Mac. Also, as soon as you concatenate two
> strings, there is no guarantee that NFD is kept (unless of
> course you implement separate string concatenation for this
> specific encoding). In general, the best thing to do if you
> want to know is to check, and the best thing if you want to
> be sure is to check, and then to change if necessary. But we
> still have to implement this (maybe for -3?).

Of course, other data on the Mac may be in an encoding other than UTF8-MAC: it may be composed UTF-8. My intent is that strings labeled as UTF8-MAC may need to be converted or normalized. If you don't care about that, you can use force_encoding.

> [Also, if the meaning of UTF8-MAC is really NFD, it might
> be better to actually call it that way.]

It is not real NFD; it is Apple's NFD, as I commented in enc/utf_8.c.

--
NARUSE, Yui <naruse@airemix.jp>
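The two options mentioned here, relabeling with force_encoding versus actually converting, can be sketched as follows. This assumes a later Ruby in which the UTF8-MAC encoding and a UTF8-MAC to UTF-8 converter are both available; the filename value and variable names are hypothetical, and the exact output of the conversion is deliberately not stated.

```ruby
# Look up the encoding by the name used in this thread.
utf8_mac = Encoding.find("UTF8-MAC")

# Hypothetical decomposed filename, labeled as if it came from the Mac filesystem.
name_from_fs = "e\u0301".dup.force_encoding(utf8_mac)

# Option 1: "don't care". force_encoding only changes the label; bytes are untouched.
relabeled = name_from_fs.dup.force_encoding(Encoding::UTF_8)

# Option 2: convert. String#encode runs the transcoder, which may rewrite the bytes.
converted = name_from_fs.encode(Encoding::UTF_8)

puts relabeled.bytes == name_from_fs.bytes  # relabeling preserves the bytes
puts relabeled.encoding.name
puts converted.encoding.name
puts converted.bytes.inspect                # inspect what the converter produced
```

Relabeling is cheap but leaves any decomposed sequences in place, so comparisons against composed UTF-8 text can still fail afterwards.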