Bug #8289
closed
[].join.encoding # => US-ASCII (I expect also UTF-8)
Description
May be related to http://bugs.ruby-lang.org/issues/5379
$ date
Thu Apr 18 23:56:54 CEST 2013
$ rvm get stable
$ rvm install ruby-head
... long compile process ...
$ rvm use ruby-head
Using /Users/peter_v/.rvm/gems/ruby-head
$ ruby -v
ruby 2.1.0dev (2013-04-19) [x86_64-darwin12.3.0]
$ cat empty_array_join_returns_ASCII_encoding.rb
puts ["abc"].join.encoding
puts [].join.encoding
puts [].join.encode("utf-8").encoding
Actual result:
$ ruby -v empty_array_join_returns_ASCII_encoding.rb
ruby 2.1.0dev (2013-04-19) [x86_64-darwin12.3.0]
UTF-8
US-ASCII
UTF-8
Expected result:
$ ruby -v empty_array_join_returns_ASCII_encoding.rb
ruby 2.1.0dev (2013-04-19) [x86_64-darwin12.3.0]
UTF-8
UTF-8 # This is edited for expected result (not the actual result)
UTF-8
I would expect that in Ruby 2.0, with UTF-8 as the default encoding,
the encoding of the string returned by joining an array (of strings in the default encoding)
is always UTF-8, independent of the size of the array.
The current behaviour breaks my tests for an output encoding of
UTF-8 when the array is empty.
My workaround is array.join().encode("utf-8"),
which works, but is ugly.
Updated by naruse (Yui NARUSE) over 11 years ago
- Status changed from Open to Rejected
It is intended.
A string that is always generated as an ASCII-only string has US-ASCII encoding.
This should not cause any meaningful side effects.
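To illustrate why the US-ASCII label is usually harmless, here is a minimal sketch. The helper names `empty_join` and `append_utf8` are hypothetical and exist only for this illustration; the first printed line reflects the behaviour reported in this ticket and may depend on the Ruby version.

```ruby
# Sketch: an empty US-ASCII string mixes freely with UTF-8 strings.
# empty_join and append_utf8 are hypothetical helper names for this example.
def empty_join
  [].join
end

def append_utf8(prefix)
  # "\u00e9" is a non-ASCII character; a \u escape always yields a UTF-8 literal.
  prefix + "caf\u00e9"
end

empty = empty_join
puts empty.encoding               # reported as US-ASCII in this ticket
puts empty.ascii_only?            # true: an empty string has no non-ASCII bytes
puts empty == ""                  # true: empty strings compare equal across encodings
puts append_utf8(empty).encoding  # UTF-8: the non-ASCII operand decides the result
```

The encoding label only becomes visible when code inspects `String#encoding` directly, as the tests in the report do.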
Updated by khalil_fazal (Khalil Fazal) almost 8 years ago
A work around for my own projects:
class Array
  alias_method :old_join, :join
  # A work around for https://bugs.ruby-lang.org/issues/8289
  def join(separator = $,)
    '' + old_join(separator)
  end
end
puts ["abc"].join.encoding
puts [].join.encoding
puts [].join.encode("utf-8").encoding
Actual result:
UTF-8
UTF-8
UTF-8
as expected.
I'm posting in this old bug report for future readers.
I do not expect this change to be merged into ruby master.
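For readers who would rather not redefine Array#join globally, the same trick can be wrapped in a standalone helper. This is only a sketch: `utf8_join` is a hypothetical name, not part of Ruby, and it assumes the source file uses the default UTF-8 source encoding so that the `''` literal is UTF-8.

```ruby
# Hypothetical helper (not part of Ruby): joins like Array#join, but prepends
# an empty UTF-8 literal so the result is labelled UTF-8 even for [].
# Assumes the default UTF-8 source encoding (Ruby 2.0 and later).
def utf8_join(array, separator = nil)
  '' + array.join(separator)
end

puts utf8_join([]).encoding          # UTF-8
puts utf8_join(["abc"]).encoding     # UTF-8
puts utf8_join(%w[a b], ",")         # a,b
```

Because the helper leaves Array untouched, other code that calls `join` keeps the stock behaviour.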
Updated by naruse (Yui NARUSE) almost 8 years ago
- Backport deleted (1.9.3: UNKNOWN, 2.0.0: UNKNOWN)
Khalil Fazal wrote:
puts [].join.encode("utf-8").encoding
puts [].join.force_encoding("utf-8").encoding
is correct, for both semantic and performance reasons.
String#force_encoding just overwrites the encoding of the string, whereas String#encode performs an encoding conversion.
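The difference between the two methods can be sketched as follows. The helpers `relabel` and `transcode` are hypothetical names introduced for this illustration; the ISO-8859-1 sample is an assumption chosen to make the difference visible, since on an empty string both calls give the same result.

```ruby
# relabel and transcode are hypothetical helper names for this example.
def relabel(str)
  # force_encoding mutates its receiver, so work on a copy.
  str.dup.force_encoding(Encoding::UTF_8)
end

def transcode(str)
  # encode returns a new string whose bytes are converted where needed.
  str.encode(Encoding::UTF_8)
end

joined = [].join
puts relabel(joined).encoding     # UTF-8
puts transcode(joined).encoding   # UTF-8

# One byte, 0xE9, which is "é" in ISO-8859-1.
latin1 = [0xE9].pack("C").force_encoding(Encoding::ISO_8859_1)
puts transcode(latin1).bytes.inspect   # [195, 169]: bytes converted to UTF-8
puts relabel(latin1).bytes.inspect     # [233]: bytes untouched, only the label changed
puts relabel(latin1).valid_encoding?   # false: 0xE9 alone is not valid UTF-8
```

For the empty result of `[].join` there are no bytes to convert, so changing only the label with force_encoding is sufficient.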