Bug #1943
closed
unexpected behavior of tr with unicode strings
Description
=begin
The Unicode code point 8221 should be replaced by 34, not 43:
wide = [12288, 65288, 65289, 65291, 12540, 8221]
ascii = [32, 40, 41, 43, 45, 34]
foo = [8221, 19997, 37329, 22825, 20351, 8221]
bar = foo.pack('U*').tr(wide.pack('U*'), ascii.pack('U*'))
bar.unpack('U*')
=> [43, 19997, 37329, 22825, 20351, 43]
It works correctly in this example:
[8221].pack('U*').tr([8221].pack('U*'), [34].pack('U*')).unpack('U*')
=> [34]
Why, I don't know.
=end
Updated by mame (Yusuke Endoh) over 15 years ago
- Status changed from Open to Rejected
=begin
Not a bug.
By design, String#tr handles some meta characters including '-' (ASCII 45).
What you are doing is similar to:
"F@@@@F".tr("ABCDEF", "abcd-a") #=> "d@@@@d"
"F".tr("F", "a") #=> "a"
You should use an escape character:
"F@@@@F".tr("ABCDEF", "abcd\\-a") #=> "a@@@@a"
BTW, in a discussion with nurse, we noticed that an empty range (such as "d-a")
seems to cause unexpected behavior. I'll register a separate ticket.
--
Yusuke ENDOH mame@tsg.ne.jp
=end
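Applied to the code from the report, the advice above might look like the following sketch. In the report, the replacement set ends with the code points 43, 45, 34, i.e. `+-"`, which String#tr reads as a range from '+' to '"' rather than as three literal characters. The helper name `tr_escape` is hypothetical (it is not part of Ruby); it simply puts a backslash in front of the characters that tr can treat specially.

```ruby
# Hypothetical helper (not part of Ruby): escape the characters that
# String#tr can interpret specially (backslash, '-', '^').
def tr_escape(str)
  str.gsub(/[\\\-^]/) { |c| "\\#{c}" }
end

wide  = [12288, 65288, 65289, 65291, 12540, 8221]
ascii = [32, 40, 41, 43, 45, 34]

# Escape both sets so every character is taken literally.
from = tr_escape(wide.pack('U*'))
to   = tr_escape(ascii.pack('U*'))   # ' ()+\-"' instead of ' ()+-"'

foo = [8221, 19997, 37329, 22825, 20351, 8221]
result = foo.pack('U*').tr(from, to).unpack('U*')
# => [34, 19997, 37329, 22825, 20351, 34]
```

With the hyphen escaped, 8221 maps to 34 as the reporter expected, and the characters that do not appear in `wide` pass through unchanged.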