Bug #3202
potential regression? \w in regex doesn't match umlauts anymore.
Description
=begin
I'm trying to match umlauts using \w in regular expressions. In 1.9.1-p243, this works:
$ cat bar.rb
# encoding: utf-8
puts "ä".encoding
puts /\w/u.encoding
puts ("ä" =~ /\w/u).inspect
$ ruby bar.rb
UTF-8
UTF-8
0
$ ruby --version
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin10.2.0]
With p378, it doesn't match the a with diaeresis anymore:
$ ruby bar.rb
UTF-8
UTF-8
nil
$ ruby --version
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-darwin10.2.0]
I'm seeing the same result in 1.9.2dev (2010-04-26 trunk 27503).
This is OS X 10.6, with the following locale settings:
$ locale
LANG="C"
LC_COLLATE="C"
LC_CTYPE="de_AT.UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
Changing LC_CTYPE, LANG, or LC_ALL has no effect on the p378 result.
This unexpected difference in behavior leads me to believe that something changed for the worse between these two releases.
=end
Updated by mame (Yusuke Endoh) almost 14 years ago
- Status changed from Open to Rejected
=begin
Hi,
This is an intended spec change.
See http://redmine.ruby-lang.org/issues/show/3181.
Thanks,
--
Yusuke Endoh mame@tsg.ne.jp
=end
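For readers who hit the same behavior: a minimal sketch of Unicode-aware alternatives, assuming a Ruby version where \w matches only ASCII word characters (as reported above for p378 and 1.9.2dev). The variable names are illustrative and not taken from the report; \p{Word}, [[:alpha:]], and \p{L} are standard regexp constructs in Ruby 1.9 and later.

```ruby
# encoding: utf-8
# Sketch: matching a non-ASCII letter when \w is ASCII-only.
s = "ä"

ascii_word   = s =~ /\w/           # \w covers only [a-zA-Z0-9_] here
unicode_word = s =~ /\p{Word}/     # Unicode word characters
posix_alpha  = s =~ /[[:alpha:]]/  # POSIX bracket, Unicode-aware in Ruby
any_letter   = s =~ /\p{L}/        # Unicode general category "Letter"

puts ascii_word.inspect
puts unicode_word.inspect
puts posix_alpha.inspect
puts any_letter.inspect
```

Which alternative fits depends on whether digits and underscores should match as well: \p{Word} is the closest Unicode counterpart to \w, while \p{L} and [[:alpha:]] match letters only.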