Bug #3181
Possible regexp regression in 1.9.1-p378 (Closed)
Description
=begin
Hi,
There seems to be a regression in 1.9.1-p378 involving regular expressions and I18N support.
Example program:
# -*- coding: utf-8 -*-
puts "Should match at index 0"
md = /\w/u.match("üäß")
p md.to_a
Example output:
jruby 1.5.0.RC1 (ruby 1.8.7 patchlevel 249) (2010-04-14 0b08bc7) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_15) [x86_64-java]
Should match at index 0
["\303\274"]
rubinius 1.0.0-rc4 (1.8.7 release 2010-03-31 JI) [x86_64-apple-darwin10.2.0]
Should match at index 0
["\303\274"]
ruby 1.8.6 (2010-02-05 patchlevel 399) [i686-darwin10.2.0]
Should match at index 0
["\303\274"]
ruby 1.8.7 (2010-01-10 patchlevel 249) [i686-darwin10.2.0]
Should match at index 0
["\303\274"]
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin10.2.0]
Should match at index 0
["ü"]
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-darwin10.2.0]
Should match at index 0
[]
ruby 1.9.2dev (2009-07-18 trunk 24186) [i386-darwin10.2.0]
Should match at index 0
["ü"]
Best regards,
Thomas
=end
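For readers on a current Ruby, here is a minimal sketch that reruns the example program, assuming MRI 1.9.1p378 or later. It also shows why the p378 run printed []: the match returns nil, and NilClass#to_a returns an empty array. The extra "aü" check is an added illustration, not part of the original report.

```ruby
# -*- coding: utf-8 -*-
# Rerun of the example program, assuming Ruby 1.9.1p378 or later.
md = /\w/u.match("üäß")

# \w matches ASCII word characters only, so nothing in "üäß" matches.
puts md.inspect            # prints nil

# NilClass#to_a returns [], which explains the [] in the p378 output.
p md.to_a                  # prints []

# An ASCII letter in the same kind of string still matches.
p(/\w/u.match("aü")[0])    # prints "a"
```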
Updated by naruse (Yui NARUSE) almost 14 years ago
- Category set to core
- Status changed from Open to Rejected
=begin
This is intended: \s, \d, and \w match only ASCII characters as of 1.9.1p378 and 1.9.2.
(ruby 1.9.2dev (2009-07-18 trunk 24186) is too old; current trunk is ASCII-only as well.)
A lot of code that uses \s and \d did not work on 1.9 with the Unicode-aware behavior, so that behavior was judged to be a bug.
Anyway, I'm interested in real-world uses of \w in a UTF-8 context. Can you show a real example?
=end
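A minimal sketch of the ASCII-only behavior described in the reply, next to Unicode-aware alternatives, assuming Ruby 1.9.1p378 or later and UTF-8 strings. \p{Word}, [[:word:]], \p{Nd}, and [[:space:]] are standard Ruby regexp syntax; the sample strings and the Integer validation guard are hypothetical illustrations, not code from this thread.

```ruby
# -*- coding: utf-8 -*-
# Sketch: ASCII-only shorthand classes vs. Unicode-aware classes.
word  = "üäß"
digit = "１２３"      # FULLWIDTH DIGIT ONE..THREE (U+FF11..U+FF13)
space = "\u3000"      # IDEOGRAPHIC SPACE

# Shorthand classes match ASCII characters only.
p(word  =~ /\w/)            # => nil
p(digit =~ /\d/)            # => nil
p(space =~ /\s/)            # => nil

# Property and POSIX bracket classes are Unicode-aware for UTF-8 strings.
p(word  =~ /\p{Word}/)      # => 0
p(word  =~ /[[:word:]]/)    # => 0
p(digit =~ /\p{Nd}/)        # => 0
p(space =~ /[[:space:]]/)   # => 0

# Hypothetical validation guard that relies on \d being ASCII-only:
# Integer("１２３") would raise ArgumentError, so the guard must reject it.
input = digit
n = Integer(input) if input =~ /\A\d+\z/
p n                         # => nil (guard failed, Integer never called)
```

With ASCII-only \d, a guard like this keeps working; with a Unicode-aware \d, it would let full-width digits through to Integer and raise.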
Updated by rogerdpack (Roger Pack) over 13 years ago
=begin
> Anyway I'm interesting in real usage of \w in UTF-8 context, can you show the real example?
Here are some related questions and use cases, I believe:
http://stackoverflow.com/questions/3576232/how-to-match-unicode-words-with-ruby-1-9
http://www.ruby-forum.com/topic/208777
Though it's probably too late to change it now :)
=end
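As a rough sketch of the use case in the linked questions (matching Unicode words on Ruby 1.9 and later), assuming a UTF-8 source file: \w+ fragments words at every non-ASCII letter, while \p{Word}+ keeps them whole. The sample sentence is made up for illustration.

```ruby
# -*- coding: utf-8 -*-
# Sketch: splitting UTF-8 text into words on Ruby 1.9 and later.
text = "Grüße aus Köln, straße_42!"

# \w+ stops at ü, ö and ß, so the words come out in pieces.
p text.scan(/\w+/)         # => ["Gr", "e", "aus", "K", "ln", "stra", "e_42"]

# \p{Word}+ counts letters, marks, digits and connector punctuation
# (such as "_") as word characters, so each word stays intact.
p text.scan(/\p{Word}+/)   # => ["Grüße", "aus", "Köln", "straße_42"]
```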