Bug #3202

closed

potential regression? \w in regex doesn't match umlauts anymore.

Added by antifuchs (Andreas Fuchs) almost 14 years ago. Updated almost 13 years ago.

Status:
Rejected
Assignee:
-
ruby -v:
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-darwin10.2.0]
[ruby-core:29792]

Description

=begin
I'm trying to match umlauts using \w in regular expressions. In 1.9.1-p243, this works:

$ cat bar.rb
# encoding: utf-8

puts "ä".encoding
puts /\w/u.encoding
puts ("ä" =~ /\w/u).inspect
$ ruby bar.rb
UTF-8
UTF-8
0
$ ruby --version
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin10.2.0]

With p378, it doesn't match the a with diaeresis anymore:

$ ruby bar.rb
UTF-8
UTF-8
nil
$ ruby --version
ruby 1.9.1p378 (2010-01-10 revision 26273) [i386-darwin10.2.0]

I'm seeing the same result in 1.9.2dev (2010-04-26 trunk 27503).

This is OS X 10.6, with the following locale settings:
$ locale
LANG="C"
LC_COLLATE="C"
LC_CTYPE="de_AT.UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

Changing LC_CTYPE, LANG, or LC_ALL has no effect on the p378 result.

This unexpected difference in behavior leads me to believe that something changed for the worse between these two releases.
=end


Related issues 1 (0 open, 1 closed)

Is duplicate of Backport191 - Bug #3181: Possible regexp regression in 1.9.1-p378 (Rejected, 04/20/2010)

#1

Updated by mame (Yusuke Endoh) almost 14 years ago

  • Status changed from Open to Rejected

=begin
Hi,

This is an intended spec change.
See http://redmine.ruby-lang.org/issues/show/3181.

Thanks,

--
Yusuke Endoh
=end
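
For readers hitting the same behavior, here is a minimal sketch of ways to match the "ä" from the report. It assumes the spec change referred to above is that `\w` covers only ASCII word characters, which is how current Ruby documents it. The thread itself does not spell this out. The helper `match_index` is a hypothetical name introduced only for this illustration.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of the original report):
# returns the index of the first match, or nil when nothing matches.
def match_index(pattern, text)
  text =~ pattern
end

s = "ä"

# \w is limited to [a-zA-Z0-9_] in current Ruby, so the umlaut is not matched.
ascii_word = match_index(/\w/, s)

# The Unicode property \p{Word} covers letters, marks, numbers and
# connector punctuation, so it matches the umlaut.
unicode_word = match_index(/\p{Word}/, s)

# POSIX bracket expressions are Unicode-aware in Ruby's regexp engine.
posix_alpha = match_index(/[[:alpha:]]/, s)

puts ascii_word.inspect    # nil
puts unicode_word.inspect  # 0
puts posix_alpha.inspect   # 0
```

`[[:alpha:]]` is the narrower choice when only letters should match. `\p{Word}` is the closer stand-in for the old `\w` behavior because it also accepts digits and the underscore.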
