Bug #2822

closed

Russian characters are missing from word characters types in Regexp

Added by stas (Stas Senotrusov) almost 15 years ago. Updated over 13 years ago.

Status:
Closed
Assignee:
-
Target version:
ruby -v:
ruby 1.9.2dev (2010-02-27 trunk 26772) [i686-linux]
Backport:
[ruby-core:28354]

Description

=begin
"Hello".match(/[\w]*/)
=> #<MatchData "Hello">

"Привет".match(/[\w]*/)
=> #<MatchData "">

"Привет".match(/[А-Яа-яЁё\w]*/)
=> #<MatchData "Привет">

The non-word character type \W behaves similarly: it treats Russian letters as non-word characters.
=end
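The \W behavior mentioned in the description can be sketched as follows. This is a minimal illustration rather than part of the original report; the variable names are hypothetical, and it assumes a UTF-8 source file on Ruby 1.9 or later.

```ruby
# encoding: utf-8
# Hypothetical sample string, taken from the report above.
word = "Привет"

# \W is the complement of the ASCII-only \w, so every Cyrillic letter
# is treated as a non-word character and gets stripped here.
stripped_ascii = word.gsub(/\W/, "")

# Negating the Unicode-aware POSIX class leaves the letters in place.
stripped_unicode = word.gsub(/[^[:word:]]/, "")

puts stripped_ascii.inspect
puts stripped_unicode.inspect
```

With \W the whole string is removed, while the negated [[:word:]] class keeps it intact.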

Actions #1

Updated by Eregon (Benoit Daloze) almost 15 years ago

=begin
$ ri Regexp
/\w/ - A word character ([a-zA-Z0-9_])

/[[:word:]]/ - A character in one of the following Unicode
general categories Letter, Mark, Number,
Connector_Punctuation

/\p{Word}/ - A member of one of the following Unicode general
categories Letter, Mark, Number, Connector_Punctuation

"aér".match /\w+/
=> #<MatchData "a">
"aér".match /[[:word:]]+/
=> #<MatchData "aér">
"aér".match /\p{Word}+/
=> #<MatchData "aér">

The documentation of Regexp is awesome in Ruby 1.9, have a look ;)
=end
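Applied to the Russian string from the original report, the same alternatives might look like the sketch below. The variable names are hypothetical, and \p{Cyrillic} is an additional script property not mentioned in the thread; it assumes a UTF-8 string on Ruby 1.9 or later.

```ruby
# encoding: utf-8
# Hypothetical sample string, taken from the report above.
text = "Привет"

# \w only covers [a-zA-Z0-9_], so the match at the start is empty.
ascii_word = text[/\w*/]

# Both Unicode-aware forms match the whole Cyrillic word.
posix_word   = text[/[[:word:]]+/]
unicode_word = text[/\p{Word}+/]

# A script property restricts the match to Cyrillic letters only
# (an assumption beyond what the thread discusses).
cyrillic_word = text[/\p{Cyrillic}+/]

# \P{Word} is the Unicode-aware counterpart of \W.
separator = "Привет, мир"[/\P{Word}+/]

puts [ascii_word, posix_word, unicode_word, cyrillic_word, separator].inspect
```

Using [[:word:]] or \p{Word} avoids hand-written ranges such as [А-Яа-яЁё], which cover only one alphabet.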

Actions #2

Updated by naruse (Yui NARUSE) almost 15 years ago

  • Status changed from Open to Closed

