Bug #5685

closed

Oniguruma does not recognize U+30FC as Katakana

Added by jpatokal (Jani Patokallio) about 13 years ago. Updated about 13 years ago.

Status: Rejected
Assignee: -
Target version:
ruby -v: ruby 1.9.3dev (2011-09-23 revision 33323) [x86_64-darwin10.8.0]
Backport:

[ruby-core:41386]

Description

The character U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK (Japanese choonpu) belongs to the Unicode Katakana block (U+30A0-30FF), but it is not matched by /\p{Katakana}/. Demonstration:

"私のホバークラフトは鰻でいっぱいです".gsub(/(\p{Katakana}|\p{Hiragana}|\p{Han})+/, 'X')
=> "XーX"

In other words, every kana and kanji character in the string is matched except U+30FC. The leftover character really is U+30FC (decimal 12540):

"私のホバークラフトは鰻でいっぱいです".gsub(/(\p{Katakana}|\p{Hiragana}|\p{Han})+/, '').unpack("U*")
=> [12540]
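The decimal value 12540 is the same code point as 0x30FC. The sketch below (assuming Ruby 2.4 or later for `String#match?`; the constant names are illustrative only) decodes the leftover character and compares the script property `\p{Katakana}` with the Katakana block range U+30A0..U+30FF. In the Unicode Character Database (Scripts.txt), U+30FC is assigned Script=Common because it is shared by hiragana and katakana text, which would explain why a script-based property skips it even though the code point lies inside the Katakana block.

```ruby
# Minimal sketch (assumes Ruby 2.4+ for String#match?).
# Constant names are illustrative, not part of any library.
LEFTOVER = "\u30FC" # the character left behind by the gsub in the report

# 12540 (decimal) and 0x30FC are the same code point.
p LEFTOVER.unpack("U*")            # => [12540]
p format("U+%04X", LEFTOVER.ord)   # => "U+30FC"

# \p{Katakana} is a *script* property; U+30FC has Script=Common.
p LEFTOVER.match?(/\p{Katakana}/)  # => false
p LEFTOVER.match?(/\p{Common}/)    # => true

# The Katakana *block*, written as an explicit code point range, does include it.
KATAKANA_BLOCK = /[\u30A0-\u30FF]/
p LEFTOVER.match?(KATAKANA_BLOCK)  # => true
```

The block range and the script property answer different questions: the range asks where a code point sits in the code chart, while the property asks which writing system the character is assigned to.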

The same behavior occurs in Ruby 1.8 with the Oniguruma library.
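If the goal is to match a whole run of Japanese text, one possible workaround is to add U+30FC to the character class explicitly. This is a sketch, not the resolution adopted for this report; the constant names `TEXT`, `SCRIPT_ONLY` and `WITH_CHOONPU` are hypothetical, and the code assumes Ruby 1.9 or later.

```ruby
# Sketch of a workaround: include U+30FC alongside the script properties.
# Constant names are hypothetical; assumes Ruby 1.9+ with UTF-8 source.
TEXT = "私のホバークラフトは鰻でいっぱいです"

# The pattern from the report: script properties only.
SCRIPT_ONLY = /(\p{Katakana}|\p{Hiragana}|\p{Han})+/
# Equivalent character class, with the prolonged sound mark added by code point.
WITH_CHOONPU = /[\p{Katakana}\p{Hiragana}\p{Han}\u30FC]+/

puts TEXT.gsub(SCRIPT_ONLY, "X")   # prints XーX
puts TEXT.gsub(WITH_CHOONPU, "X")  # prints X
```

Naming U+30FC explicitly keeps the pattern narrow; a broader alternative would be the block range `[\u30A0-\u30FF]`, which also pulls in U+30A0 and U+30FB, both Script=Common as well.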
