Update Oniguruma for Unicode 6

Unicode 6.0 has been released, so it would be preferable to update Oniguruma for Ruby 1.9.3. A patch is attached which updates enc/unicode/unicode/name2ctype.kwd, and fixes the instructions in tool/enc-unicode.rb. Given the scope of this update, I'd prefer this merged sooner rather than later so we have time to test it.

A tentative test follows:

# U+20B9, INDIAN RUPEE SIGN, is a new codepoint in 6.0. It has the general category Sc.
/\p{sc}/u =~ "\u{20B9}" #=> 0
# U+0B72, ORIYA FRACTION ONE QUARTER, is a new codepoint in the Oriya script.
/\p{oriya}/u =~ "\u{0b72}" #=> 0
# U+0B77, ORIYA FRACTION THREE SIXTEENTHS, is the last assigned codepoint in the Oriya block, so the unassigned U+0B78 should not match.
/\p{oriya}/u =~ "\u{0b78}" #=> nil
# U+FBC0, ARABIC SYMBOL SMALL TAH ABOVE, is a new codepoint in the Arabic Presentation Forms-A block.
/\p{arabic}/u =~ "\u{fbc0}" #=> 0
# U+1F130, SQUARED LATIN CAPITAL LETTER A, is a new codepoint with a general category of So.
/\p{so}/u =~ "\u{1f130}" #=> 0
# U+0847, MANDAIC LETTER IT, is a new codepoint in the Mandaic script. The Mandaic script is new.
/\p{mandaic}/u =~ "\u{0847}" #=> 0
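The tentative tests above could be collected into a small table so that every mismatch is reported at once rather than checked by eye. This is only a sketch: `CASES` and `failing_cases` are hypothetical names, not part of the attached patch or of test/ruby, and the expected values are simply copied from the comments above.

```ruby
# Hypothetical table of [pattern, codepoint, expected result of =~],
# mirroring the tentative Unicode 6.0 tests above.
CASES = [
  [/\p{sc}/u,      0x20B9,  0],   # INDIAN RUPEE SIGN, general category Sc
  [/\p{oriya}/u,   0x0B72,  0],   # ORIYA FRACTION ONE QUARTER
  [/\p{oriya}/u,   0x0B78,  nil], # unassigned, just past U+0B77
  [/\p{arabic}/u,  0xFBC0,  0],   # new in Arabic Presentation Forms-A
  [/\p{so}/u,      0x1F130, 0],   # SQUARED LATIN CAPITAL LETTER A, So
  [/\p{mandaic}/u, 0x0847,  0]    # MANDAIC LETTER IT, new script
].freeze

# Returns the cases whose actual =~ result differs from the expected one.
def failing_cases(cases)
  cases.reject do |re, cp, expected|
    (re =~ cp.chr(Encoding::UTF_8)) == expected
  end
end

failing_cases(CASES).each do |re, cp, expected|
  puts format("%p =~ U+%04X: expected %p", re, cp, expected)
end
```

An empty report means the regexp engine agrees with every expectation in the table; on a build whose tables predate Unicode 6.0, the new codepoints should show up here.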

(When this is complete, NEWS needs to be updated, too.)


Updated by runpaint (Run Paint Run Run) almost 11 years ago

I'm working on a test suite for this. So far, I've found some scripts that don't appear to be recognised (this may also affect trunk; I haven't checked yet):

1) (0xa840..0xa877).select{|o| o.chr('utf-8') =~ /\p{Phag}/} #=> []
2) (0x1c00..0x1c23).select{|o| o.chr('utf-8') =~ /\p{Lepcha}/} #=> []
3) (0x103a0..0x103c3).select{|o| o.chr('utf-8') =~ /\p{xpeo}/} #=> []
4) The "Unknown" script, defined in, isn't handled by enc-unicode.rb. /\p{Zzzz}/, for example, raises a SyntaxError.
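Checks like 1) to 4) could be generalised into two small helpers for the test suite. This is a sketch under assumptions: `unmatched_codepoints` and `property_supported?` are hypothetical names. The second helper relies on `Regexp.new` raising `RegexpError` for an unknown property name, whereas a regexp literal such as `/\p{Zzzz}/` fails at parse time with a `SyntaxError`, which cannot be rescued in the same file.

```ruby
# Hypothetical helper: codepoints in +range+ that +regex+ does NOT match.
# An empty result means the whole range is recognised.
def unmatched_codepoints(range, regex)
  range.reject { |cp| cp.chr(Encoding::UTF_8) =~ regex }
end

# Hypothetical helper: true if the regexp engine accepts \p{name}.
# Regexp.new raises RegexpError (rather than SyntaxError) on a bad name.
def property_supported?(name)
  Regexp.new("\\p{#{name}}")
  true
rescue RegexpError
  false
end

p unmatched_codepoints(0x1c00..0x1c23, /\p{Lepcha}/)
p property_supported?("Zzzz")
```

Reporting the unmatched codepoints, rather than the matched ones as in 1) to 3), makes a partially recognised range easier to spot.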


Updated by naruse (Yui NARUSE) almost 11 years ago

