Feature #3989

closed

Update Oniguruma for Unicode 6

Added by runpaint (Run Paint Run Run) almost 11 years ago. Updated over 10 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -

Description

=begin
Unicode 6.0 has been released, so it would be preferable to update Oniguruma for Ruby 1.9.3. A patch is attached which updates enc/unicode/name2ctype.kwd, and fixes the instructions in tool/enc-unicode.rb. Given the scope of this update, I'd prefer this merged sooner rather than later so we have time to test it.

A tentative test follows:

# U+20B9, INDIAN RUPEE SIGN, is a new codepoint in 6.0. It has the general category Sc.
/\p{sc}/u =~ "\u{20B9}" #=> 0
# U+0B72, ORIYA FRACTION ONE QUARTER, is a new codepoint in the Oriya script.
/\p{oriya}/u =~ "\u{0b72}" #=> 0
# U+0B77, ORIYA FRACTION THREE SIXTEENTHS, is the last codepoint in the Oriya script block.
/\p{oriya}/u =~ "\u{0b78}" #=> nil
# U+FBC0, ARABIC SYMBOL SMALL TAH ABOVE, is a new codepoint in the Arabic Presentation Forms-A block.
/\p{arabic}/u =~ "\u{fbc0}" #=> 0
# U+1F130, SQUARED LATIN CAPITAL LETTER A, is a new codepoint with a general category of So.
/\p{so}/u =~ "\u{1f130}" #=> 0
# U+0847, MANDAIC LETTER IT, is a new codepoint in the Mandaic script. The Mandaic script is new.
/\p{mandaic}/u =~ "\u{0847}" #=> 0

(When this is complete, NEWS needs to be updated, too).
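
The checks above could also be written in table-driven form, which may be easier to extend as more codepoints are added. This is only a sketch: CASES, matches_property? and failing_cases are hypothetical names used for illustration and are not part of the attached patch.

```ruby
# Each entry: [property name, codepoint, whether a match is expected].
# The values are taken from the tentative test above.
CASES = [
  ["sc",      0x20B9,  true],   # INDIAN RUPEE SIGN
  ["oriya",   0x0B72,  true],   # ORIYA FRACTION ONE QUARTER
  ["oriya",   0x0B78,  false],  # just past the last Oriya codepoint
  ["arabic",  0xFBC0,  true],   # Arabic Presentation Forms-A
  ["so",      0x1F130, true],   # SQUARED LATIN CAPITAL LETTER A
  ["mandaic", 0x0847,  true],   # MANDAIC LETTER IT
]

# True if the character at codepoint cp matches \p{property}.
def matches_property?(property, cp)
  !!(Regexp.new("\\p{#{property}}") =~ cp.chr("UTF-8"))
end

# Returns the entries whose actual result differs from the expected one.
def failing_cases(cases)
  cases.reject do |property, cp, expected|
    matches_property?(property, cp) == expected
  end
end

failing_cases(CASES).each do |property, cp, expected|
  puts format("\\p{%s} vs U+%04X: expected match=%s", property, cp, expected)
end
```

On a build with Unicode 6.0 tables, nothing should be printed.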
=end


Files

onig-u6.patch (731 KB), runpaint (Run Paint Run Run), 10/27/2010 09:20 AM
Actions #1

Updated by runpaint (Run Paint Run Run) almost 11 years ago

=begin
I'm working on a test suite for this. So far, I've found some scripts that don't appear to be recognised (the problem may also exist in trunk; I haven't checked yet):

1) (0xa840..0xa877).select{|o| o.chr('utf-8') =~ /\p{Phag}/} #=> []
2) (0x1c00..0x1c23).select{|o| o.chr('utf-8') =~ /\p{Lepcha}/} #=> [] http://en.wikipedia.org/wiki/Lepcha_script
3) (0x103a0..0x103c3).select{|o| o.chr('utf-8') =~ /\p{xpeo}/} #=> [] http://en.wikipedia.org/wiki/Old_Persian_cuneiform
4) The "Unknown" script, defined in http://unicode.org/reports/tr24/#Special_Explicit, isn't handled by enc-unicode.rb. /\p{Zzzz}/, for example, raises a SyntaxError.

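A rough sketch of how checks 1 to 3 might be folded into one helper. The names codepoints_matching and RANGES are hypothetical. Building the pattern with Regexp.new means an unrecognised property name raises RegexpError at run time, which can be rescued, whereas a regexp literal fails when the file is parsed.

```ruby
# Returns the codepoints in range whose character matches \p{property},
# or nil when the running Ruby does not recognise the property name.
def codepoints_matching(property, range)
  pattern = Regexp.new("\\p{#{property}}")
  range.select { |cp| cp.chr("UTF-8") =~ pattern }
rescue RegexpError
  nil
end

# Property names and ranges copied from items 1 to 3 above.
RANGES = {
  "Phag"   => (0xA840..0xA877),
  "Lepcha" => (0x1C00..0x1C23),
  "xpeo"   => (0x103A0..0x103C3),
}

RANGES.each do |property, range|
  matched = codepoints_matching(property, range)
  if matched.nil?
    puts "\\p{#{property}}: property name not recognised"
  else
    puts "\\p{#{property}}: #{matched.size} of #{range.size} codepoints matched"
  end
end
```

An empty result and an unrecognised name are reported separately, since they point at different problems in the generated tables.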
=end

Actions #2

Updated by naruse (Yui NARUSE) almost 11 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

=begin
This issue was solved with changeset r29620.
Run Paint, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

=end
