Feature #1889

closed

Teach Oniguruma Unicode 5.0 Character Properties

Added by runpaint (Run Paint Run Run) over 14 years ago. Updated almost 13 years ago.

Status:
Closed
Target version:
[ruby-core:24775]

Description

=begin
Oniguruma understands named general category properties, such that:

0x012c.chr('utf-8')
=> "Ĭ"
0x012c.chr('utf-8') =~ /\p{Lu}/
=> 0

By my reckoning, about 3,000 characters in the current UnicodeData.txt have no property mappings in Oniguruma. For example, U+AA59 (CHAM DIGIT NINE) is in the Nd category (http://unicode.org/cldr/utility/character.jsp?a=AA59), yet:

puts 0xaa59.chr('utf-8')

=> nil
0xaa59.chr('utf-8') =~ /\p{Nd}/
=> nil
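
A count like the one above could be reproduced roughly as follows. This is only a minimal sketch, not the script used for this report: the helper name unmatched_code_points is hypothetical, and it assumes input lines in the semicolon-separated UnicodeData.txt layout (code point in field 0, name in field 1, general category in field 2).

```ruby
# Hypothetical helper (not part of Ruby or Oniguruma): given lines in
# UnicodeData.txt format, return the code points whose listed general
# category is NOT matched by the corresponding \p{..} property.
def unmatched_code_points(lines)
  lines.each_with_object([]) do |line, missing|
    fields = line.chomp.split(';')
    next if fields.size < 3

    name = fields[1]
    # Entries named "<..., First>" / "<..., Last>" mark the ends of a range
    # (CJK ideographs, surrogates, private use); this sketch skips them.
    next if name.end_with?(', First>', ', Last>')

    code_point = fields[0].hex
    category   = fields[2]

    # Build the single character as a UTF-8 string and test it against
    # the property named by its own general category, e.g. /\p{Nd}/.
    char = [code_point].pack('U')
    missing << code_point unless Regexp.new("\\p{#{category}}").match?(char)
  end
end
```

To run it over the whole database, pass it the lines of a locally downloaded UnicodeData.txt (the file location is an assumption, e.g. File.foreach('UnicodeData.txt')). Every code point it returns is one the regexp engine does not place in the category the database assigns to it.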

I've attached two patches for the two categories I've updated in the hope that somebody familiar with the code can either tell me I'm on the right track, or explain a better approach. :-) If they look OK I'll try adding the remainder.

(The diffs are a bit noisy because I tried to retain the original ordering and layout of the code).
=end

