Bug #3994

closed

Oniguruma False Negatives for Certain Unicode Scripts

Added by runpaint (Run Paint Run Run) over 13 years ago. Updated almost 13 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 1.9.3dev (2010-10-28 trunk 29616) [x86_64-linux]
Backport:
[ruby-core:32931]

Description

=begin
As mentioned in #3989, the following scripts aren't recognised. All of the codepoints below should match the respective script, but they don't on trunk.

http://en.wikipedia.org/wiki/Lepcha_script

 lepc #=> [7168, 7169, 7170, 7171, 7172, 7173, 7174, 7175, 7176, 7177, 7178, 7179, 7180, 7181, 7182, 7183, 7184, 7185, 7186, 7187, 7188, 
           7189, 7190, 7191, 7192, 7193, 7194, 7195, 7196, 7197, 7198, 7199, 7200, 7201, 7202, 7203, 7204, 7205, 7206, 7207, 7208, 7209, 
           7210, 7211, 7212, 7213, 7214, 7215, 7216, 7217, 7218, 7219, 7220, 7221, 7222, 7223, 7227, 7228, 7229, 7230, 7231, 7232, 7233, 
           7234, 7235, 7236, 7237, 7238, 7239, 7240, 7241, 7245, 7246, 7247]
 lepc.select{|o| o.chr('utf-8') =~ /\p{lepc}/} #=> []

http://en.wikipedia.org/wiki/'Phags-pa_script

 phag #=> [43072, 43073, 43074, 43075, 43076, 43077, 43078, 43079, 43080, 43081, 43082, 43083, 43084, 43085, 43086, 43087, 43088, 43089, 
           43090, 43091, 43092, 43093, 43094, 43095, 43096, 43097, 43098, 43099, 43100, 43101, 43102, 43103, 43104, 43105, 43106, 43107, 
           43108, 43109, 43110, 43111, 43112, 43113, 43114, 43115, 43116, 43117, 43118, 43119, 43120, 43121, 43122, 43123, 43124, 43125, 
           43126, 43127]
 phag.select{|o| o.chr('utf-8') =~ /\p{phag}/} #=> []

http://en.wikipedia.org/wiki/Old_Persian_cuneiform

 xpeo #=> [66464, 66465, 66466, 66467, 66468, 66469, 66470, 66471, 66472, 66473, 66474, 66475, 66476, 66477, 66478, 66479, 66480, 66481, 
           66482, 66483, 66484, 66485, 66486, 66487, 66488, 66489, 66490, 66491, 66492, 66493, 66494, 66495, 66496, 66497, 66498, 66499, 
           66504, 66505, 66506, 66507, 66508, 66509, 66510, 66511, 66512, 66513, 66514, 66515, 66516, 66517]
 xpeo.select{|o| o.chr('utf-8') =~ /\p{xpeo}/} #=> []

=end
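
To rerun this check on a current build, the report above can be condensed into one self-contained script. This is only a sketch: the helper name `script_misses` and the `scripts` hash are hypothetical, and the hex ranges are the decimal lists from the description rewritten as ranges. On a build that contains the fix, each script would be expected to report zero misses.

```ruby
# Hypothetical helper (not part of the original report): returns the
# codepoints that do NOT match the given Unicode script property.
def script_misses(script, codepoints)
  re = Regexp.new("\\p{#{script}}")
  codepoints.reject { |cp| cp.chr('utf-8') =~ re }
end

# The decimal lists from the description, rewritten as hex ranges.
scripts = {
  'lepc' => [*0x1C00..0x1C37, *0x1C3B..0x1C49, *0x1C4D..0x1C4F],
  'phag' => [*0xA840..0xA877],
  'xpeo' => [*0x103A0..0x103C3, *0x103C8..0x103D5]
}

scripts.each do |name, cps|
  misses = script_misses(name, cps)
  puts "#{name}: #{misses.size} of #{cps.size} codepoints not matched"
end
```

The original snippets use `select` to list the codepoints that do match, so an empty result there signals the bug. The sketch uses `reject` instead, so an empty result means the script is recognised.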

Actions #1

Updated by naruse (Yui NARUSE) over 13 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

=begin
This issue was solved with changeset r29619.
Run Paint, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

=end

Actions #2

Updated by duerst (Martin Dürst) over 13 years ago

=begin
Hello Run Paint Run Run,

I suggest you use hex numbers for Unicode characters. It's a lot easier
to check them against other documents. Nobody uses decimal these days.

Regards, Martin.

On 2010/10/28 19:03, Run Paint Run Run wrote:

Bug #3994: Oniguruma False Negatives for Certain Unicode Scripts
http://redmine.ruby-lang.org/issues/show/3994

Author: Run Paint Run Run
Status: Open, Priority: Normal
Category: M17N
ruby -v: ruby 1.9.3dev (2010-10-28 trunk 29616) [x86_64-linux]

As mentioned in #3989, the following scripts aren't recognised. All of the codepoints below should match the respective script, but they don't on trunk.

http://en.wikipedia.org/wiki/Lepcha_script

 lepc #=>  [7168, 7169, 7170, 7171, 7172, 7173, 7174, 7175, 7176, 7177, 7178, 7179, 7180, 7181, 7182, 7183, 7184, 7185, 7186, 7187, 7188,
           7189, 7190, 7191, 7192, 7193, 7194, 7195, 7196, 7197, 7198, 7199, 7200, 7201, 7202, 7203, 7204, 7205, 7206, 7207, 7208, 7209,
           7210, 7211, 7212, 7213, 7214, 7215, 7216, 7217, 7218, 7219, 7220, 7221, 7222, 7223, 7227, 7228, 7229, 7230, 7231, 7232, 7233,
           7234, 7235, 7236, 7237, 7238, 7239, 7240, 7241, 7245, 7246, 7247]
 lepc.select{|o| o.chr('utf-8') =~ /\p{lepc}/} #=>  []

http://en.wikipedia.org/wiki/'Phags-pa_script

 phag #=>  [43072, 43073, 43074, 43075, 43076, 43077, 43078, 43079, 43080, 43081, 43082, 43083, 43084, 43085, 43086, 43087, 43088, 43089,
           43090, 43091, 43092, 43093, 43094, 43095, 43096, 43097, 43098, 43099, 43100, 43101, 43102, 43103, 43104, 43105, 43106, 43107,
           43108, 43109, 43110, 43111, 43112, 43113, 43114, 43115, 43116, 43117, 43118, 43119, 43120, 43121, 43122, 43123, 43124, 43125,
           43126, 43127]
 phag.select{|o| o.chr('utf-8') =~ /\p{phag}/} #=>  []

http://en.wikipedia.org/wiki/Old_Persian_cuneiform

 xpeo #=>  [66464, 66465, 66466, 66467, 66468, 66469, 66470, 66471, 66472, 66473, 66474, 66475, 66476, 66477, 66478, 66479, 66480, 66481,
           66482, 66483, 66484, 66485, 66486, 66487, 66488, 66489, 66490, 66491, 66492, 66493, 66494, 66495, 66496, 66497, 66498, 66499,
           66504, 66505, 66506, 66507, 66508, 66509, 66510, 66511, 66512, 66513, 66514, 66515, 66516, 66517]
 xpeo.select{|o| o.chr('utf-8') =~ /\p{xpeo}/} #=>  []

http://redmine.ruby-lang.org

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp
=end
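
One way to follow the suggestion above is to print the codepoints in `U+XXXX` notation and collapse consecutive values into ranges, which makes them easy to compare against the Unicode charts. The helper names `u_plus` and `hex_ranges` are hypothetical, and `slice_when` assumes Ruby 2.2 or later.

```ruby
# Format a codepoint in the conventional U+XXXX notation.
def u_plus(cp)
  format('U+%04X', cp)
end

# Collapse a sorted list of codepoints into contiguous runs, rendered in hex.
def hex_ranges(codepoints)
  codepoints.slice_when { |a, b| b != a + 1 }.map do |run|
    run.size == 1 ? u_plus(run.first) : "#{u_plus(run.first)}..#{u_plus(run.last)}"
  end
end

# The Lepcha list from the report, written compactly in decimal.
lepc = [*7168..7223, *7227..7241, *7245..7247]
puts hex_ranges(lepc).join(', ')
# prints: U+1C00..U+1C37, U+1C3B..U+1C49, U+1C4D..U+1C4F
```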
