Bug #3217

Regexp fails to match string with '<' when encoding is UTF-8

Added by brixen (Brian Shirai) over 9 years ago. Updated over 8 years ago.

Status: Rejected
Priority: Normal
Target version:
ruby -v: ruby 1.9.2dev (2010-04-28 trunk 27536) [i386-darwin9.8.0]
Backport:
[ruby-core:29864]

Description

=begin
Hi,

There is an issue matching a string like "a b c d<" when the encoding of the file is set to UTF-8 and the regexp is attempting to match 'something'. AFAIK, '<' is not special in the encoding.

This gist illustrates the issue:

http://gist.github.com/382510

Thanks,
Brian
=end


Related issues

Has duplicate Backport191 - Bug #3386: Inconsistent regexp punct class matching behavior between UTF-8 and ASCII encodings (Rejected, 06/04/2010)

History

#1

Updated by naruse (Yui NARUSE) over 9 years ago

  • Status changed from Open to Rejected

=begin
'<' is not Punctuation in Unicode; it is Math_Symbol.
http://unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt
=end
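The distinction can be checked directly with Unicode general-category properties. The following is a minimal sketch, assuming a UTF-8 source file; `unicode_class` is a hypothetical helper name that does not come from the report or the gist, and it only tests the two categories mentioned above.

```ruby
# Hypothetical helper (not from the report): classifies a single character
# by Unicode general category, using \p{P} (Punctuation) and \p{Sm} (Math_Symbol).
def unicode_class(ch)
  if ch =~ /\p{P}/
    :punctuation
  elsif ch =~ /\p{Sm}/
    :math_symbol
  else
    :other
  end
end

# '<', '+' and '=' are Math_Symbol; '!', ',' and '.' are Punctuation.
%w[< + = ! , .].each do |ch|
  puts "#{ch.inspect} => #{unicode_class(ch)}"
end
```

A pattern that relies on the Unicode Punctuation category alone will therefore skip '<', which is consistent with the behavior described in the report.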

#2

Updated by naruse (Yui NARUSE) over 9 years ago

  • Status changed from Rejected to Assigned
  • Assignee set to naruse (Yui NARUSE)

=begin
Oops, I missed this. I'll fix.
=end

#3

Updated by naruse (Yui NARUSE) over 9 years ago

  • Category set to M17N
  • Status changed from Assigned to Rejected

=begin
This is a feature change in Ruby 1.9.
http://www.unicode.org/reports/tr18/

And redcloth3's example is a bug; they should use their PUNCT constant.
=end
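One way to avoid depending on how a POSIX bracket class is interpreted under a given encoding is to spell out the intended characters explicitly. The sketch below is an illustration only: `ASCII_PUNCT` and `PUNCT_OR_SYMBOL` are hypothetical names, and redcloth3's actual PUNCT constant is not reproduced here.

```ruby
# Hypothetical constant: the ASCII punctuation ranges written out by code point,
# so the match does not depend on the Unicode category of each character.
#   0x21-0x2F  ! " # $ % & ' ( ) * + , - . /
#   0x3A-0x40  : ; < = > ? @
#   0x5B-0x60  [ \ ] ^ _ `
#   0x7B-0x7E  { | } ~
ASCII_PUNCT = /[\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]/

# Alternative sketch: accept any Unicode punctuation or symbol character.
PUNCT_OR_SYMBOL = /[\p{P}\p{S}]/

str = "a b c d<"
puts str =~ ASCII_PUNCT      # => 7, the index of '<'
puts str =~ PUNCT_OR_SYMBOL  # => 7
```

The explicit range keeps the ASCII-only behavior that older code may have assumed, while the property-based class extends the match to non-ASCII punctuation and symbols.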
