Bug #14363

closed

each_grapheme_cluster.size returns the wrong size

Added by sos4nt (Stefan Schüßler) about 6 years ago. Updated about 6 years ago.

Status:
Closed
Assignee:
-
Target version:
-
ruby -v:
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin15]
[ruby-core:84887]

Description

Ruby 2.5 adds String#each_grapheme_cluster to enumerate the string's grapheme clusters:

str = "a\u0300i\u0301"          #=> "àí"
str.each_grapheme_cluster.to_a  #=> ["à", "í"]

Unfortunately, the enumerator's size doesn't work as expected:

str.each_grapheme_cluster.size  #=> 4

The source code reveals that it invokes rb_str_each_char_size, so it is equivalent to each_char.size:

static VALUE
rb_str_each_grapheme_cluster(VALUE str)
{
    RETURN_SIZED_ENUMERATOR(str, 0, 0, rb_str_each_char_size);
    return rb_str_enumerate_grapheme_clusters(str, 0);
}

If the grapheme enumerator's size cannot be calculated lazily, each_grapheme_cluster.size should return nil to indicate that.
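The mismatch can be checked without trusting the enumerator's reported size. The following is a minimal sketch, assuming a UTF-8 string and Ruby 2.5 or later; `count` walks the enumerator, so the result does not depend on what `size` returns:

```ruby
str = "a\u0300i\u0301"

# Number of characters (code points), as reported by each_char.
char_count = str.each_char.size

# Number of grapheme clusters, obtained by actually iterating.
cluster_count = str.each_grapheme_cluster.count

puts "chars: #{char_count}, grapheme clusters: #{cluster_count}"
# prints "chars: 4, grapheme clusters: 2"
```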


Files

each_grapheme_cluster_size_nil.patch (921 Bytes) hugopeixoto (Hugo Peixoto), 03/21/2018 04:17 PM
each_grapheme_cluster_size_real.patch (3.03 KB) hugopeixoto (Hugo Peixoto), 03/21/2018 04:17 PM

Updated by hugopeixoto (Hugo Peixoto) about 6 years ago

Calculating the enumerator size here requires iterating through the whole text and doing grapheme detection on all bytes, so I'm not sure what the right approach is.

I'm attaching two patches, one that makes it return nil and one that does the actual count. Both patches have tests attached.
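The two options can be illustrated at the Ruby level. This is only a sketch of the trade-off, not the content of the attached C patches: the helper names `clusters_unknown_size` and `clusters_counted_size` are hypothetical, and the regex `\X` stands in for the grapheme detection done in C.

```ruby
# Hypothetical helper: an enumerator whose size is unknown (nil).
def clusters_unknown_size(str)
  Enumerator.new { |y| str.scan(/\X/) { |g| y << g } }
end

# Hypothetical helper: an enumerator whose size is computed on demand
# by scanning the whole string, which costs O(n) per call to size.
def clusters_counted_size(str)
  size = -> { str.scan(/\X/).size }
  Enumerator.new(size) { |y| str.scan(/\X/) { |g| y << g } }
end

str = "a\u0300i\u0301"
p clusters_unknown_size(str).size  # prints nil
p clusters_counted_size(str).size  # prints 2
```

Returning nil is cheap but less informative; counting gives the correct answer at the cost of a full pass over the string each time `size` is called.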

Actions #2

Updated by naruse (Yui NARUSE) about 6 years ago

  • Status changed from Open to Closed

Applied in changeset trunk|r62892.


fix each_grapheme_cluster's size [Bug #14363]

From: Hugo Peixoto

Actions #3

Updated by naruse (Yui NARUSE) about 6 years ago

  • Backport changed from 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN to 2.3: DONTNEED, 2.4: DONTNEED, 2.5: REQUIRED

Updated by naruse (Yui NARUSE) about 6 years ago

  • Backport changed from 2.3: DONTNEED, 2.4: DONTNEED, 2.5: REQUIRED to 2.3: DONTNEED, 2.4: DONTNEED, 2.5: DONE

ruby_2_5 r62896 merged revision(s) 62892,62893.
