Bug #20150

closed

Memory leak in grapheme clusters

Added by peterzhu2118 (Peter Zhu) 4 months ago. Updated about 2 months ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:116016]

Description

GitHub PR: https://github.com/ruby/ruby/pull/9414

String#grapheme_clusters and String#each_grapheme_cluster leak memory: when the string is not UTF-8, the regex created for it is never freed.

For example:

str = "hello world".encode(Encoding::UTF_32LE)

10.times do
  1_000.times do
    str.grapheme_clusters
  end

  puts `ps -o rss= -p #{$$}`
end

Before:

26000
42256
59008
75792
92528
109232
125936
142672
159392
176160

After:

9264
9504
9808
10000
10128
10224
10352
10544
10704
10896
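
Since the leak is described as affecting only strings that are not UTF-8, a side-by-side comparison can help confirm whether a given Ruby build is affected. The following is a minimal sketch, not part of the PR: the helpers rss_kb and rss_growth_kb are hypothetical names introduced here, and reading /proc/self/status before falling back to ps is an assumption made for portability.

```ruby
# Hypothetical helper: resident set size of this process in kilobytes.
# Reads /proc/self/status where available (Linux), otherwise falls back
# to the same ps invocation used in the report.
def rss_kb
  status = "/proc/self/status"
  if File.readable?(status)
    File.read(status)[/^VmRSS:\s+(\d+)/, 1].to_i
  else
    `ps -o rss= -p #{Process.pid}`.to_i
  end
end

# Hypothetical helper: change in RSS (KB) after calling
# String#grapheme_clusters on `str` `iterations` times.
def rss_growth_kb(str, iterations: 10_000)
  GC.start
  before = rss_kb
  iterations.times { str.grapheme_clusters }
  GC.start
  rss_kb - before
end

utf8  = "hello world"
utf32 = utf8.encode(Encoding::UTF_32LE)

puts "UTF-8:    #{rss_growth_kb(utf8)} KB"
puts "UTF-32LE: #{rss_growth_kb(utf32)} KB"
```

On an affected build, the UTF-32LE figure would be expected to grow with the iteration count while the UTF-8 figure stays roughly flat. RSS is a noisy measurement, so small differences in either direction are not meaningful.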

Related issues 1 (1 open, 0 closed)

Related to Ruby master - Feature #19908: Update to Unicode 15.1 (Assigned, duerst (Martin Dürst))

Updated by duerst (Martin Dürst) 4 months ago

Updated by jeremyevans0 (Jeremy Evans) 4 months ago

  • Status changed from Open to Closed

Updated by nagachika (Tomoyuki Chikanaga) 4 months ago

  • Backport changed from 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED to 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: REQUIRED

ruby_3_2 b4f8623441a8be53b643fed826ba44e933cafd7e merged revision(s) b3d612804946e841e47d14e09b6839224a79c1a4.

Updated by Anonymous 4 months ago

Hello everybody (but in particular Tomoyuki Chikanaga and Yui Naruse),

On 2024-01-18 12:21, nagachika (Tomoyuki Chikanaga) via ruby-core wrote:

Issue #20150 has been updated by nagachika (Tomoyuki Chikanaga).

Backport changed from 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED to 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: REQUIRED

I was under the impression that backports of bug fixes had to "trickle
down", i.e. first being applied in the main branch, then 3.3, then 3.2,
and so on (of course unless they were not needed for a specific branch).

The above "3.2: DONE, 3.3: REQUIRED" shows that the backport first
occurred in 3.2, before 3.3.

Can somebody please confirm or restate the actual backport policy now in
effect?

Thanks and regards, Martin.

Updated by nagachika (Tomoyuki Chikanaga) 4 months ago

Hello, Martin-sensei.

As I understand it, there is no explicit rule regarding the order of backporting to each stable branch.
In this case, I backported the changeset to the 3.2 branch ahead of the 3.3 branch because I hoped to include some obvious bug fixes in ruby-3.2.3, released yesterday. I also think these fixes should be backported to the 3.3 branch before the release of ruby-3.3.1, but that is up to naruse-san, the current 3.3 branch maintainer.

Best Regards,

Updated by Anonymous 4 months ago

Hello Tomoyuki,

Many thanks for your careful explanation!

Regards, Martin.

On 2024-01-19 17:14, nagachika (Tomoyuki Chikanaga) via ruby-core wrote:

Issue #20150 has been updated by nagachika (Tomoyuki Chikanaga).

Hello, Martin-sensei.

As I understand it, there is no explicit rule regarding the order of backporting to each stable branch.
In this case, I backported the changeset to the 3.2 branch ahead of the 3.3 branch because I hoped to include some obvious bug fixes in ruby-3.2.3, released yesterday. I also think these fixes should be backported to the 3.3 branch before the release of ruby-3.3.1, but that is up to naruse-san, the current 3.3 branch maintainer.

Best Regards,


Updated by naruse (Yui NARUSE) about 2 months ago

  • Backport changed from 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: REQUIRED to 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: DONE

ruby_3_3 62de3eb5a2e5b1f0f1516dc99241c4c54a1bf691 merged revision(s) b3d612804946e841e47d14e09b6839224a79c1a4.
