Bug #20150
closed
Memory leak in grapheme clusters
Description
GitHub PR: https://github.com/ruby/ruby/pull/9414
String#grapheme_clusters and String#each_grapheme_cluster leak memory: if the string is not UTF-8, the regex created for it is never freed.
For example:
str = "hello world".encode(Encoding::UTF_32LE)
10.times do
  1_000.times do
    str.grapheme_clusters
  end
  puts `ps -o rss= -p #{$$}`
end
Before:
26000
42256
59008
75792
92528
109232
125936
142672
159392
176160
After:
9264
9504
9808
10000
10128
10224
10352
10544
10704
10896
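To see that the growth is specific to non-UTF-8 strings, the reproduction above can be adapted to compare a UTF-8 string with its UTF-32LE copy. This is only a rough sketch: the helper names rss_kb and rss_growth_kb are hypothetical, the /proc/self/status fallback assumes Linux, and the absolute numbers will vary by platform and Ruby build.

```ruby
# Hypothetical helper: resident set size of this process in KB.
# Reads /proc/self/status where available (an assumption: Linux),
# otherwise falls back to the same ps call used in the report.
def rss_kb
  status = "/proc/self/status"
  if File.readable?(status)
    File.read(status)[/^VmRSS:\s+(\d+)/, 1].to_i
  else
    `ps -o rss= -p #{Process.pid}`.to_i
  end
end

# Hypothetical helper: run the block `iterations` times and return
# how much the RSS changed, in KB. GC.start is called on both sides
# so that ordinary garbage is less likely to be counted as growth.
def rss_growth_kb(iterations)
  GC.start
  before = rss_kb
  iterations.times { yield }
  GC.start
  rss_kb - before
end

utf8  = "hello world"
utf32 = utf8.encode(Encoding::UTF_32LE)

utf8_growth  = rss_growth_kb(1_000) { utf8.grapheme_clusters }
utf32_growth = rss_growth_kb(1_000) { utf32.grapheme_clusters }

puts "UTF-8 growth:    #{utf8_growth} KB"
puts "UTF-32LE growth: #{utf32_growth} KB"
```

On an affected build, the UTF-32LE figure would be expected to be much larger than the UTF-8 one; on a fixed build both should stay small.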
Updated by duerst (Martin Dürst) 10 months ago
- Related to Feature #19908: Update to Unicode 15.1 added
Updated by jeremyevans0 (Jeremy Evans) 10 months ago
- Status changed from Open to Closed
Updated by nagachika (Tomoyuki Chikanaga) 10 months ago
- Backport changed from 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED to 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: REQUIRED
ruby_3_2 b4f8623441a8be53b643fed826ba44e933cafd7e merged revision(s) b3d612804946e841e47d14e09b6839224a79c1a4.
Updated by Anonymous 10 months ago
Hello everybody (but in particular Tomoyuki Chikanaga and Yui Naruse),
On 2024-01-18 12:21, nagachika (Tomoyuki Chikanaga) via ruby-core wrote:
Issue #20150 has been updated by nagachika (Tomoyuki Chikanaga).
Backport changed from 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: REQUIRED, 3.3: REQUIRED to 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: REQUIRED
I was under the impression that backports of bug fixes had to "trickle
down", i.e. first being applied in the main branch, then 3.3, then 3.2,
and so on (of course unless they were not needed for a specific branch).
The above "3.2: DONE, 3.3: REQUIRED" shows that the backport first
occurred in 3.2, before 3.3.
Can somebody please confirm or restate the actual backport policy now in
effect?
Thanks and regards, Martin.
Updated by nagachika (Tomoyuki Chikanaga) 10 months ago
Hello, Martin-sensei.
As I understand it, there is no explicit rule regarding the order of backporting to each stable branch.
In this case, I backported the changeset to the 3.2 branch ahead of the 3.3 branch because I hoped to include some obvious bug fixes in ruby-3.2.3, which was released yesterday. I also think these fixes should be backported to the 3.3 branch before the release of ruby-3.3.1, but that is up to naruse-san, the current 3.3 branch maintainer.
Best Regards,
Updated by Anonymous 10 months ago
Hello Tomoyuki,
Many thanks for your careful explanation!
Regards, Martin.
Updated by naruse (Yui NARUSE) 8 months ago
- Backport changed from 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: REQUIRED to 3.0: UNKNOWN, 3.1: REQUIRED, 3.2: DONE, 3.3: DONE
ruby_3_3 62de3eb5a2e5b1f0f1516dc99241c4c54a1bf691 merged revision(s) b3d612804946e841e47d14e09b6839224a79c1a4.