Feature #15667

Introduce malloc_trim(0) in full gc cycles

Added by sam.saffron (Sam Saffron) 9 months ago. Updated 3 months ago.

Status: Open
Priority: Normal
Assignee: -
Target version: -
[ruby-core:92087]

Description

Per Hongli's excellent article it looks like malloc_trim can help tremendously with memory bloat issues.

https://www.joyfulbikeshedding.com/blog/2019-03-14-what-causes-ruby-memory-bloat.html#a-magic-trick-trimming

I would like to get this patch tested side-by-side at Discourse, GitHub and Shopify. If it looks good, I think it is a great candidate both for 2.7 and for backports to 2.4, 2.5 and 2.6.

I will coordinate with Shopify and GitHub to see if we can get numbers posted here. I will also run tests on a live Discourse instance over the next week and report the numbers here.

Koichi, what are your thoughts? To me this looks like an incredibly safe patch: the amount of work added to major GCs is tiny compared to the potential benefit, and walking all pages is a very cheap operation.


Files

ruby_gc_malloc_trim.patch (1011 Bytes) mame (Yusuke Endoh), 03/20/2019 01:39 AM
Screenshot_2019-03-28 Grafana - Compare Discourse Perf.png (530 KB) sam.saffron (Sam Saffron), 03/28/2019 01:43 AM
crash.png (85.9 KB) sam.saffron (Sam Saffron), 04/01/2019 04:05 AM

Related issues

Related to Ruby master - Feature #14759: [PATCH] set M_ARENA_MAX for glibc malloc (Open)

History

#2

Updated by duerst (Martin Dürst) 9 months ago

  • Related to Feature #14759: [PATCH] set M_ARENA_MAX for glibc malloc added
#3

Updated by carlos@redhat.com (Carlos O'Donell) 9 months ago

As a maintainer of the quoted glibc code I'd be really interested in the results of this work. Please share when ready.

Updated by mame (Yusuke Endoh) 9 months ago

I created a patch.

I would like to get this patch tested side-by-side at Discourse, GitHub and Shopify.

Could you test the patch attached?

Updated by sam.saffron (Sam Saffron) 9 months ago

mame (Yusuke Endoh) / carlos (Carlos Sánchez), absolutely, I just need a few more days here, mounting this kind of test is not trivial even with docker containers.

Updated by sam.saffron (Sam Saffron) 9 months ago

mame (Yusuke Endoh) / carlos (Carlos Sánchez) attached is a screenshot of side by side testing on live traffic patterns

containers run multiple ruby processes (a few unicorn workers and a sidekiq worker)

standard14 = Ruby 2.6.2 + jemalloc
standard14_a = Ruby 2.6.2 + glibc malloc
standard14_b = Ruby 2.6.2 + glibc malloc + patch

My conclusions from these graphs:

  • Memory is clearly down with the patch
  • 99th percentile performance is slightly impacted
  • CPU is very slightly higher
  • jemalloc still fares better than glibc even after the patch

I think I would support a slightly amended patch that only does the trim once every, say, 10 minutes (maybe even in a background thread). I am happy to test that out as well.

That said, selfishly, this does not matter that much for Discourse: we will stick with jemalloc because both memory usage and performance are better under it.
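To make the "once every 10 minutes" idea concrete, here is a minimal Ruby sketch of a time gate. The `ThrottledTrimmer` class and its parameters are hypothetical names for illustration, not part of the attached patch, which would do this in C inside the GC. The block passed in stands for whatever actually performs the trim.

```ruby
# Hypothetical helper (not from the patch): runs the given block at most
# once per `interval` seconds, however often `call` is invoked.
class ThrottledTrimmer
  def initialize(interval:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 &trim)
    @interval = interval # minimum number of seconds between two trims
    @clock = clock       # injectable time source, handy for testing
    @trim = trim         # the action to throttle, e.g. a malloc_trim(0) call
    @last_run = nil
  end

  # Returns true if the trim action ran, false if it was skipped.
  def call
    now = @clock.call
    return false if @last_run && (now - @last_run) < @interval

    @last_run = now
    @trim.call
    true
  end
end
```

Something like `ThrottledTrimmer.new(interval: 600) { ... }` could then be invoked after every major GC, with only the first call in each 10 minute window doing any work.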

For the wider ruby community though a safer default is very appealing.

Updated by bluz71 (Dennis B) 9 months ago

Thanks Sam, a very nice set of results.

Notice that the 99th percentile was faster with the patch for Topic list, but slower for Topic view. So I'm not sure we can say that the patch will always be slower on the worst runs.

A question: which version of jemalloc are you using? One of the interesting observations in #14759 is the variance between jemalloc versions (say 3.6.0 vs 5.1.0).

Updated by tessi (Philipp Tessenow) 9 months ago

FYI: To make this idea easier to test, I just pushed a small gem, malloc_trim, to rubygems (https://github.com/tessi/malloc_trim). It exposes malloc_trim to Ruby land so that we can play with it without recompiling Ruby and/or deploying a custom patched Ruby.

With MallocTrim.enable_trimming, there is also a built-in way to run malloc_trim(0) after every GC MARK (the most relevant internal event I found to hook into). It probably calls malloc_trim somewhat too often; I'd be happy for suggestions on a better hook.

In any case, doing a manual MallocTrim.trim after a GC.start (e.g. between two Rails requests) is still possible with this gem.
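For anyone who wants to try the "GC.start, then trim" sequence without installing anything, the following is a rough sketch that calls glibc's malloc_trim through Ruby's Fiddle library. It is not how the gem is implemented (I have not checked its internals); `lookup_malloc_trim` and `gc_and_trim` are hypothetical helper names, and the code assumes a glibc-based system, returning nil elsewhere.

```ruby
require 'fiddle'

# Hypothetical helper, not part of the malloc_trim gem.
# Looks up glibc's `int malloc_trim(size_t pad)` in the current process.
# Returns nil when the symbol cannot be found (e.g. non-glibc platforms).
def lookup_malloc_trim
  handle = Fiddle.dlopen(nil) # handle for the running process itself
  Fiddle::Function.new(handle['malloc_trim'],
                       [Fiddle::TYPE_SIZE_T],
                       Fiddle::TYPE_INT)
rescue Fiddle::DLError
  nil
end

# Runs a full GC, then asks glibc to hand free heap memory back to the OS.
# Returns malloc_trim's result (1 if memory was released, 0 otherwise),
# or nil when malloc_trim is unavailable.
def gc_and_trim
  GC.start
  fn = lookup_malloc_trim
  fn && fn.call(0)
end
```

Calling `gc_and_trim` between two requests and watching the process RSS before and after should give a first impression of the effect on a given workload.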

Updated by sam.saffron (Sam Saffron) 9 months ago

tessi (Philipp Tessenow)

My tests were with 3.6.0. Now that I have all the infrastructure in place, I will do a side-by-side comparison of 5 vs 3.6, and even tcmalloc.

Nice to see that gem! Looking at my graphs, I think the best bang for the buck would be to spin up a thread that does the trimming every 10-30 minutes or so, especially if you can release the GVL prior to calling it (provided this thing is thread safe, which carlos (Carlos Sánchez) should know).
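A background trimming thread along those lines could look roughly like the sketch below. `PeriodicTrimmer` is a hypothetical name, and the block is a placeholder for the real trim call; the sketch only covers the scheduling, not the question of whether the trim itself runs with or without the GVL.

```ruby
# Hypothetical helper: runs the given block roughly every `interval`
# seconds on a background thread until `stop` is called.
class PeriodicTrimmer
  def initialize(interval:, &trim)
    @interval = interval
    @trim = trim
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @stopped = false
    @thread = Thread.new { run }
  end

  # Wakes the thread up if it is waiting and blocks until it has exited.
  def stop
    @mutex.synchronize do
      @stopped = true
      @cond.signal
    end
    @thread.join
  end

  private

  def run
    loop do
      stopped = @mutex.synchronize do
        # Wait for the interval to elapse or for `stop` to signal us.
        # A spurious wakeup only causes a harmless early trim.
        @cond.wait(@mutex, @interval) unless @stopped
        @stopped
      end
      break if stopped

      @trim.call
    end
  end
end
```

For the schedule discussed here one would use something like `PeriodicTrimmer.new(interval: 600) { ... }`, and call `stop` on shutdown.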

Updated by dosadnizub (Borna Novak) 3 months ago

I'd just like to point out that, according to the article explaining the feature (https://www.joyfulbikeshedding.com/blog/2019-03-14-what-causes-ruby-memory-bloat.html), calling malloc_trim(0) will not fix memory fragmentation; it will just make the "used memory" number go down. The graph at https://www.joyfulbikeshedding.com/images/2019/os_heap_visualization_after_trim-b3e36496.png still means that an attempt to grab 15 pages of memory (~16kb), by ruby or any other process, will fail even though there's 8MB of memory "free", causing out-of-memory errors on systems that are not even in swap.

There's a good reason why modern software typically doesn't release memory back to the system but rather reuses the pages it has allocated previously. Or am I missing something important?
