Project

General

Profile

Bug #11101

Forking is killing my memory when running GC

Added by tkalmus (Thomas Kalmus) almost 4 years ago. Updated almost 2 years ago.

Status:
Feedback
Priority:
Normal
Target version:
-
[ruby-core:68999]

Description

Memory is duplicated when forking and calling GC in 2.2.1. The main issue here is that we are operating with huge data, going up to 3GB, and one fork is enough to kill our machine.

We have written a small program that reproduces the issue (see attached file).

The program instantiates an object and then forks into two processes.
The GC is called in the child process.
The memory allocation (as given by /proc/pid/smaps) changes from shared to private, thereby indicating a doubling of memory consumption.

Here is the output of the program (size is in mb):

ruby version 2.1.3
time pid message shared private
4.011s 4723 Parent pre GC 68 0
4.013s 4737 Child pre GC 68 0
8.019s 4723 Parent post GC 5 62
8.093s 4737 Child post GC 5 66

We have tested the program on Ubuntu 14.04 with ruby 1.9.3 and 2.2.1. The tests have been performed on a freshly installed Ubuntu machine.

We have also tried to fork 10 children and see a 10 doubling of the memory consumption, the issue only occurs after running the GC.


Files

mem.rb (1.25 KB) mem.rb tkalmus (Thomas Kalmus), 04/27/2015 03:13 PM

History

Updated by normalperson (Eric Wong) almost 4 years ago

Is this with or without jemalloc?
In either case, try disabling transparent hugepages in Linux, THP causes
some problems with Postgres and Redis forking, too (it is far easier to
dirty and trigger CoW on 2M hugepages than regular 4K pages).

Updated by tkalmus (Thomas Kalmus) almost 4 years ago

This is without jemalloc. I made the changes just like requested, but the issue still persists. I get what I want, when I turn the GC off (GC.disabled), but this isn't an option since, anyway after a time my memory is filled with garbage.

Updated by tkalmus (Thomas Kalmus) almost 4 years ago

I have looked at the thing closely and it seems that the GC is not CoW friendly for sparse arrays. If I create an array of undefined size and populate it with random numbers in string format, the GC will launch the CoW in the fork. However if the array has fixed size, or if I make a huge string of, for example, 30MB instead then I don't see the issue. Unfortunately, I am using a lot of dynamic arrays. Is there a way to bypass this ?

#4

Updated by hsbt (Hiroshi SHIBATA) almost 4 years ago

  • Assignee changed from tkalmus (Thomas Kalmus) to ko1 (Koichi Sasada)
  • Priority changed from 7 to Normal
#5

Updated by ko1 (Koichi Sasada) over 3 years ago

Sorry for long absent.

Could you try https://rubygems.org/gems/nakayoshi_fork ?

With this gem, your script shows on my environment:

ruby version 2.3.0
   time   pid message             shared    private
 4.021s 25883 Parent pre GC           69          0
 4.021s 25885 Child  pre GC           69          0
 8.015s 25885 Child  post GC          63          6
 8.052s 25883 Parent post GC          63          6
#6

Updated by ko1 (Koichi Sasada) over 3 years ago

Results on versions:

Without nakayoshi-fork:

ruby version 1.9.3
   time   pid message             shared    private
 4.034s 25939 Parent pre GC           85          0
 4.034s 25941 Child  pre GC           85          0
 8.052s 25941 Child  post GC           5         84
 8.068s 25939 Parent post GC           5         80
terminate child

ruby version 2.0.0
   time   pid message             shared    private
 4.030s 25944 Parent pre GC           67          0
 4.030s 25947 Child  pre GC           67          0
 8.045s 25947 Child  post GC           5         65
 8.066s 25944 Parent post GC           5         62
terminate child

ruby version 2.1.5
   time   pid message             shared    private
 4.032s 25950 Parent pre GC           68          0
 4.032s 25952 Child  pre GC           68          0
 8.048s 25952 Child  post GC           5         66
 8.064s 25950 Parent post GC           5         63
terminate child

ruby version 2.2.3
   time   pid message             shared    private
 4.045s 25957 Child  pre GC           71          0
 4.045s 25955 Parent pre GC           71          0
 8.072s 25957 Child  post GC           5         67
 8.081s 25955 Parent post GC           4         66
terminate child

ruby version 2.3.0
   time   pid message             shared    private
 4.032s 25964 Child  pre GC           70          0
 4.032s 25962 Parent pre GC           70          0
 8.053s 25964 Child  post GC           5         66
 8.064s 25962 Parent post GC           5         65
terminate child

With nakayoshi-fork:

ruby version 2.0.0
   time   pid message             shared    private
 4.032s 25974 Parent pre GC           67          0
 4.032s 25976 Child  pre GC           67          0
 8.065s 25974 Parent post GC           5         62
 8.065s 25976 Child  post GC           5         66
terminate child

ruby version 2.1.5
   time   pid message             shared    private
 4.071s 25982 Child  pre GC           70          1
 4.071s 25980 Parent pre GC           70          1
 8.062s 25982 Child  post GC          64          7
 8.101s 25980 Parent post GC          64          7
terminate child

ruby version 2.2.3
   time   pid message             shared    private
 4.034s 25985 Parent pre GC           70          2
 4.036s 25987 Child  pre GC           70          2
 8.034s 25987 Child  post GC          63          8
 8.073s 25985 Parent post GC          63          8
terminate child

ruby version 2.3.0
   time   pid message             shared    private
 4.019s 26035 Child  pre GC           70          1
 4.020s 26032 Parent pre GC           70          1
 8.019s 26035 Child  post GC          63          8
 8.053s 26032 Parent post GC          63          8
terminate child

I'm not sure why Ruby 2.0.0 is not CoW frinedly.

Updated by ko1 (Koichi Sasada) over 3 years ago

  • Status changed from Open to Feedback

Updated by olivierlacan (Olivier Lacan) almost 3 years ago

Has anyone followed-up on this issue ever? I ran into this yesterday after deploying 2.2 to an app server and noticing twice slower response times than on 2.1.

Updated by normalperson (Eric Wong) almost 3 years ago

hi@olivierlacan.com wrote:

Has anyone followed-up on this issue ever? I ran into this yesterday
after deploying 2.2 to an app server and noticing twice slower
response times than on 2.1.

Did you try the nakayoshi_fork gem as ko1 suggested?

Performance is unlikely to be twice as slow unless you're using enough
memory to hit swap... How big are your processes on 2.1 and 2.2?
Is this on 64-bit?
What else can you tell us about your setup?
(malloc used, GC tunings, etc).

Thanks.

Updated by olivierlacan (Olivier Lacan) almost 2 years ago

normalperson (Eric Wong) wrote:

Did you try the nakayoshi_fork gem as ko1 suggested?

Sorry for replying so late. I think I missed the notification when you originally replied.
I believe nakayoshi_fork did help a bit to bring response times near to their pre 2.2
averages.

Performance is unlikely to be twice as slow unless you're using enough
memory to hit swap...

To be fair, it wasn't. I think we had a 30 ms average that jumped to 60 at first and
leveled of at 50 ms with nakayoshi_fork. I don't believe we were close to hitting swap, and
I think I was seeing this on multiple different servers at the time.

How big are your processes on 2.1 and 2.2?

I'm not sure what you're asking there.

Is this on 64-bit?
Yes.

What else can you tell us about your setup?
(malloc used, GC tunings, etc).

We're not using any GC tunings and running with Passenger Enterprise on Debian 8. No jemalloc if that's what you're wondering.

passenger_max_pool_size: 16
max_instances: 12

Interestingly, upgrading to 2.3.4 (still with nakayoshi_fork) brought us back to pre-2.2 response time averages of about 35 ms, and the averages are much more stable as well.

Updated by normalperson (Eric Wong) almost 2 years ago

hi@olivierlacan.com wrote:

normalperson (Eric Wong) wrote:

Did you try the nakayoshi_fork gem as ko1 suggested?

Sorry for replying so late. I think I missed the notification when you originally replied.
I believe nakayoshi_fork did help a bit to bring response times near to their pre 2.2
averages.

No worries about being late, good to know nakayoshi_fork helped.

Performance is unlikely to be twice as slow unless you're using enough
memory to hit swap...

To be fair, it wasn't. I think we had a 30 ms average that
jumped to 60 at first and leveled of at 50 ms with
nakayoshi_fork. I don't believe we were close to hitting swap,
and I think I was seeing this on multiple different servers at
the time.

You don't sound too confident about whether or not you're
swapping :) Actual memory in swap is harmless if it's static,
what hurts is swapping in and out as that requires I/O.

If you're on Linux, try the "vmstat 1" command and watch
"si" and "so" columns for swap-in/swap-out.

vmstat should be commonly installed via "procps" on Debian-based
systems, at least.

How big are your processes on 2.1 and 2.2?

I'm not sure what you're asking there.

I mean how much memory are your Ruby processes using? (Again,
Linux-specific), that would be the RES column of "top", and you
can get more-fine grained output via "pmap -x #{pid}" for any
particular process. The "[anon]" memory is what's actually
allocated by Ruby.

Interestingly, upgrading to 2.3.4 brought us back to pre-2.2
response time averages of about 35 ms, and the averages are
much more stable as well.

Good to know...

However, I'm wondering if that was the result of general memory
reductions in 2.3.x and if your results are consistent to Ruby
2.4.x or to other applications.

I say that because (AFAIK) there were no significant changes to
the GC for 2.3 (but some bugfixes and minor improvements)

Also available in: Atom PDF