Bug #20804
Stop reserving stack ahead-of-time on Linux
Status: Closed
Description
On Linux, the main thread generally gets only a small stack mapped in initially. As the application uses more stack memory, the kernel maps in more stack pages on demand.
In https://github.com/ruby/ruby/pull/822, we added some logic to force the kernel to eagerly map and fault in the entire stack by writing a fake array near the bottom. This was done in order to fix some cases where heap memory was unexpectedly being allocated in locations close to the stack, which then prevented the stack from growing.
I ran into this because this logic needed to be fixed for ASAN (https://github.com/ruby/ruby/pull/11921). However, I think we should instead delete reserve_stack entirely, which is the point of this issue.
@rianmcguire (Rian McGuire) and I had a look at this today, and we believe the original problem was in fact a symptom of a kernel bug. The kernel bug (or at least what we think was the relevant bug) was fixed in 2017 (https://github.com/torvalds/linux/commit/c204d21f2232d875e36b8774c36ffd027dc1d606). On my machine today, with ruby 3.3.2 (2024-05-30 revision e5a195edf6) and kernel 6.10.12-200.fc40.x86_64, I can no longer reproduce the problem demonstrated by the repro script (https://gist.github.com/csfrancis/46e360d401609275246c):
kjtsanaktsidis@kjtsanaktsidis-laptop ~ % ./repro.rb
new minimum diff: 140730206072832 (2)
new minimum diff: 140725853872128 (4)
new minimum diff: 140723732975616 (10)
new minimum diff: 140719631585280 (14)
new minimum diff: 140719552581632 (69)
new minimum diff: 140719410409472 (159)
new minimum diff: 140719327940608 (1191)
new minimum diff: 140719326601216 (3111)
new minimum diff: 140719312199680 (6098)
Performing this kind of stack reservation actually causes other problems: if RLIMIT_STACK is set to a high value, the eager mapping can fail for lack of real memory, and Ruby crashes before running any code:
kjtsanaktsidis@kjtsanaktsidis-laptop ~ % ulimit -s 1000000000
kjtsanaktsidis@kjtsanaktsidis-laptop ~ % ruby -e "puts 'hi'"
zsh: segmentation fault (core dumped) ruby -e "puts 'hi'"
So I believe the right thing to do is simply to delete reserve_stack. Are there any objections to doing this?
Updated by mame (Yusuke Endoh) 2 months ago
In short, you are saying that the issue is supposed to have been fixed since Linux 4.13, right? Then I don't see a problem.
Updated by kjtsanaktsidis (KJ Tsanaktsidis) 2 months ago
Yes, that’s exactly right. I will go ahead and delete this code then. Thank you!
Updated by kjtsanaktsidis (KJ Tsanaktsidis) about 2 months ago
- Status changed from Open to Closed
Applied in changeset git|dcf3add96bd6e117435c568e78be59bb7ecad701.
Delete reserve_stack code
This code was working around a bug in the Linux kernel. It was
previously possible for the kernel to place heap pages in a region where
the stack was allowed to grow into, and then therefore run out of usable
stack memory before RLIMIT_STACK was reached.
This bug was fixed in Linux commit
https://github.com/torvalds/linux/commit/c204d21f2232d875e36b8774c36ffd027dc1d606
for kernel 4.13 in 2017. Therefore, in 2024, we should be safe to delete
this workaround.
[Bug #20804]