https://redmine.ruby-lang.org/https://redmine.ruby-lang.org/favicon.ico?17113305112021-04-12T13:24:34ZRuby Issue Tracking SystemRuby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=914962021-04-12T13:24:34ZDan0042 (Daniel DeLorme)
<ul></ul><blockquote>
<p>Most database clients, ORMs or other libraries keeping a connection pool might need to close connections before the fork happens.</p>
</blockquote>
<p>Am I missing something? Afaik the proper way to do this is to close the connection <em>after</em> the fork. Of course if you do it before the fork it will work, but then the parent must reconnect, not to mention it aborts any transaction in progress. Maybe there's some confusion because a "full" disconnect in this child will send a disconnection message to the DB server, which will then close its connection and cause a lost connection in the parent process. But as long as you just disconnect the DB <em>socket</em> in the child without doing the full disconnection thing, the DB connection will keep working just fine in the parent.</p>
<blockquote>
<p>Unfortunately <code>Process.pid</code> is relatively slow on Linux</p>
</blockquote>
<p>I did a benchmark on my Linux machine (Linux ubuntu 4.4.0-206-generic) and <code>$$</code> is the same speed as <code>1==1</code>. <code>Process.pid</code> is only marginally slower. I'd like to know where the idea that it's slow is coming from.</p>
<blockquote>
<p>Make Kernel.fork a delegator</p>
</blockquote>
<p>This works great for before_fork, but after_fork needs to handle both the with-block and without-block forms. So everyone who justs want an after_fork hook will need to write some boilerplate code like below. So it seems to me this is not such a nice API.</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="k">def</span> <span class="nf">fork</span><span class="p">(</span><span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="o">**</span><span class="n">kw</span><span class="p">,</span> <span class="o">&</span><span class="n">b</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">block_given?</span>
<span class="k">super</span><span class="p">(</span><span class="o">*</span><span class="n">a</span><span class="p">,</span> <span class="o">**</span><span class="n">kw</span><span class="p">)</span> <span class="k">do</span>
<span class="n">after_fork</span>
<span class="k">yield</span>
<span class="k">end</span>
<span class="k">else</span>
<span class="n">pid</span><span class="o">=</span><span class="k">super</span>
<span class="n">after_fork</span> <span class="k">if</span> <span class="o">!</span><span class="n">pid</span>
<span class="n">pid</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=914972021-04-12T14:12:20Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>Afaik the proper way to do this is to close the connection after the fork.</p>
</blockquote>
<p>No before. Otherwise the connection is "shared" and closing it in the children cause issues for the connections in the parent.</p>
<blockquote>
<p>I'd like to know where the idea that it's slow is coming from.</p>
</blockquote>
<p>Maybe your glibc is quite old? <a href="https://sourceware.org/glibc/wiki/Release/2.25#pid_cache_removal" class="external">https://sourceware.org/glibc/wiki/Release/2.25#pid_cache_removal</a></p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="nb">require</span> <span class="s1">'benchmark/ips'</span>
<span class="k">module</span> <span class="nn">Foo</span>
<span class="k">class</span> <span class="o"><<</span> <span class="nb">self</span>
<span class="nb">attr_accessor</span> <span class="ss">:bar</span>
<span class="k">end</span>
<span class="vi">@bar</span> <span class="o">=</span> <span class="mi">42</span>
<span class="k">end</span>
<span class="nb">puts</span> <span class="s2">"</span><span class="si">#{</span><span class="no">RUBY_VERSION</span><span class="si">}</span><span class="s2"> </span><span class="si">#{</span><span class="no">RUBY_PLATFORM</span><span class="si">}</span><span class="s2">"</span>
<span class="no">Benchmark</span><span class="p">.</span><span class="nf">ips</span> <span class="k">do</span> <span class="o">|</span><span class="n">x</span><span class="o">|</span>
<span class="n">x</span><span class="p">.</span><span class="nf">report</span><span class="p">(</span><span class="s1">'Process.pid'</span><span class="p">)</span> <span class="p">{</span> <span class="no">Process</span><span class="p">.</span><span class="nf">pid</span> <span class="p">}</span>
<span class="n">x</span><span class="p">.</span><span class="nf">report</span><span class="p">(</span><span class="s1">'Module.attr'</span><span class="p">)</span> <span class="p">{</span> <span class="no">Foo</span><span class="p">.</span><span class="nf">bar</span> <span class="p">}</span>
<span class="n">x</span><span class="p">.</span><span class="nf">compare!</span>
<span class="k">end</span>
</code></pre>
<pre><code>3.0.1 x86_64-darwin20
Warming up --------------------------------------
Process.pid 1.914M i/100ms
Module.attr 1.775M i/100ms
Calculating -------------------------------------
Process.pid 19.144M (± 0.7%) i/s - 97.626M in 5.099666s
Module.attr 17.820M (± 0.4%) i/s - 90.530M in 5.080332s
Comparison:
Process.pid: 19144498.7 i/s
Module.attr: 17820085.3 i/s - 1.07x (± 0.00) slower
</code></pre>
<pre><code>3.0.1 x86_64-linux
Warming up --------------------------------------
Process.pid 698.792k i/100ms
Module.attr 1.886M i/100ms
Calculating -------------------------------------
Process.pid 6.862M (± 1.5%) i/s - 34.940M in 5.092832s
Module.attr 19.184M (± 1.0%) i/s - 96.197M in 5.014902s
Comparison:
Module.attr: 19183904.4 i/s
Process.pid: 6862219.4 i/s - 2.80x (± 0.00) slower
</code></pre>
<p>So fast enough for things that are infrequently called, but slow enough that I see it sitting at <code>1-2%</code> of CPU profiles in real production workloads.</p>
<blockquote>
<p>So it seems to me this is not such a nice API.</p>
</blockquote>
<p>It's not really intended as an actual API, but as a smaller change that would be more easily accepted by the core team.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=914982021-04-12T14:59:56ZDan0042 (Daniel DeLorme)
<ul></ul><p>byroot (Jean Boussier) wrote in <a href="#note-2">#note-2</a>:</p>
<blockquote>
<blockquote>
<p>Afaik the proper way to do this is to close the connection after the fork.</p>
</blockquote>
<p>No before. Otherwise the connection is "shared" and closing it in the children cause issues for the connections in the parent.</p>
</blockquote>
<p>It seems quite easy to demonstrate otherwise:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="c1"># with Sequel and mysql2</span>
<span class="no">DB</span><span class="p">[</span><span class="s2">"select rand()"</span><span class="p">].</span><span class="nf">first</span> <span class="c1">#=> {:"rand()"=>0.1878050166999968}</span>
<span class="n">m</span> <span class="o">=</span> <span class="no">DB</span><span class="p">.</span><span class="nf">pool</span><span class="p">.</span><span class="nf">available_connections</span><span class="p">.</span><span class="nf">first</span>
<span class="no">Process</span><span class="p">.</span><span class="nf">wait</span> <span class="nb">fork</span><span class="p">{</span>
<span class="no">IO</span><span class="p">.</span><span class="nf">for_fd</span><span class="p">(</span><span class="n">m</span><span class="p">.</span><span class="nf">socket</span><span class="p">).</span><span class="nf">close</span> <span class="c1">#without this the query below would fail</span>
<span class="p">}</span>
<span class="no">DB</span><span class="p">[</span><span class="s2">"select rand()"</span><span class="p">].</span><span class="nf">first</span> <span class="c1">#=> {:"rand()"=>0.3590592307588637}</span>
</code></pre>
<blockquote>
<p>Maybe your glibc is quite old? <a href="https://sourceware.org/glibc/wiki/Release/2.25#pid_cache_removal" class="external">https://sourceware.org/glibc/wiki/Release/2.25#pid_cache_removal</a></p>
</blockquote>
<p>That must be it. Still, I'm relieved we're only talking about 3x slower than the simplest/fastest ruby operations.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915052021-04-13T01:32:02Zmrkn (Kenta Murata)muraken@gmail.com
<ul></ul><p>Python provides <code>os.register_at_fork</code> API for registering callbacks which are called on both before and after fork.<br>
Especially after-fork, it enables to register callbacks for parent and child processes, separately.<br>
See <a href="https://docs.python.org/3/library/os.html#os.register_at_fork" class="external">https://docs.python.org/3/library/os.html#os.register_at_fork</a></p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915062021-04-13T02:32:06Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul></ul><p>The main issue I see with this is the potential for misuse. If I could be sure this would only be used by applications and not libraries, it seems reasonable. However, I suspect the main usage will be libraries, and it's not really possible to use this correctly in libraries. For example, this currently works:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="no">Sequel</span><span class="p">.</span><span class="nf">postgres</span><span class="p">.</span><span class="nf">transaction</span> <span class="k">do</span>
<span class="nb">exit!</span> <span class="k">if</span> <span class="nb">fork</span>
<span class="k">end</span>
</code></pre>
<p>So does this:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="no">Sequel</span><span class="p">.</span><span class="nf">postgres</span><span class="p">.</span><span class="nf">transaction</span> <span class="k">do</span>
<span class="nb">exit!</span> <span class="k">unless</span> <span class="nb">fork</span>
<span class="k">end</span>
</code></pre>
<p>If a library starts using a <code>before_fork</code> or <code>after_fork</code> hook to close connections automatically, it breaks at least one of the two cases. Automatically disconnecting in <code>before_fork</code> will break both cases, since the connection is disconnected before the fork. Automatically disconnecting in <code>after_fork</code> breaks at least the second case, and can break the first as well if it does a full cleanup instead of just closing the file descriptor of the connection socket.</p>
<p>While most cases where things break are not as clear cut as the above examples, they can and do exist.</p>
<p>Dan0042 (Daniel DeLorme) wrote in <a href="#note-3">#note-3</a>:</p>
<blockquote>
<p>byroot (Jean Boussier) wrote in <a href="#note-2">#note-2</a>:</p>
<blockquote>
<blockquote>
<p>Afaik the proper way to do this is to close the connection after the fork.</p>
</blockquote>
<p>No before. Otherwise the connection is "shared" and closing it in the children cause issues for the connections in the parent.</p>
</blockquote>
<p>It seems quite easy to demonstrate otherwise:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="c1"># with Sequel and mysql2</span>
<span class="no">DB</span><span class="p">[</span><span class="s2">"select rand()"</span><span class="p">].</span><span class="nf">first</span> <span class="c1">#=> {:"rand()"=>0.1878050166999968}</span>
<span class="n">m</span> <span class="o">=</span> <span class="no">DB</span><span class="p">.</span><span class="nf">pool</span><span class="p">.</span><span class="nf">available_connections</span><span class="p">.</span><span class="nf">first</span>
<span class="no">Process</span><span class="p">.</span><span class="nf">wait</span> <span class="nb">fork</span><span class="p">{</span>
<span class="no">IO</span><span class="p">.</span><span class="nf">for_fd</span><span class="p">(</span><span class="n">m</span><span class="p">.</span><span class="nf">socket</span><span class="p">).</span><span class="nf">close</span> <span class="c1">#without this the query below would fail</span>
<span class="p">}</span>
<span class="no">DB</span><span class="p">[</span><span class="s2">"select rand()"</span><span class="p">].</span><span class="nf">first</span> <span class="c1">#=> {:"rand()"=>0.3590592307588637}</span>
</code></pre>
</blockquote>
<p>This is not a good example because it's not portable across adapters. You are relying on the database driver exposing the file descriptor (not all drivers do). Sequel doesn't have code that only closes sockets for connections, it calls a method that will actually clean up the connection state, such as rolling back active transactions, which when called on a connection in a child process will break later usage of the same connection in the parent process.</p>
<p>Sequel recommends calling <code>Sequel::Database#disconnect</code> before forking in the common case where you are dealing with a pool of worker processes. However, even if Ruby added a general before/after fork hook, Sequel would never use it automatically, since it is impossible to determine whether doing so is desired or not.</p>
<p>An additional issue is that not all forks are equal. Consider code such as:</p>
<pre><code class="ruby syntaxhl" data-language="ruby"><span class="no">DB</span> <span class="o">=</span> <span class="no">Sequel</span><span class="p">.</span><span class="nf">postgres</span>
<span class="no">DB</span><span class="p">.</span><span class="nf">disconnect</span>
<span class="n">pids</span> <span class="o">=</span> <span class="mi">4</span><span class="p">.</span><span class="nf">times</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span>
<span class="nb">fork</span> <span class="k">do</span>
<span class="c1"># worker process</span>
<span class="c1"># ...</span>
<span class="k">if</span> <span class="n">some_condition</span>
<span class="nb">fork</span> <span class="k">do</span>
<span class="n">do_something_in_worker_child</span>
<span class="nb">exit!</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre>
<p>The idea with this proposal is you could simplify things by eliminating the explicit <code>DB.disconnect</code> call before the first fork. However, doing so automatically could break things in a worker process if the second fork was called.</p>
<p>Another issue with this proposal is that in the above example, <code>DB.disconnect</code> is only called once when called explicitly, but would be called 4 times if called implicitly.</p>
<p>I understand why this is being requested, and I do think it could simplify common Ruby web application deployment scenarios. However, even if requires additional explicit code, I think the current approach of per-server fork hooks is better.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915122021-04-13T10:52:09Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p><a class="user active user-mention" href="https://redmine.ruby-lang.org/users/1604">@jeremyevans0 (Jeremy Evans)</a> I understand your point, and I agree that in an ideal world this wouldn't be necessary.</p>
<p>However pragmatically speaking, <code>getpid(2)</code> checks are about as old as <code>fork(2)</code>. Even the <a href="https://sourceware.org/glibc/wiki/Release/2.25#pid_cache_removal" class="external"><code>glibc</code> changelog</a> acknowledge it:</p>
<blockquote>
<p>Applications performing getpid() calls in a loop will see the worst case performance degradation as the library call will perform a system call at each invocation.</p>
</blockquote>
<p>And suggest to use <code>pthread_atfork()</code> instead:</p>
<blockquote>
<p>Applications performing getpid() in a loop that need to do some level of fork()-based invalidation can instead use pthread_atfork() to register handlers to handle the invalidation.</p>
</blockquote>
<p>I think it's a clear indication that this need is nothing new.</p>
<blockquote>
<p>The main issue I see with this is the potential for misuse.</p>
</blockquote>
<p>I agree, but this argument could apply to a large part of Ruby, I'm not sure it's relevant.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915142021-04-13T11:16:11ZEregon (Benoit Daloze)
<ul></ul><p><a class="user active user-mention" href="https://redmine.ruby-lang.org/users/7941">@byroot (Jean Boussier)</a> Thanks for the new issue, this seems much clearer than the previous one :)</p>
<p>jeremyevans0 (Jeremy Evans) wrote in <a href="#note-5">#note-5</a>:</p>
<blockquote>
<p>If a library starts using a <code>before_fork</code> or <code>after_fork</code> hook to close connections automatically, it breaks at least one of the two cases.</p>
</blockquote>
<p>And if it doesn't it's data corruption if there is any code between the <code>fork</code> and <code>exit!</code> accessing the DB, which I would argue is far worse.</p>
<blockquote>
<p>Sequel recommends calling Sequel::Database#disconnect before forking in the common case where you are dealing with a pool of worker processes. However, even if Ruby added a general before/after fork hook, Sequel would never use it automatically, since it is impossible to determine whether doing so is desired or not.</p>
</blockquote>
<p>Sequel could expose a method to enable/disable closing connections on fork.<br>
Then it's only a matter of which is the default.</p>
<p>Do people actually fork in the middle of a transaction? That seems fairly unsafe to me and I don't see the point.<br>
For the 0.1% people actually doing something like that, isn't it acceptable that they explicitly disable the hook via e.g. some Sequel API?<br>
IMHO safety/security is much more important than supporting very rare edge cases without an extra line of code.<br>
Anyway, Sequel is free to do what it wants, but clearly ActiveRecord already sides on the side of safety/security by default here. Is there any user complaining about that?</p>
<p>Closing before forking is suboptimal as the caller process will need to reconnect, and it might interrupt other threads in that process connected to the DB needlessly.<br>
I understand it might not be easy to close the connection in the child process, depending on the DB driver.<br>
IMHO, it is worth closing in the child when the driver provides a way to do it.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915162021-04-13T11:21:41ZEregon (Benoit Daloze)
<ul><li><strong>Related to</strong> <i><a class="issue tracker-2 status-5 priority-4 priority-default closed" href="/issues/5446">Feature #5446</a>: at_fork callback API</i> added</li></ul> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915202021-04-13T14:36:04Zmame (Yusuke Endoh)mame@ruby-lang.org
<ul></ul><p>I'd like to make sure: what process calls the <code>after_fork</code> hook? Only parent, only child, or both?</p>
<p>In the previous ticket <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: at_fork callback API (Closed)" href="https://redmine.ruby-lang.org/issues/5446">#5446</a>, <code>atfork_parent</code> and <code>atfork_child</code> were discussed. Is the distinguishment not needed?</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915212021-04-13T14:46:26ZDan0042 (Daniel DeLorme)
<ul></ul><blockquote>
<p>Do people actually fork in the middle of a transaction?</p>
</blockquote>
<p>I think the most likely case is if the transaction is unrelated to the fork. Let's say your web backend handles every request inside a transaction. If somewhere in there you fork a child process in order to perform some asynchronous work, you don't want to close the DB connection.</p>
<p><code>Sequel.postgres.transaction{ exit! if fork }</code> is if you want to start DB work in the parent and continue in the child. I find that vanishingly improbable.</p>
<blockquote>
<p>I understand it might not be easy to close the connection in the child process, depending on the DB driver.<br>
IMHO, it is worth closing in the child when the driver provides a way to do it.</p>
</blockquote>
<p>I agree. Closing in the child is ideal; if the DB driver doesn't expose the socket then closing before the fork is usually a fine workaround.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915252021-04-13T15:17:46Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>what process calls the after_fork hook? Only parent, only child, or both?</p>
</blockquote>
<p>The <code>child</code>. But I agree that a better naming could be found to avoid that confusion. I don't have an idea just yet, but I'll think about it.</p>
<blockquote>
<p>Is the distinguishment not needed?</p>
</blockquote>
<p>From the use cases I compiled I saw no use for a callback in the parent after a fork. But maybe that use case exist and I missed it. Or maybe it's justified to add it simply to look a bit more like <code>pthread_atfork</code>.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915262021-04-13T16:32:05Zmame (Yusuke Endoh)mame@ruby-lang.org
<ul></ul><p><a class="user active user-mention" href="https://redmine.ruby-lang.org/users/7941">@byroot (Jean Boussier)</a> I see, thanks! I think that the intended behavior of the proposed APIs is valuable to be explained in the ticket description.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915352021-04-13T22:22:18Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p><del><a class="user active user-mention" href="https://redmine.ruby-lang.org/users/18">@mame (Yusuke Endoh)</a> I'm afraid I don't have the permission to edit my own tickets.</del> Nevermind.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915402021-04-14T06:06:53Zko1 (Koichi Sasada)
<ul></ul><p>For debugger, I also want to use.<br>
I want to use 3 hooks like <code>pthread_atfork</code>.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915412021-04-14T07:31:46Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul><li><strong>Subject</strong> changed from <i>`before_fork` and `after_fork` callback API</i> to <i>Around `Process.fork` callbacks API</i></li><li><strong>Description</strong> updated (<a title="View differences" href="/journals/91541/diff?detail_id=59776">diff</a>)</li></ul> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915422021-04-14T07:34:27Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p>I updated the description with:</p>
<ul>
<li>Added the third hook (after fork in parent) on @ko1's demand.</li>
<li>Added links to similar feature in C and Python.</li>
<li>Added pseudo-code of the implementation.</li>
<li>Added precision about not being able to unregister callback.</li>
<li>Added precision on call order (first registered vs last registered).</li>
<li>Added two proposed APIs.</li>
</ul> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915762021-04-16T08:03:56Zakr (Akira Tanaka)akr@fsij.org
<ul></ul><p>Another idea is introducing Process.fork_level which can be used to detect fork instead of getpid.</p>
<pre><code>Process.fork_level #=> 0
fork {
Process.fork_level #=> 1
fork {
Process.fork_level #=> 2
}
}
</code></pre>
<p>It doesn't need system call unlike getpid.<br>
Also, it doesn't have the problem with reused pid (pid collision).</p>
<p>Process.fork_level can be used to detect process forking in a library API<br>
and it can close persistent connections (DB connection, HTTP persistent connection, statistics server connection, etc.) at a forked child process.</p>
<p>I think that closing persistent connections at forked child processes is better than disconnect before fork and reconnect after fork at parent and child.<br>
Because some fork uses, such as parallel computing using fork, doesn't need to disconnect.<br>
Disconnecting at all fork invocation would cause needless disconnection, I think.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915782021-04-16T08:07:49Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>It doesn't need system call unlike getpid.</p>
</blockquote>
<p>It would be a win for sure, but in my opinion we might as well go all the way and have a callback triggered immediately upon fork rather than have the cleanup be delayed until the next access to whatever the protected resource is, as well as having to constantly check for that change.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915802021-04-16T13:28:11Zivoanjo (Ivo Anjo)ivo.anjo@datadoghq.com
<ul></ul><p>At Datadog, we use our monkey patching (mentioned in the description) to restart background threads that are used to periodically gather metrics. We use this so that we can transparently gather (profiling) metrics on all processes in Ruby apps that employ forking.</p>
<p>Thus, "passive" solutions -- checking the pid / fork level -- don't work for our use case. Hence why we're really really interested in the Fork callbacks / Make Kernel.fork a delegator solution.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915912021-04-17T07:56:46Zmame (Yusuke Endoh)mame@ruby-lang.org
<ul></ul><p>This thicket was discussed in the dev meeting, but no conclusion was reached. The current situation is not so good, but some committers agreed with Jeremy's concern (easy to misuse). <a class="user active user-mention" href="https://redmine.ruby-lang.org/users/271">@akr (Akira Tanaka)</a> proposed <code>fork_level</code> (as above). <a class="user active user-mention" href="https://redmine.ruby-lang.org/users/13">@matz (Yukihiro Matsumoto)</a> said that he wanted to provide something helpful, but he was not sure what we should provide.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=915972021-04-17T11:02:55Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>some committers agreed with Jeremy's concern (easy to misuse).</p>
</blockquote>
<p>I must admit I don't understand this argument. There are plenty of advanced Ruby feature that are easy to misuse but extremely useful. So much that I don't think it's even worth listing them.</p>
<blockquote>
<p>akr (Akira Tanaka) proposed fork_level (as above).</p>
</blockquote>
<p>As answered above, in my opinion, any solution akin to checking PID before access is unsatisfactory because:</p>
<ul>
<li>It delays the cleanup to the next access, which in same case can be problematic.</li>
<li>Even if it's fast it's still needless busywork. Many use cases for this feature are things such as tracing libraries that are called very often in hotspots.</li>
</ul>
<blockquote>
<p>nobu: IO.popen("-") is also a fork</p>
</blockquote>
<p>As said in the description, <code>fork + exec</code> is out of scope. This only aim at catching forks that will continue to execute the Ruby VM.</p>
<blockquote>
<p>mame: How about extension libraries? Maybe we don't have to care them?</p>
</blockquote>
<p>Yes, we don't care about them, because as far as I know an extension calling <code>fork(2)</code> would leave the VM in an unusable state (let me know if I'm wrong). I looked at various servers etc, all of them use <code>Process.fork</code> or <code>Kernel.fork</code>, I couldn't find any example of a C-extension calling <code>fork(2)</code>.</p>
<blockquote>
<p>mrkn: Python provides C-API to call registered hooks</p>
</blockquote>
<p>It also provide a Python API. <a href="https://docs.python.org/3/library/os.html#os.register_at_fork" class="external">https://docs.python.org/3/library/os.html#os.register_at_fork</a></p>
<p>Could we at least settle on my delegation PR (<a href="https://github.com/ruby/ruby/pull/4361" class="external">https://github.com/ruby/ruby/pull/4361</a>) if a proper API is not acceptable?</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=916002021-04-17T17:43:05Zmame (Yusuke Endoh)mame@ruby-lang.org
<ul></ul><p>byroot (Jean Boussier) wrote in <a href="#note-21">#note-21</a>:</p>
<blockquote>
<p>I must admit I don't understand this argument. There are plenty of advanced Ruby feature that are easy to misuse but extremely useful. So much that I don't think it's even worth listing them.</p>
</blockquote>
<p>Sorry, I have to admit my ignorance in this area, so it is difficult for me to log the discussion of the meeting, but as far as I understand... Some committers said that an appropriate cleanup/restart process before/after fork depends on applications, not libraries (as Jeremy said). This API is apparently proposed just for libraries, so there seems to be nothing but misuse.</p>
<blockquote>
<blockquote>
<p>nobu: IO.popen("-") is also a fork</p>
</blockquote>
<p>As said in the description, <code>fork + exec</code> is out of scope.</p>
</blockquote>
<p>We are sure that <code>fork + exec</code> is out of scope. But <code>IO.popen("-")</code> is not <code>fork + exec</code> but <code>fork</code>, surprisingly.</p>
<pre><code>$ ruby -e 'IO.popen("-"); $stderr.puts "Hello"'
Hello
Hello
</code></pre>
<blockquote>
<blockquote>
<p>mame: How about extension libraries? Maybe we don't have to care them?</p>
</blockquote>
<p>Yes, we don't care about them, because as far as I know an extension calling <code>fork(2)</code> would leave the VM in an unusable state (let me know if I'm wrong). I looked at various servers etc, all of them use <code>Process.fork</code> or <code>Kernel.fork</code>, I couldn't find any example of a C-extension calling <code>fork(2)</code>.</p>
</blockquote>
<p>Thanks, I failed to log the answer to my question, but any committer (I forgot, maybe <a class="user active user-mention" href="https://redmine.ruby-lang.org/users/17">@ko1 (Koichi Sasada)</a>?) said the same. I'm convinced.</p>
<blockquote>
<p>Could we at least settle on my delegation PR (<a href="https://github.com/ruby/ruby/pull/4361" class="external">https://github.com/ruby/ruby/pull/4361</a>) if a proper API is not acceptable?</p>
</blockquote>
<p>The proposal is not rejected yet. In the previous meeting, we talked about this ticket in about one hour, and <a class="user active user-mention" href="https://redmine.ruby-lang.org/users/271">@akr (Akira Tanaka)</a>, one of the most familiar committers with this area, said that we can provide fork_level to support "Continuously check Process.pid" approach if it works. He also said that it is not a perferct solution (fork_level does not satisfy ko1's need explained in Comment <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: test issue for ruby-1.9 (Closed)" href="https://redmine.ruby-lang.org/issues/14">#14</a>), but we tentatively agreed that he counter-propose fork_level to see if it is good enough or not. I guess matz will finally change something for this ticket, but we have not yet determined what is best. Sorry for your patience.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=916102021-04-19T09:55:25Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>Sorry, I have to admit my ignorance in this area, so it is difficult for me to log the discussion of the meeting</p>
</blockquote>
<p>I wasn't complaining about your log, thanks for the work you do, it's extremely valuable for people like me.</p>
<p>I've read the original argument by Jeremy, and I get that some people consider libraries shouldn't be in control of such things.</p>
<p>But I (and apparently a bunch of other library writers, since clearly a bunch of libraries already do it, and other languages offer that capability) disagree with this stance, and don't understand how Ruby design should be influenced by this types of "proper way to do it" arguments.</p>
<blockquote>
<p>But IO.popen("-") is not fork + exec but fork, surprisingly.</p>
</blockquote>
<p>My bad I overlooked the <code>"-"</code> and I wasn't aware of this feature. However looking at <code>IO.popen</code> implementation, it doesn't seem hard to support as well.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=916392021-04-21T07:30:13Zsam.saffron (Sam Saffron)sam.saffron@gmail.com
<ul></ul><p>There is some precedent <a class="user active user-mention" href="https://redmine.ruby-lang.org/users/1604">@jeremyevans0 (Jeremy Evans)</a> for library authors offering all the poisons to the users.</p>
<p>For example:</p>
<p><a href="https://github.com/redis/redis-rb/blob/6542934f01b9c390ee450bd372209a04bc3a239b/lib/redis/client.rb#L384-L389" class="external">https://github.com/redis/redis-rb/blob/6542934f01b9c390ee450bd372209a04bc3a239b/lib/redis/client.rb#L384-L389</a></p>
<blockquote>
<p>inherit_socket: true: disable safety check that prevents a forked child from sharing a socket with its parent; this is potentially useful in order to mitigate connection churn when:<br>
many short-lived forked children of one process need to talk to redis, AND<br>
your own code prevents the parent process from using the redis connection while a child is alive</p>
<p>Improper use of inherit_socket will result in corrupted and/or incorrect responses.</p>
</blockquote>
<p>So for example if this feature were given to redis they could get rid of the pid check on connected and instead rely on the fork hooks provided by Ruby.</p>
<p>There is a certain appeal to a Redis option where you can fork as much as you want and all your redis calls continue to work. Even if library authors ask you to opt in to this behavior. (eg: Redis.auto_safe_fork!)</p>
<p>I agree this is a super sharp tool, but it does give lib authors the ability to ship an "easy mode".</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=916522021-04-22T13:21:40Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p>Another instance I just stumbled upon, Dalli the dominant Memcached client checks <code>Process.pid</code> before every single command: <a href="https://github.com/petergoldstein/dalli/blob/c73e52bf2993877ad509938dd840f0544633e998/lib/dalli/server.rb#L210" class="external">https://github.com/petergoldstein/dalli/blob/c73e52bf2993877ad509938dd840f0544633e998/lib/dalli/server.rb#L210</a></p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=916532021-04-22T14:34:43ZDan0042 (Daniel DeLorme)
<ul></ul><p>akr (Akira Tanaka) wrote in <a href="#note-17">#note-17</a>:</p>
<blockquote>
<p>Another idea is introducing Process.fork_level which can be used to detect fork instead of getpid.</p>
</blockquote>
<p>If that's possible, then it should be equally possible to just [pre-]cache <code>Process.pid</code> and it would be just as performant, right? Whatever were the reasons for pid cache removal in glibc, they would also apply to <code>fork_level</code>. From what I understand above it's not relevant here because calling <code>fork(2)</code> would leave the VM in an unusable state. Since checking the pid is a pattern that already exists, imho it's better to make that pattern performant than to introduce a new pattern with almost no advantage. I don't think pid reuse/collision is a realistic concern either.</p>
<p>But checking either <code>Process.pid</code> or <code>fork_level</code> doesn't fix one of the biggest issues I've had with forking: <em>finalizers</em>. If the database object has a finalizer that closes the connection, you need to prevent that finalizer from ever executing in the child process. You can use <code>fork{ work(); exit! }</code> to prevent finalizers from running on exit, but that also prevents any finalizers that were defined in the child process. So you actually need to use <code>fork{ at_exit{exit!}; work() }</code>, but even then it doesn't cover the case where a finalizer is run when the object is garbage-collected. This is a real thorn, and to me it's the main reason why the socket must be closed in the child process right after fork.</p>
<p>byroot (Jean Boussier) wrote in <a href="#note-23">#note-23</a>:</p>
<blockquote>
<p>But I (and apparently a bunch of other library writers, since clearly a bunch of libraries already do it, and other languages offer that capability) disagree with this stance, and don't understand how Ruby design should be influenced by this types of "proper way to do it" arguments.</p>
</blockquote>
<p>+1; for me I believe fork safety should be handled by libraries because of encapsulation. If you have independant libs, libA that needs to close sockets on fork, and libB that uses fork, these two should remain independant. You should not need the app to insert glue code in order for the two libs to play nice together. That really breaks encapsulation and separation of concerns. 2¢</p>
<blockquote>
<p>Make Kernel.fork a delegator</p>
</blockquote>
<p>I have two concerns with this:</p>
<ol>
<li>Some existing gems already override both <code>Kernel#fork</code> and <code>Process.fork</code> in order to setup callbacks; this change would cause the callbacks to be triggered twice when using <code>Kernel#fork</code>. So it's somewhat backward-incompatible.</li>
<li>You need to handle the with-block and without-block forms differently.</li>
</ol>
<p>So I'd like to make a counter-proposal: what about introducing a separate method (let's say <code>Process._fork_</code>) which <em>doesn't accept a block</em> and is called by the various forking methods, including <code>IO.popen("-")</code>. That would solve the two concerns above, and it might even be simple enough to be backported.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=916542021-04-22T15:23:03ZDan0042 (Daniel DeLorme)
<ul></ul><p>from dev meeting notes:</p>
<blockquote>
<p>akr: Process.daemon is very special, so it doesn't have to call the hooks (or update fork_level)</p>
</blockquote>
<p>It should be noted that <code>Process.daemon</code> has the effect of stopping active threads, just like <code>fork</code>. Since one of the stated goals in this ticket is to "restart the thread in the forked children", I think callbacks are also relevant for <code>Process.daemon</code>. Or maybe don't stop active threads? At least it should be consistent.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=916592021-04-22T18:58:35Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>Or maybe don't stop active threads?</p>
</blockquote>
<p>That's not possible. After <code>fork(2)</code> all but the thread that called <code>fork(2)</code> die in the children.</p>
<blockquote>
<p>So I'd like to make a counter-proposal: what about introducing a separate method (let's say Process.<em>fork</em>) which doesn't accept a block and is called by the various forking methods, including IO.popen("-"). That would solve the two concerns above</p>
</blockquote>
<p>Very good idea. If the proper callback API is not accepted, but the idea of having a proper chokepoint is, then your proposal is better than mine.</p>
<blockquote>
<p>one of the biggest issues I've had with forking: finalizers.</p>
</blockquote>
<p>That's a good point I didn't thought of. Undefining serializers is another use case.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=916752021-04-23T16:26:09ZEregon (Benoit Daloze)
<ul></ul><p>I very much agree with @Dan0042's point about <code>exit!</code>, finalizers, <code>at_exit</code> and subtleties.<br>
This can be easily be handled with fork hooks, and it's a total nightmare otherwise, to the point many apps/libraries probably get it wrong<br>
(e.g., exit a different way than <code>exit!</code> in the child, which e.g. an exception reaching the top-level is already an issue, <code>ruby -e 'at_exit { p "hi" }; fork { raise "foo" }'</code> prints twice)<br>
or needlessly restrict what child processes can do with <code>at_exit</code>.</p>
<p>I also heavily agree with encapsulation, requiring the app to know about every library which might have some IO/Socket/resourced opened that's not naturally fork-safe and what to call on each of these libraries with a sever-specific hook is a terrible programming model.<br>
It invites invalid sharing and data corruption, so I'd go as far as saying libraries not handling are basically guaranteeing troubles for their users every now and then if they use <code>fork</code>.<br>
The <code>redis-rb</code> approach above makes a lot of sense: safe by default and can be tuned if desired for more dangerous cases where sharing is wanted.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=917892021-05-03T11:34:03ZEregon (Benoit Daloze)
<ul></ul><p>Another use-case I found: <a href="https://github.com/rubyjs/mini_racer#fork-safety" class="external">https://github.com/rubyjs/mini_racer#fork-safety</a><br>
That gem (and others of course) could provide some helpers methods based on the chosen strategy if there is a common hook.</p>
<p>Right now providing fork safety (if it has any resource that needs handling across fork) is basically impossible for a gem, which seems a severe limitation.<br>
And we all know that if something is not fork-safe and fork is used it's often leading to data corruption.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=920842021-05-21T17:44:59Zmame (Yusuke Endoh)mame@ruby-lang.org
<ul></ul><p><a class="user active user-mention" href="https://redmine.ruby-lang.org/users/13">@matz (Yukihiro Matsumoto)</a> and some committers discussed this ticket for about an hour in today's dev-meeting, but we didn't reach any conclusion again.</p>
<p>Mainly, <a class="user active user-mention" href="https://redmine.ruby-lang.org/users/271">@akr (Akira Tanaka)</a> -san still agrees with <a class="user active user-mention" href="https://redmine.ruby-lang.org/users/1604">@jeremyevans0 (Jeremy Evans)</a> and feels a bad smell about the proposed hook API. It is difficult for me to summarize his opinion precisely, but my partial understanding is:</p>
<ul>
<li>The API looks difficult to use correctly. The example that Jeremy says in <a href="#note-5">#note-5</a> (fork in a transaction) might look a bit artificial, but it may occur in parallel; one thread runs a transaction and the other one calls fork.</li>
<li>We understood that fork_level was insufficient for datadog case. However, it is still a preferable solution in a case where PID checking or fork_level is sufficient. If we add a casual API to hook a fork event, people may choose and (mis)use it blindly.</li>
<li>If something is introduced for this issue, the API should look clearly not casual, not easy to use. <code>Process._fork_</code> that <a class="user active user-mention" href="https://redmine.ruby-lang.org/users/11019">@Dan0042 (Daniel DeLorme)</a> proposed in <a href="#note-26">#note-26</a> is a possible option. <code>TracePoint.new(:before_fork) { ... }.enable { ... }</code> or something may be another one.</li>
</ul>
<p>Honestly, I have no idea how to proceed this issue. <a class="user active user-mention" href="https://redmine.ruby-lang.org/users/1604">@jeremyevans0 (Jeremy Evans)</a>, do you still think that this proposal is not suitable? Do you have any idea about counterproposals?</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=920912021-05-21T21:23:47Zjeremyevans0 (Jeremy Evans)merch-redmine@jeremyevans.net
<ul></ul><p>I still believe neither of these proposals (fork callbacks or Kernel.fork delegation) should be accepted. In general, the proposals are for the same feature (wrapping behavior on fork), so in my opinion there is not much difference between them, and this response will treat them as being the same.</p>
<p>Here's what I consider the primary benefit of adding support for fork callbacks:</p>
<ul>
<li>They handle the common case easily with no code required by the library user.</li>
</ul>
<p>Here's what I consider the primary cost of adding support for fork callback:</p>
<ul>
<li>They invoke the fork callbacks when they should not be invoked, because libraries cannot know the purpose of the fork in other libraries or in application code.</li>
</ul>
<p>I don't want to understate the benefit, language-level fork callbacks make certain common cases easier, and that is a valuable goal. However, I consider the cost too high.</p>
<p>Without reiterating my previous points too much, not all forks are the same. A fork made by a webserver library is different than a fork made in application code. With language-level fork callbacks, the callback has no context about why the fork is happening, and therefore cannot make appropriate decisions.</p>
<p>With the current situation, each library that uses fork generally implements their own callbacks around the library's use of fork, and users use those library-specific callbacks as appropriate to handle resources (e.g. closing sockets). This results in no problems when library-specific callbacks are used correctly. Almost all of the problems in this area occur when the library-specific callbacks should be used and are not.</p>
<p>What will likely happen if this is implemented is that libraries will no longer implement fork callbacks, instead telling users to use language-level fork callbacks. Worse, libraries that create resources (e.g. opening sockets) will start to automatically use the language-level fork callbacks to close the resources on fork, regardless of whether doing so makes sense for the application using the library. This will lead to a situation where the common case is easier but the more complex case is broken completely with no ability to be fixed. If we are lucky, the libraries that create resources will offer a way to opt out of using the callbacks automatically. All of this is speculation on my part, but that is the future I foresee if this proposal is accepted.</p>
<p>In terms of Sequel, since it has been mentioned, only disconnecting before fork is considered safe. Disconnecting after fork is not safe as forks could be operating on the same connection, and Sequel may not have direct access to the underlying socket to close (Sequel supports 10+ adapters built-in, and more in external gems). Disconnecting before fork can only be done safely if you are sure nothing else is using the connections, such as before a web server forks child processes. At any other point, automatic disconnection before fork would be terrible, as the connections could be in use by other threads.</p>
<p>Another, more minor issue, is that a language-level before fork callback will necessarily be called on every fork, whereas current library-specific before fork callbacks are generally only called once before forking multiple times.</p>
<p>I think the problem does not have a good solution. However, in my opinion, the current situation is the least worst. I don't think libraries should try to detect forks using <code> Process.pid</code> or <code>Thread.alive?</code> or <code>fork_level</code>. Libraries should document how they should be used with fork, and should rely on the user correctly disconnecting before or after fork. Users that do not do so will likely have problems. To paraphrase Farquaad, "Some processes may die, but it's a sacrifice I am willing to make". :)</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=923072021-06-02T07:46:06Zivoanjo (Ivo Anjo)ivo.anjo@datadoghq.com
<ul></ul><p>Since there seems to be no agreement on what a better and high-level API would look like, would it be reasonable to go back to the "Make Kernel.fork a delegator" proposal from <a href="https://github.com/ruby/ruby/pull/4361" class="external">https://github.com/ruby/ruby/pull/4361</a>?</p>
<p>That patch seems like low-hanging fruit that improves the status quo for almost no added complexity, and also doesn't stop or conflict with the future introduction of better APIs.</p>
<p>Better yet, by having a single point-of-entry for extension, it simplifies all of the hacks out there, and can even be used as a building block for a common community implementations of some of the more higher-level approaches proposed above.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=923082021-06-02T08:40:39Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p>Agreed, but I think @Dan0042's <code>Process._fork_</code> proposal is better.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=929012021-07-15T07:18:46Zmatz (Yukihiro Matsumoto)matz@ruby.or.jp
<ul></ul><p>I considered the idea of overriding <code>_fork_</code> method. The idea itself sounds reasonable but <code>_fork_</code> is not a good name.<br>
It should be a more descriptive name, forking a new child process, expecting overriding. Any idea?</p>
<p>Matz.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=929022021-07-15T07:29:50Zmame (Yusuke Endoh)mame@ruby-lang.org
<ul></ul><p>Discussed at today's dev-meeting.</p>
<ul>
<li><a class="user active user-mention" href="https://redmine.ruby-lang.org/users/271">@akr (Akira Tanaka)</a> agreed with the API design of <code>Process._fork_</code>
</li>
<li>As above, <a class="user active user-mention" href="https://redmine.ruby-lang.org/users/13">@matz (Yukihiro Matsumoto)</a> disliked the name <code>_fork_</code>.</li>
<li>I proposed <code>__fork__</code>, but matz dislike it too. The message of the names <code>__send__</code> and <code>__id__</code> is that a user MUST NOT override them, but <code>__fork__</code> is supposed to be overridden, so the message is completely opposite.</li>
<li>matz proposed <code>sysfork</code>, but akr disagreed because the method is not a bare wrapper of <code>fork(2)</code>
</li>
<li>We failed to find any reasonable name for the method. Please propose other good name candidates.</li>
</ul> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=929032021-07-15T07:47:02Zmame (Yusuke Endoh)mame@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>Applied in changeset <a class="changeset" title="process.c: Call rb_thread_atfork in rb_fork_ruby All occurrences of rb_fork_ruby are followed by..." href="https://redmine.ruby-lang.org/projects/ruby-master/repository/git/revisions/645616c273aa9a328ca4ed3fceac8705e2e036cd">git|645616c273aa9a328ca4ed3fceac8705e2e036cd</a>.</p>
<hr>
<p>process.c: Call rb_thread_atfork in rb_fork_ruby</p>
<p>All occurrences of rb_fork_ruby are followed by a call rb_thread_fork in<br>
the created child process.</p>
<p>This is refactoring and a potential preparation for [Feature <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Around `Process.fork` callbacks API (Closed)" href="https://redmine.ruby-lang.org/issues/17795">#17795</a>].<br>
(rb_fork_ruby may be wrapped by Process.<em>fork</em>.)</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=929192021-07-15T20:15:10Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><p><a class="user active user-mention" href="https://redmine.ruby-lang.org/users/18">@mame (Yusuke Endoh)</a> did you mean to close the ticket?</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=929202021-07-15T20:40:21Zbyroot (Jean Boussier)byroot@ruby-lang.org
<ul></ul><blockquote>
<p>Please propose other good name candidates.</p>
</blockquote>
<p>Maybe <code>Process.forkpid</code>?</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=929232021-07-16T06:07:56Zmame (Yusuke Endoh)mame@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Closed</i> to <i>Open</i></li></ul><p><a class="user active user-mention" href="https://redmine.ruby-lang.org/users/7941">@byroot (Jean Boussier)</a> Oh no! Reopening. Sorry for the noise.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=941572021-10-17T19:39:52ZDan0042 (Daniel DeLorme)
<ul></ul><p>Here's hoping that a name, any name, will get matz' approval so this can make it into ruby 3.1</p>
<p><code>Process._fork</code><br>
A name begining with an underscore is often used to indicate an internal/private method.</p>
<p><code>Process.fork!</code><br>
A bang to communicate "be careful" when overriding this method.</p>
<p><code>Process.wrap_fork</code><br>
<code>Process.around_fork</code><br>
Focus on the purpose of the overridden method; when we do <code>def wrap_fork</code> we are defining a method that wraps the fork behavior. Similar to around_filter in Rails, in a sense.</p>
<p><code>RubyVM.fork</code><br>
<code>Process::Wrap.fork</code><br>
"fork" is really the most appropriate name for this method, so instead we communicate "this is a special case for overriding" via the module name.<br>
This goes back to @znz's suggestion of <code>Process::Fork.fork</code> which I find quite good.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=942822021-10-25T02:36:24Zmatz (Yukihiro Matsumoto)matz@ruby.or.jp
<ul></ul><p>The new method should be override (or wrap) target, so <code>wrap_</code> or <code>around_</code> are not a part of proper names.<br>
I propose <code>Process._fork</code> for the method name, since:</p>
<ul>
<li>It is not for typical users</li>
<li>It is the target for overriding</li>
<li>We already have internal methods names like <code>_dump</code>, etc.</li>
</ul>
<p>Matz.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=942852021-10-25T04:18:01Zmame (Yusuke Endoh)mame@ruby-lang.org
<ul></ul><p><a class="user active user-mention" href="https://redmine.ruby-lang.org/users/13">@matz (Yukihiro Matsumoto)</a> Thanks. I've created a prototype PR: <a href="https://github.com/ruby/ruby/pull/5017" class="external">https://github.com/ruby/ruby/pull/5017</a></p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=943012021-10-25T10:03:41Zioquatix (Samuel Williams)samuel@oriontransfer.net
<ul></ul><p>I agree with Jeremy, but I'll go a step further. If we want Ruby to be safe, the only sane way to handle fork (in general) is to close all resources in the child process (as already happens with Thread instances, and similarly with close on exec style behaviour). Any other model requires cooperation between the parent and child process, as Jeremy outlines and simply has to involve a shared understanding of what happens before and after fork in a highly application specific way.</p>
<p>I feel like adding an <code>at_fork</code> or <code>after_fork</code> hook should be sufficient for 99% of use cases, however we should allow users to bypass it, as in:</p>
<pre><code>after_fork do
connection_pool.close_on_fork
end
connection = connection_pool.acquire
connection.close_on_fork = false
fork do
connection.query(...) # connection is still valid
end
</code></pre>
<p>However this kind of usage really seems like the edge of the edge cases and it begs the question, do we want to encourage libraries and applications to do this?</p>
<p>The more valid example was found by digging into the Datadog profiler:</p>
<pre><code>after_fork do
Datadog.profiler.start if Datadog.profiler
end
</code></pre>
<p>This to me seems like a totally legitimate use case but I don't think this needs <code>_fork</code> and I think it makes more sense to be consistent with <code>at_exit</code>, e.g. <code>at_fork</code> could be a better name, only executed in the child process. Wrapping <code>Process#_fork</code> seems very specific to how the Process module is implemented and I don't think we should be exposing that or expecting users to <code>prepend</code> to standard library modules to get functionality. It sets a bad precedent IMHO.</p> Ruby master - Feature #17795: Around `Process.fork` callbacks APIhttps://redmine.ruby-lang.org/issues/17795?journal_id=943052021-10-25T11:55:11Zmame (Yusuke Endoh)mame@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Closed</i></li></ul><p>Applied in changeset <a class="changeset" title="process.c: Add Process._fork (#5017) * process.c: Add Process._fork This API is supposed for ap..." href="https://redmine.ruby-lang.org/projects/ruby-master/repository/git/revisions/13068ebe32a7b8a1a9bd4fc2d5f157880b374e1d">git|13068ebe32a7b8a1a9bd4fc2d5f157880b374e1d</a>.</p>
<hr>
<p>process.c: Add Process._fork (<a class="issue tracker-1 status-6 priority-4 priority-default closed" title="Bug: Mysql::Error: #08S01Bad handshake (Rejected)" href="https://redmine.ruby-lang.org/issues/5017">#5017</a>)</p>
<ul>
<li>process.c: Add Process._fork</li>
</ul>
<p>This API is supposed for application monitoring libraries to hook fork<br>
event.</p>
<p>[Feature <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Around `Process.fork` callbacks API (Closed)" href="https://redmine.ruby-lang.org/issues/17795">#17795</a>]</p>
<p>Co-authored-by: Nobuyoshi Nakada <a href="mailto:nobu@ruby-lang.org" class="email">nobu@ruby-lang.org</a></p>