Project

General

Profile

Actions

Misc #14937

closed

[PATCH] thread_pthread: lazy-spawn timer-thread only on contention

Added by normalperson (Eric Wong) over 6 years ago. Updated over 6 years ago.

Status:
Closed
Assignee:
-
[ruby-core:88088]

Description

[ruby-core:87773]

thread_pthread: lazy-spawn timer-thread only on contention

To reduce resource use and reduce CI failure; lazy spawn
timer-thread only in processes which use Ruby Threads AND those
Ruby Threads hit contention.  Single-threaded Ruby processes
(including forked children) will never have timer-thread
overhead.

To simplify the thread_pthread.c code, I eliminated busy timer
thread [Misc #14851].  Maybe the thread_win32.c code can use
self-pipe, too; and they won't need busy wakeups.

There is only one self-pipe, now, as wakeups for timeslice are
handled via condition variables.  This reduces FD pressure
slightly.

Signal handling is handled directly by one Ruby Thread (instead
of timer-thread) by exposing signal self-pipe to callers of
rb_thread_fd_select, native_sleep, rb_wait_for_single_fd, etc...
Acquiring, using, and releasing the self-pipe is exposed via 4
new internal functions:

1) rb_sigwait_fd_get - exclusively acquire timer_thread_pipe.normal[0]

2) rb_sigwait_fd_sleep - sleep and wait for signal (and no other FDs)

3) rb_sigwait_fd_put - release acquired result from rb_sigwait_fd_get

4) rb_sigwait_fd_migrate - migrate signal handling to another thread
		       after calling rb_sigwait_fd_put.

rb_sigwait_fd_migrate is necessary for waitpid callers because
only one thread can wait on self-pipe at a time, otherwise a
deadlock will occur if threads fight over the self-pipe.

TRAP_INTERRUPT_MASK is now set for the main thread directly in
signal handler via rb_thread_wakeup_timer_thread.

Originally, I wanted to use POSIX timers
(timer_create/timer_settime) and avoid timer-thread completely.
Unfortunately, this proved unfeasible for one reason:
Mutex#sleep resumes on spurious wakeups and
test/thread/test_cv.rb::test_condvar_timed_wait failed.

In the future, I hope [Feature #14717] is accepted so Threads
may be made non-preemptible.  This will allow users to prevent
timer-thread creation completely.

git repository also available at:
https://80x24.org/ruby.git tt-lazy
(commit a2990cefccba55300ad44275ee4adf18e6f95ece)


Files

Updated by normalperson (Eric Wong) over 6 years ago

https://bugs.ruby-lang.org/issues/14937

Hi Greg, Can you try the patch in this ticket? It shouldn't
affect win32, but maybe I made typos. I couldn't add you
as a watcher to this ticket from Redmine (since that requires
JavaScript :<)

Thanks.

Updated by normalperson (Eric Wong) over 6 years ago

git repository also available at:
https://80x24.org/ruby.git tt-lazy
(commit a2990cefccba55300ad44275ee4adf18e6f95ece)

Now up to commit 6fb72ad7334a61e3c4341aa5fa17749671cbb867

[PATCH 2/1] thread.c (rb_threadptr_execute_interrupts): drain pipe and handle SIGCHLD
https://80x24.org/spew/20180725001947.77uutfvtkn3cq235@whir/raw

@Greg: [PATCH 2/1] should not affect win32 users at all, either

Updated by normalperson (Eric Wong) over 6 years ago

git repository also available at:
https://80x24.org/ruby.git tt-lazy
(commit a2990cefccba55300ad44275ee4adf18e6f95ece)

Now up to commit 6fb72ad7334a61e3c4341aa5fa17749671cbb867

Sorry, didn't finish testing the other one :x
now up to commit eac43426d79dd4d2e455d3bb8e161e014378893e

[PATCH 3/1] SQUASH: fixup FD draining
https://80x24.org/spew/20180725003101.ey2mmcu7mauvmfew@whir/raw

Updated by MSP-Greg (Greg L) over 6 years ago

@normalperson (Eric Wong)

Eric,

Tested with the attached patch. When Appveyor is busy, and with test-all using retry, and something in MJIT failing or erroring, the test took 47 minutes.

Anyway, all good, ran with r64038, and all tests passed. Results below.

On Windows, Process.respond_to?(:daemon) returns false, so the changed test doesn't run...

Thanks, Greg

0 Total Failures/Errors                           Build No 975    Job Id e9se4873fpb4owjs
  ruby 2.6.0dev (2018-07-25 trunk 64038) [x64-mingw32]
  2018-07-25 00:45:29 UTC

test-all  19258 tests, 2242240 assertions, 0 failures, 0 errors, 106 skips, 106 skips shown

test-spec  3607 files, 27958 examples, 209426 expectations, 0 failures, 0 errors, 0 tagged
mspec      3607 files, 27960 examples, 209308 expectations, 0 failures, 0 errors, 0 tagged

test-basic test succeeded
btest      PASS all 1386 tests

Updated by normalperson (Eric Wong) over 6 years ago

wrote:

Tested with the attached patch. When Appveyor is busy, and
with test-all using retry, and something in MJIT failing or
erroring, the test took 47 minutes.

Anyway, all good, ran with r64038, and all tests passed. Results below.

Thanks for testing; I guess 47 minutes is not unusual for a busy CI machine?
It seems long... "make exam" on an ancient 1.6GHz Pentium-M (Centrino)
laptop takes ~20 min on Debian 9.

Updated by normalperson (Eric Wong) over 6 years ago

Koichi Sasada wrote:

I need to read your proposal with more concentrations, but one thing:

Thanks. I noticed a bug where it sometimes still get
stuck when I run "make exam" in a loop. Will have to dig deeper
to solve... (this is a tricky change :x)

On 2018/07/25 8:48, wrote:

Mutex#sleep resumes on spurious wakeups and

I think Mutex#sleep should care about spurious wakeups, shouldn't?

I don't know, current behavior seems intentional and documented
in RDoc:

  • Note that this method can wakeup without explicit Thread#wakeup call.
  • For example, receiving signal and so on.
    */

I don't know the reasoning behind it, maybe matz does.

Updated by ko1 (Koichi Sasada) over 6 years ago

On 2018/07/25 17:20, Eric Wong wrote:

I don't know, current behavior seems intentional and documented
in RDoc:

  * Note that this method can wakeup without explicit Thread#wakeup call.
  * For example, receiving signal and so on.
  */

I don't know the reasoning behind it, maybe matz does.

Mutex#sleep is written by me :p

--
// SASADA Koichi at atdot dot net

Updated by MSP-Greg (Greg L) over 6 years ago

@normalperson (Eric Wong)

Eric,

In addition to the build mentioned above, there have been two more ruby-loco builds, both with all three patches, and both passed all tests

Thanks for testing; I guess 47 minutes is not unusual for a busy CI machine?

Re Appveyor:

  • 6 minutes is updating the build system
  • I think each CI is run on 2 cores of a 4 core system, the system may be competing with another CI for disk IO, etc. The build for r64024 completed in 37 minutes.
  • I run both make test-rubyspec and mspec.
  • I think MJIT is one of the longer running suites, and with retry, it is always run twice.

Thanks again, Greg

Actions #9

Updated by normalperson (Eric Wong) over 6 years ago

  • Status changed from Open to Closed

Applied in changeset trunk|r64062.


cont.c (ec_switch): prevent delayed/missed trap interrupt race

timer-thread may set trap interrupt with rb_threadptr_check_signal
at any time independent of GVL. This means timer-thread may set
the trap interrupt flag on the previous execution context; causing
the flag to be unnoticed until a future ec switch (or lost
completely if the ec is done).

Note: I avoid relying on th->interrupt_lock here and use
atomics because we won't be able to rely on it for proposed lazy
timer-thread [Misc #14937].

This regression affects Ruby 2.5 as it was introduced by moving
interrupt_flag to `ec' which is an unstable pointer. Ruby <= 2.4
was unaffected because vm->main_thread->interrupt_flag never
changed.

[ruby-core:88119] [Bug #14939]

Updated by normalperson (Eric Wong) over 6 years ago

Btw, I can probably improve on this by making GVL waiter perform
timeslice; so it would eliminate timer-thread completely. Will
work on it tomorrow (too tired, now)

Updated by normalperson (Eric Wong) over 6 years ago

https://bugs.ruby-lang.org/issues/14937

OK, new patch to eliminate timer-thread completely. It seems to pass
repeated "make exam" and introduce no regressions. So far I have found
and fixed two bugs while working on this.

[Bug #14939] lost trap interrupts during ec_switch
[Bug #14945] unregister_ubf_list ordering

And killed a lot of code from thread_pthread.c

Greg: can you test this patch instead? Thanks.

https://80x24.org/spew/20180728201723.20279-1-e@80x24.org/raw
Also available via git:


The following changes since commit 443f4d583c8fe78198bee791f2ac3da0be2dfb5e:

mjit.c: introduce JIT compaction [experimental] (2018-07-28 16:14:56 +0000)

are available in the Git repository at:

https://80x24.org/ruby.git tt-designated

for you to fetch changes up to c49e163a5c0a813aa511ecbf92900460e573d2e3:

thread_pthread: remove timer-thread by restructuring GVL (2018-07-28 20:13:25 +0000)

Updated by MSP-Greg (Greg L) over 6 years ago

@normalperson (Eric Wong)

Eric,

Passed all tests using r64094. TestJIT#test_unload_units has been failing (both in parallel & retry) since it was added. Been meaning to post an issue/bug report about it...

Thanks, Greg

——————————————————————————————————————————————————————————————————————————————— Test Results
0 Total Failures/Errors                           Build No 991    Job Id 6xo1217qns66ktsu
  2018-07-28 22:15:31 UTC

test-all  19273 tests, 2240557 assertions, 1 failures, 0 errors, 108 skips, 108 skips shown

test-spec  3607 files, 27958 examples, 209418 expectations, 0 failures, 0 errors, 0 tagged
mspec      3607 files, 27960 examples, 209308 expectations, 0 failures, 0 errors, 0 tagged

test-basic test succeeded
btest      PASS all 1386 tests


——————————————————————————————————————————————————————————————————————————————— Summary test-all
19273 tests, 2240557 assertions, 1 failures, 0 errors, 108 skips, 108 skips shown

————————————————————————————————————————————————————————————————— Parallel Tests - 2 Failures

     2 TestJIT#test_compile_insn_putstring_concatstrings_tostring = 1.32 s = F
     1 TestJIT#test_unload_units = 4.77 s = F

————————————————————————————————————————————————————————————————— Parallel Tests - 1 Error

     0 TestIO#test_select_leak = 60.37 s = E

————————————————————————————————————————————————————————————————— After Retry - 1 Failure
                                                                    ruby/test_jit.rb

TestJIT#test_unload_units                                           Line: 830  
Failed to run script with JIT:

Updated by normalperson (Eric Wong) over 6 years ago

https://bugs.ruby-lang.org/issues/14937

https://80x24.org/spew/20180728201723.20279-1-e@80x24.org/raw

Hi Greg, thanks for testing. I'll let k0kubun deal with JIT
failures. For threading, I found more code to delete
(goes on top of above patch):

https://80x24.org/spew/20180729002256.22651-1-e@80x24.org/raw

Also pushed to my git repo:

The following changes since commit 443f4d583c8fe78198bee791f2ac3da0be2dfb5e:

mjit.c: introduce JIT compaction [experimental] (2018-07-28 16:14:56 +0000)

are available in the Git repository at:

https://80x24.org/ruby.git tt-designated

for you to fetch changes up to 776e12582ba847e0d4b96fc1fcf05e2cd7049ee7:

drop thread_destruct_lock and interrupt current ec directly (2018-07-29 00:16:27 +0000)


Eric Wong (2):
thread_pthread: remove timer-thread by restructuring GVL
drop thread_destruct_lock and interrupt current ec directly

internal.h | 3 +
process.c | 103 +++++++-
signal.c | 7 +-
test/ruby/test_process.rb | 2 +-
thread.c | 327 ++++++++++++++++----------
thread_pthread.c | 586 +++++++++++++++-------------------------------
thread_pthread.h | 6 +-
thread_win32.c | 27 ++-
vm_core.h | 8 +-
9 files changed, 526 insertions(+), 543 deletions(-)

Updated by MSP-Greg (Greg L) over 6 years ago

@normalperson (Eric Wong)

Eric,

Ran both patches on top of r64095, result was similar to above (passed as above).

Thanks, Greg

Updated by normalperson (Eric Wong) over 6 years ago

Thanks again! I'll probably commit soon since this looks very
straightforward (it took a long time to get there, though;
probably the most challenging thing I've done for Ruby).

Updated by k0kubun (Takashi Kokubun) over 6 years ago

Congrats to achieve this. I think this should be a notable change written in NEWS :)

Updated by normalperson (Eric Wong) over 6 years ago

wrote:

Congrats to achieve this. I think this should be a notable
change written in NEWS :)

Thanks, I plan to :> I don't think we're out of the woods,
yet, sky3 had a failure at
http://ci.rvm.jp/results/trunk-test@ruby-sky3/1173398 and it
might be related. Investigating...

Updated by normalperson (Eric Wong) over 6 years ago

yet, sky3 had a failure at
http://ci.rvm.jp/results/trunk-test@ruby-sky3/1173398 and it
might be related. Investigating...

I thought r64133 fixed it, but I still saw this failure:

http://ci.rvm.jp/results/trunk@P895/1173951

I can't reproduce that regardless of multi or single-core
and I'm not sure what is wrong... (Maybe I need food :x)

Also, vm_thread_condvar1 performance regressed (otherwise,
benchmark performance seems improved overall). But I think
I need to redo sleep/wakeups again for auto-fiber
[Feature #13618] anyways...

Updated by normalperson (Eric Wong) over 6 years ago

http://ci.rvm.jp/results/trunk@P895/1173951

Damnit, this is because IO#gets on blocking pipes doesn't
hit rb_wait_for_single_fd and thus never gets to check
sigwait_fd (self-pipe)

Anyways, I hope it is acceptable to make all pipes
(and eventually sockets) non-blocking by default;
or I will have to revert and reintroduce timer-thread :<

https://bugs.ruby-lang.org/issues/14968

/me resumes beating his head against the wall...

Updated by normalperson (Eric Wong) over 6 years ago

http://ci.rvm.jp/results/trunk@P895/1173951

or I will have to revert and reintroduce timer-thread :<

reverted for now :< r64203

/me resumes beating his head against the wall...

Updated by normalperson (Eric Wong) over 6 years ago

or I will have to revert and reintroduce timer-thread :<

reverted for now :< r64203

Btw, using POSIX timers is a solution for this; because
timer_settime(2) is async-signal safe and I can call it
from signal handlers.

(pthreads platforms w/o POSIX timers can use emulation via
extra thread)

Updated by normalperson (Eric Wong) over 6 years ago

Btw, using POSIX timers is a solution for this; because
timer_settime(2) is async-signal safe and I can call it
from signal handlers.

(pthreads platforms w/o POSIX timers can use emulation via
extra thread)

Done:

The following changes since commit 52102f6ff50eebf8c16667c9b49cef579d2057c1:

test/lib/leakchecker.rb (find_tempfiles): don't warn for missing files (2018-08-09 02:14:27 +0000)

are available in the Git repository at:

https://80x24.org/ruby.git tt-posix-v2

for you to fetch changes up to ec91b49271b280cee7b455a480bf43867fb8e9fd:

thread_pthread: use POSIX timer or thread to get rid of races (2018-08-09 03:45:42 +0000)


Eric Wong (2):
thread_pthread.c: eliminate timer thread by restructuring GVL
https://80x24.org/spew/20180809034618.20082-1-e@80x24.org/raw

thread_pthread: use POSIX timer or thread to get rid of races
https://80x24.org/spew/20180809034618.20082-2-e@80x24.org/raw

configure.ac | 8 +
internal.h | 3 +
process.c | 140 ++++++-
signal.c | 7 +-
test/ruby/test_io.rb | 9 +-
test/ruby/test_process.rb | 2 +-
test/ruby/test_thread.rb | 5 +-
thread.c | 397 +++++++++++--------
thread_pthread.c | 946 ++++++++++++++++++++++++++--------------------
thread_pthread.h | 20 +-
thread_win32.c | 29 +-
vm_core.h | 8 +-
12 files changed, 991 insertions(+), 583 deletions(-)

I will commit next week (may not be around over the weekend :<).

Updated by ko1 (Koichi Sasada) over 6 years ago

Let me clear your commits.
Maybe "Description" of this ticket is obsolete.

They are my understandings (fix me if they are wrong):

  • Your idea is using POSIX timer if possible.
#define UBF_TIMER_NONE 0
#define UBF_TIMER_POSIX 1
#define UBF_TIMER_PTHREAD 2

UBF_TIMER_POSIX uses POSIX timer. UBF_TIMER_PTHREAD is traditional timer thread approach.

BTW

 * UBF_TIMER is to close TOCTTOU signal race on programs
 * without GVL contention blocking read/write to sockets.

I can't understand these lines.
What is related between GVL contention and read/write sockets?
Single thread application do not kick gvl_acquire_common and timer_thread_function() is not kicked, right?

  • GVL acquire loop invoke timer_thread_function()

... sorry I'm giving up to understand them correctly.

Could you explain more for current trunk?

After your commit, there are no timer threads?


Comments/Questions

gvl_acquire_common

VM_ASSERT(th->unblock.func == 0 && "we reuse ubf_list for GVL waitq");

I strongly disagree that reusing. It is very confusing.

rb_timer_(|dis)arm()

English question (sorry). What does it mean with word "arm"?

native_cond_timeout()

This function is not good name because I misunderstand it is same as native_cond_timedwait().
Maybe I had named it :p

list_add_tail(&vm->gvl.waitq, &nd->ubf_list);

Why add it? Whatvm->gvl.waitq manages?
To specify designate_timer_thread()

There are two condvars gvlq and switch_cond. Is it intentional?

Who remove it from the list? list_top() removes? (impl doesn't seem such deletion).

ubf_wakeup_all_threads();

why wakeup all blocking threads?

rb_timer_create()

There is a guard #if UBF_TIMER == UBF_TIMER_POSIX but not for rb_timer_pthread_create().
I misunderstand that UBF_TIMER_POSIX needs timer pthread.

rb_timer_ prefix

making timer_ will help to understand they are local static functions.
If you want to expose them outside thread_pthread.c, they should be more verbose name like rb_vm_timer and so on to recognize they are no Timer class in Ruby world (it is my opinion).

Updated by normalperson (Eric Wong) over 6 years ago

wrote:

Let me clear your commits.
Maybe "Description" of this ticket is obsolete.

I guess, I will add doc or comments to thread_pthread.c

They are my understandings (fix me if they are wrong):

  • Your idea is using POSIX timer if possible.

Yes, but scope of "timer" here is reduced.

Old timer-thread did several things:

  1. set timer-interrupt (100ms timeslice)
  2. handle ubf list
  3. check for signals

New GVL can handle all of these tasks.

However, application without GVL contention need UBF_TIMER_* to
deal with 2) and 3) reliably (unless [Bug #14968] to make all
pipes/sockets non-blocking is accepted).

#define UBF_TIMER_NONE 0
#define UBF_TIMER_POSIX 1
#define UBF_TIMER_PTHREAD 2

UBF_TIMER_POSIX uses POSIX timer. UBF_TIMER_PTHREAD is traditional timer thread approach.

Not exactly. UBF_TIMER_* does not handle 1) (timer-interrupt)

BTW

 * UBF_TIMER is to close TOCTTOU signal race on programs
 * without GVL contention blocking read/write to sockets.

I can't understand these lines.
What is related between GVL contention and read/write sockets?
Single thread application do not kick gvl_acquire_common and timer_thread_function() is not kicked, right?

Right.

  • GVL acquire loop invoke timer_thread_function()

... sorry I'm giving up to understand them correctly.

I think the confusing thing is UBF_TIMER is NOT for timer-interrupt.

timer_thread_function is only for timer-interrupt, now (not
signals or ubf_list).

Also, timer_thread_pipe needs to be renamed, since it has
nothing to do with timer, now.

Could you explain more for current trunk?

After your commit, there are no timer threads?

Correct (for platforms with POSIX timers, at least).

Comments/Questions

gvl_acquire_common

VM_ASSERT(th->unblock.func == 0 && "we reuse ubf_list for GVL waitq");

I strongly disagree that reusing. It is very confusing.

I don't want to make rb_thread_struct any bigger. I can make
it a union to clarify use.

rb_timer_(|dis)arm()

English question (sorry). What does it mean with word "arm"?

"arm" as in enable the periodic timer, "disarm" stops the timer.

"arm"/"disarm" is used in Linux timer_settime(2) manpage.
Ditto for setitimer(2) manpage (setitimer was used in Ruby 1.8;
but POSIX obsoletes it in favor of timer_settime).

native_cond_timeout()

This function is not good name because I misunderstand it is same as native_cond_timedwait().
Maybe I had named it :p

Yes, I just used an existing function :P

list_add_tail(&vm->gvl.waitq, &nd->ubf_list);

Why add it? Whatvm->gvl.waitq manages?
To specify designate_timer_thread()

Yes, basically we manage the GVL wait-queue ourselves, now,
instead of relying on same linked-list in glibc. This is
similar to my Mutex and autoload locking rewrite; as well
as waitqueue implementation in Linux kernel.

This allows us more fine-grained control and ordering of
GVL waiters.

There are two condvars gvlq and switch_cond. Is it intentional?

Yes, switch_cond is from old GVL by kosaki. It made Thread.pass
much faster. I tried to get rid of switch_cond and
switch_wait_cond, but could not match Thread.pass performance
from old GVL.

Who remove it from the list? list_top() removes? (impl doesn't seem such deletion).

The same thread which calls list_add_tail calls list_del_init
in gvl_acquire_common.

ubf_wakeup_all_threads();

why wakeup all blocking threads?

ubf_wakeup_all_threads always needs to be called periodically
if there's threads in ubf_list. If GVL is contended, it is
easy to do it from the thread which is already handling
timer interrupt.

rb_timer_create()

There is a guard #if UBF_TIMER == UBF_TIMER_POSIX but not for rb_timer_pthread_create().
I misunderstand that UBF_TIMER_POSIX needs timer pthread.

Right, I tried to avoid extra ifdef there since
rb_timer_pthread_create is empty function for UBF_TIMER_POSIX
anyways.

I can add "if (UBF_TIMER == UBF_TIMER_PTHREAD)" conditional in
C (not CPP) if it clarifies things. I prefer to avoid CPP ifdef
guard when possible and efficient to do compiler checks in more
platform-specific code.

rb_timer_ prefix

making timer_ will help to understand they are local static functions.
If you want to expose them outside thread_pthread.c, they should be more verbose name like rb_vm_timer and so on to recognize they are no Timer class in Ruby world (it is my opinion).

OK, I will change to "ubf_timer_" prefix, instead.
Bare "timer_" is confusing with POSIX functions, and
I don't think they need to be exposed outside of
thread_pthread.c

Updated by normalperson (Eric Wong) over 6 years ago

Eric Wong wrote:

wrote:

Let me clear your commits.
Maybe "Description" of this ticket is obsolete.

I guess, I will add doc or comments to thread_pthread.c

I think I addressed this in r64377 and made clarifications
in naming/flow in r64375, r64373, r64372 and r64371.

Thank you for comments and please let us know if you have
any more questions.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0Like0