Feature #13517

[PATCH] reduce rb_mutex_t size from 160 to 80 bytes on 64-bit

Added by normalperson (Eric Wong) over 7 years ago. Updated over 7 years ago.

Status: Closed
Assignee: -
Target version: -
[ruby-core:80913]

Description

Instead of relying on a native condition variable and mutex for
every Ruby Mutex object, use a doubly linked-list to implement a
waiter queue in the Mutex.  The immediate benefit of this is
reducing the size of every Mutex object, as some projects have
many objects requiring synchronization.

In the future, this technique using a linked-list and on-stack
list node (struct mutex_waiter) should allow us to easily
transition to M:N threading model, as we can avoid the native
thread dependency to implement Mutex.

We already do something similar for autoload in variable.c,
and this was inspired by the Linux kernel wait queue (as
ccan/list is inspired by the Linux kernel linked-list).
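
To make the layout change concrete, here is a rough sketch of the resulting structures (illustrative only; field names follow the description above, but treat the exact layout as an approximation rather than the committed patch):

    /* illustrative sketch: the Mutex keeps only an intrusive list head,
     * and each waiter links itself in with a node on its own C stack */
    #include "ccan/list/list.h"  /* struct list_head, struct list_node */

    typedef struct rb_mutex_struct {
        struct rb_thread_struct *th;        /* owning thread, 0 if unlocked */
        struct rb_mutex_struct *next_mutex; /* per-thread list of held mutexes */
        struct list_head waitq;             /* waiter queue, protected by GVL */
    } rb_mutex_t;

    struct mutex_waiter {
        struct rb_thread_struct *th; /* the blocked thread */
        struct list_node node;       /* links into mutex->waitq */
    };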

Finally, there are big performance improvements for Mutex
benchmarks, especially in contended cases:

measure target: real

name            |trunk  |built
----------------|------:|------:
loop_whileloop2 |  0.149|  0.148
vm2_mutex*      |  0.893|  0.651
vm_thread_mutex1|  0.809|  0.624
vm_thread_mutex2|  2.608|  0.628
vm_thread_mutex3| 28.227|  0.881

Speedup ratio: compare with the result of `trunk' (greater is better)

name            |built
----------------|------:
loop_whileloop2 |  1.002
vm2_mutex*      |  1.372
vm_thread_mutex1|  1.297
vm_thread_mutex2|  4.149
vm_thread_mutex3| 32.044

Tested on AMD FX-8320 8-core at 3.5GHz

Updated by normalperson (Eric Wong) over 7 years ago

For those who care about 32-bit, single-core x86, here are my
Pentium M (Centrino) @ 1.6GHz numbers:

Size reduction of Mutex on 32-bit is 112 => 40 bytes

minimum results in each 3 measurements.
Execution time (sec)
name            |trunk  |built
----------------|------:|------:
loop_whileloop2 |  0.554|  0.554
vm2_mutex*      |  3.136|  2.217
vm_thread_mutex1|  2.783|  2.186
vm_thread_mutex2|  2.907|  2.174
vm_thread_mutex3|  9.740|  2.586

Speedup ratio: compare with the result of `trunk' (greater is better)
name            |built
----------------|------:
loop_whileloop2 |  0.999
vm2_mutex*      |  1.414
vm_thread_mutex1|  1.273
vm_thread_mutex2|  1.337
vm_thread_mutex3|  3.766

In the future, I think the cond_waiting flag can be moved into
a FL_USER flag, too.

But I also want to try similar changes to avoid Array usage in the
Queue, SizedQueue, and ConditionVariable classes and rely on
ccan/list + on-stack nodes for waiters. I will convert them from
T_STRUCT to T_DATA.
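
A generic sketch of what a T_DATA (TypedData) wrapper looks like follows; the struct, field names, and function names here are hypothetical illustrations of the pattern, not the actual Queue conversion:

    #include "ruby.h"

    /* hypothetical payload struct; the real conversion would also keep
     * a ccan/list waitq of on-stack waiter nodes */
    struct example_queue {
        VALUE items;  /* backing storage, marked for GC below */
    };

    static void
    example_queue_mark(void *ptr)
    {
        struct example_queue *q = ptr;
        rb_gc_mark(q->items);
    }

    static size_t
    example_queue_memsize(const void *ptr)
    {
        return sizeof(struct example_queue);
    }

    static const rb_data_type_t example_queue_type = {
        "example_queue",
        {example_queue_mark, RUBY_TYPED_DEFAULT_FREE, example_queue_memsize},
        0, 0, RUBY_TYPED_FREE_IMMEDIATELY
    };

    static VALUE
    example_queue_alloc(VALUE klass)
    {
        struct example_queue *q;
        VALUE obj = TypedData_Make_Struct(klass, struct example_queue,
                                          &example_queue_type, q);
        q->items = rb_ary_new();
        return obj;
    }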

Updated by normalperson (Eric Wong) over 7 years ago

Any comments? I would like to commit this soon.

Thanks.

Updated by ko1 (Koichi Sasada) over 7 years ago

At a glance, it seems nice.
But I need time to check it deeply.
I'll check it together with 'Misc #13514'.

Please wait a few days. In Japan, we are in a holiday week right now. I'll check it during these days.

Thanks,
Koichi

Updated by ko1 (Koichi Sasada) over 7 years ago

Sorry for the late response.
I have no objection to this patch. Thank you.

One question.

	list_for_each_safe(&mutex->waitq, cur, next, node) {
	    list_del_init(&cur->node);
	    switch (cur->th->state) {
	    case THREAD_KILLED:
		continue;
	    case THREAD_STOPPED:
	    case THREAD_RUNNABLE:
	    case THREAD_STOPPED_FOREVER:
		rb_threadptr_interrupt(cur->th);
		goto found;
	    }
	}

rb_mutex_lock() sets th->status to THREAD_STOPPED_FOREVER before
the native sleep, but the above code is quoted from rb_mutex_unlock_th().

What kind of situation do you assume when the thread status is other
than THREAD_STOPPED_FOREVER?

Thanks,
Koichi

--
// SASADA Koichi at atdot dot net

Updated by normalperson (Eric Wong) over 7 years ago

SASADA Koichi wrote:

Sorry for the late response.
I have no objection to this patch. Thank you.

One question.

	list_for_each_safe(&mutex->waitq, cur, next, node) {
	    list_del_init(&cur->node);
	    switch (cur->th->state) {

Oops, that should be status, not state:

    switch (cur->th->status) {
    case THREAD_KILLED:
        continue;
    case THREAD_STOPPED:
    case THREAD_RUNNABLE:
    case THREAD_STOPPED_FOREVER:
        rb_threadptr_interrupt(cur->th);
        goto found;
    }
}

`rb_mutex_lock()` sets `th->status` to `THREAD_STOPPED_FOREVER` before
the native sleep, but the above code is quoted from `rb_mutex_unlock_th()`.

What kind of situation do you assume when the thread status is other
than `THREAD_STOPPED_FOREVER`?

Back to your original question. THREAD_RUNNABLE is possible
if somebody uses Thread#run:

require 'thread'
m = Mutex.new
th = Thread.new do
  sleep 0.1 # wait for main thread to get lock
  m.synchronize do
    sleep
  end
end

m.synchronize do
  sleep 0.2 # wait for th to block on m.synchronize
  th.run
end

I am not sure about the other statuses. Maybe exit/GC can trigger
THREAD_KILLED; the mutex_free->rb_mutex_unlock_th call chain
looks like it might, due to GC ordering. Anyway, I will add
comments here when I commit.
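
For context, the waiting side looks roughly like this (a simplified fragment, not the committed code: interrupt checks, timeouts, and deadlock bookkeeping are omitted), which is why THREAD_STOPPED_FOREVER is the expected status for a waiter:

    /* simplified sketch of the waiting path in rb_mutex_lock() */
    struct mutex_waiter w;
    enum rb_thread_status prev_status = th->status;

    w.th = th;
    list_add_tail(&mutex->waitq, &w.node);  /* node lives on this stack frame */

    th->status = THREAD_STOPPED_FOREVER;    /* the "normal" waiter status */
    native_sleep(th, NULL);                 /* releases GVL; woken by
                                               rb_threadptr_interrupt() from
                                               rb_mutex_unlock_th() */
    th->status = prev_status;

    list_del(&w.node);                      /* always unlink before returning */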

Thank you for the review!

Updated by ko1 (Koichi Sasada) over 7 years ago

On 2017/05/08 8:08, Eric Wong wrote:

Back to your original question. THREAD_RUNNABLE is possible
if somebody uses Thread#run:

require 'thread'
m = Mutex.new
th = Thread.new do
  sleep 0.1 # wait for main thread to get lock
  m.synchronize do
    sleep
  end
end

m.synchronize do
  sleep 0.2 # wait for th to block on m.synchronize
  th.run
end

I also confirmed that this code sets THREAD_RUNNABLE. However, th keeps
waiting for the lock forever, so the current Thread#run behavior seems to
be a bug. Hmm. But it is not so serious, because it only affects a small
window (as you may know). We should fix it later.

I am not sure about the other statuses. Maybe exit/GC can trigger
THREAD_KILLED; the mutex_free->rb_mutex_unlock_th call chain
looks like it might, due to GC ordering. Anyway, I will add
comments here when I commit.

I think adding an rb_bug() guard would be good, so we can learn how such a situation arises.
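
One possible shape for such a guard (an illustration of the suggestion only, not committed code): add a default branch so an unexpected status fails loudly instead of being skipped silently.

    list_for_each_safe(&mutex->waitq, cur, next, node) {
        list_del_init(&cur->node);
        switch (cur->th->status) {
          case THREAD_KILLED:
            continue;
          case THREAD_STOPPED:
          case THREAD_RUNNABLE:
          case THREAD_STOPPED_FOREVER:
            rb_threadptr_interrupt(cur->th);
            goto found;
          default:
            rb_bug("rb_mutex_unlock_th: unexpected waiter status %d",
                   cur->th->status);
        }
    }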

--
// SASADA Koichi at atdot dot net

Updated by Anonymous over 7 years ago

  • Status changed from Open to Closed

Applied in changeset trunk|r58604.


reduce rb_mutex_t size from 160 to 80 bytes on 64-bit

Instead of relying on a native condition variable and mutex for
every Ruby Mutex object, use a doubly linked-list to implement a
waiter queue in the Mutex. The immediate benefit of this is
reducing the size of every Mutex object, as some projects have
many objects requiring synchronization.

In the future, this technique using a linked-list and on-stack
list node (struct mutex_waiter) should allow us to easily
transition to M:N threading model, as we can avoid the native
thread dependency to implement Mutex.

We already do something similar for autoload in variable.c,
and this was inspired by the Linux kernel wait queue (as
ccan/list is inspired by the Linux kernel linked-list).

Finally, there are big performance improvements for Mutex
benchmarks, especially in contended cases:

measure target: real

name            |trunk  |built
----------------|------:|------:
loop_whileloop2 |  0.149|  0.148
vm2_mutex*      |  0.893|  0.651
vm_thread_mutex1|  0.809|  0.624
vm_thread_mutex2|  2.608|  0.628
vm_thread_mutex3| 28.227|  0.881

Speedup ratio: compare with the result of `trunk' (greater is better)

name            |built
----------------|------:
loop_whileloop2 |  1.002
vm2_mutex*      |  1.372
vm_thread_mutex1|  1.297
vm_thread_mutex2|  4.149
vm_thread_mutex3| 32.044

Tested on AMD FX-8320 8-core at 3.5GHz

  • thread_sync.c (struct mutex_waiter): new on-stack struct
    (struct rb_mutex_struct): remove native lock/cond, use ccan/list
    (rb_mutex_num_waiting): new function for debug_deadlock_check
    (mutex_free): remove native_*_destroy
    (mutex_alloc): initialize waitq, remove native_*_initialize
    (rb_mutex_trylock): remove native_mutex_{lock,unlock}
    (lock_func): remove
    (lock_interrupt): remove
    (rb_mutex_lock): rewrite waiting path to use native_sleep + ccan/list
    (rb_mutex_unlock_th): rewrite to wake up from native_sleep
      using rb_threadptr_interrupt
    (rb_mutex_abandon_all): empty waitq
  • thread.c (debug_deadlock_check): update for new struct
    (rb_check_deadlock): ditto
    [ruby-core:80913] [Feature #13517]