Feature #17261
Software transactional memory (STM) for Threads and Ractors
Status: closed
Description
Abstract
I propose software transactional memory (STM) for threads and ractors.
The implementation is here: https://github.com/ruby/ruby/pull/3652
The interface is similar to that of concurrent-ruby, but not the same.
http://ruby-concurrency.github.io/concurrent-ruby/1.1.4/Concurrent/TVar.html
Basic concept
https://en.wikipedia.org/wiki/Software_transactional_memory
Transactions are a popular idea in database systems for keeping state consistent.
STM applies a similar idea to implement an optimistic synchronization strategy.
It has several advantages over traditional synchronization techniques such as Mutex:
- Performance: in some cases it is faster because of its optimistic nature.
- Composability: taking multiple locks can introduce deadlock. STM allows nested transactions. In other words, (some kinds of) STM can guarantee progress.
The disadvantage is that it can cause slowdowns in high-contention cases.
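The deadlock risk mentioned in the composability point can be shown with plain Mutex objects. The following is only a sketch and not part of the proposal: the helper name opposite_order_attempts is made up, and try_lock is used instead of lock so that the script terminates instead of hanging. Each thread holds one lock and then asks for the other one.

```ruby
# Hypothetical helper: two threads take two locks in opposite order.
# try_lock is used instead of lock so the demonstration terminates.
def opposite_order_attempts
  lock_a = Mutex.new
  lock_b = Mutex.new
  a_held = Queue.new    # thread 1 -> thread 2: "I hold lock_a"
  b_held = Queue.new    # thread 2 -> thread 1: "I hold lock_b"
  t1_tried = Queue.new  # thread 1 -> thread 2: "I have tried lock_b"
  t2_tried = Queue.new  # thread 2 -> thread 1: "I have tried lock_a"

  t1 = Thread.new do
    lock_a.synchronize do
      a_held << true
      b_held.pop             # wait until thread 2 holds lock_b
      got = lock_b.try_lock  # a blocking lock_b.lock would wait forever here
      t1_tried << true
      t2_tried.pop           # keep lock_a until thread 2 has tried it
      lock_b.unlock if got
      got
    end
  end

  t2 = Thread.new do
    lock_b.synchronize do
      b_held << true
      a_held.pop             # wait until thread 1 holds lock_a
      got = lock_a.try_lock
      t2_tried << true
      t1_tried.pop           # keep lock_b until thread 1 has tried it
      lock_a.unlock if got
      got
    end
  end

  [t1.value, t2.value]
end

p opposite_order_attempts #=> [false, false]
```

Neither thread can get its second lock. With blocking lock calls both would wait forever, which is the situation a transaction-based design avoids by retrying instead of waiting.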
API
- Thread::atomically do expr end: make a new transaction and run expr in it. expr can be retried if a conflict is detected.
- Thread::TVar.new(default_value)
- Thread::TVar#value: get the current value of the TVar.
- Thread::TVar#value = val: set the TVar value to val.
- Thread::TVar#increment(n=1): just the same as Thread.atomically{ tv.value += n }.
Note that expr for Thread.atomically can be retried, and all TVar#value= calls (set TVar values) are reverted before each retry. Other operations, such as other memory modifications and IO operations (including network operations), are not reverted.
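The retry behavior can be imitated in plain Ruby. This is only a sketch under assumptions: ToyCell and toy_atomically are made-up names, the cell holds a single value with a version counter, and the competing writer is simulated inside the block so that the conflict is deterministic. It is not the proposed Thread::TVar.

```ruby
# Hypothetical single-variable optimistic cell; not the proposed Thread::TVar.
class ToyCell
  def initialize(value)
    @value = value
    @version = 0
    @lock = Mutex.new
  end

  # Returns [value, version] as one consistent pair.
  def snapshot
    @lock.synchronize { [@value, @version] }
  end

  # Installs new_value only if nobody committed since expected_version.
  def commit(expected_version, new_value)
    @lock.synchronize do
      return false unless @version == expected_version
      @value = new_value
      @version += 1
      true
    end
  end
end

# Runs the block against a snapshot and retries until the commit succeeds.
def toy_atomically(cell)
  loop do
    value, version = cell.snapshot
    new_value = yield(value)
    return new_value if cell.commit(version, new_value)
  end
end

cell = ToyCell.new(0)
log = []      # ordinary memory: the retry loop knows nothing about it
attempts = 0

toy_atomically(cell) do |v|
  attempts += 1
  log << "attempt #{attempts} saw #{v}"
  # Simulate a competing writer, but only during the first attempt.
  cell.commit(cell.snapshot.last, 100) if attempts == 1
  v + 1
end

p cell.snapshot.first #=> 101
p log                 #=> ["attempt 1 saw 0", "attempt 2 saw 100"]
```

The tentative result of the first attempt is thrown away, but the entry appended to log during that attempt stays. This is the reason why side effects such as IO should be kept out of the block.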
The main differences from Concurrent::TVar are:
- TVar can only refer to shareable objects, to support Ractor.
- TVar#value= should be used with atomically. We could define it as Thread.atomically{ tv.value = val }, but that could lead to misuse without atomically.
- TVar#increment is a special case that allows setting without atomically, to support typical single-counter cases.
Implementation
https://github.com/ruby/ruby/pull/3652
The implementation is almost the same as TL2, a lock-based STM with a global version clock, using pthread/win32 threads.
We could use atomic operations, but they are not supported yet (and they gave only a small performance benefit in my measurements).
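To give a feeling for the TL2 idea (global version clock, buffered writes, validation at commit), here is a toy version in pure Ruby for threads in a single ractor. It is a sketch under assumptions, not the code in the pull request: ToySTM and every name inside it are hypothetical, Mutex stands in for the per-variable versioned lock, and there is no Ractor support.

```ruby
# Toy TL2-style STM for threads in one ractor. All names are hypothetical.
module ToySTM
  class Conflict < StandardError; end
  CLOCK_LOCK = Mutex.new
  @clock = 0 # global version clock

  def self.clock
    CLOCK_LOCK.synchronize { @clock }
  end

  def self.tick
    CLOCK_LOCK.synchronize { @clock += 1 }
  end

  class TVar
    attr_reader :lock
    attr_accessor :raw_value, :version

    def initialize(value)
      @raw_value, @version, @lock = value, 0, Mutex.new
    end

    def value
      tx = Thread.current[:toy_tx]
      tx ? tx.read(self) : @lock.synchronize { @raw_value }
    end

    def value=(new_value)
      tx = Thread.current[:toy_tx]
      raise ArgumentError, "value= needs ToySTM.atomically" unless tx
      tx.write(self, new_value)
    end
  end

  class Tx
    def initialize
      @read_version = ToySTM.clock # clock value when the transaction starts
      @reads = []
      @writes = {}
    end

    def read(tvar)
      return @writes[tvar] if @writes.key?(tvar)
      value, version = tvar.lock.synchronize { [tvar.raw_value, tvar.version] }
      raise Conflict if version > @read_version # committed after we started
      @reads << tvar
      value
    end

    def write(tvar, value)
      @writes[tvar] = value # buffered until commit
    end

    def commit
      locked = []
      @writes.each_key do |tvar|
        raise Conflict unless tvar.lock.try_lock # never block while holding locks
        locked << tvar
      end
      write_version = ToySTM.tick
      @reads.each do |tvar|
        raise Conflict if tvar.version > @read_version
        raise Conflict if tvar.lock.locked? && !tvar.lock.owned?
      end
      @writes.each do |tvar, value|
        tvar.raw_value = value
        tvar.version = write_version
      end
    ensure
      locked.each { |tvar| tvar.lock.unlock }
    end
  end

  def self.atomically
    return yield if Thread.current[:toy_tx] # nested call joins the outer transaction
    loop do
      tx = Thread.current[:toy_tx] = Tx.new
      begin
        result = yield
        tx.commit
        return result
      rescue Conflict
        # drop the buffered writes and run the block again
      ensure
        Thread.current[:toy_tx] = nil
      end
    end
  end
end

tv1 = ToySTM::TVar.new(0)
tv2 = ToySTM::TVar.new(0)
threads = 4.times.map do
  Thread.new { 1_000.times { ToySTM.atomically { tv1.value += 1; tv2.value += 1 } } }
end
threads.each(&:join)
p [tv1.value, tv2.value] #=> [4000, 4000]
```

The commit step only uses try_lock, so a transaction never waits for a lock while it holds another one. A failed try_lock or a failed validation simply becomes a retry.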
Example
N = 1_000_000
tv1 = Thread::TVar.new(0)
tv2 = Thread::TVar.new(0)
r1 = Ractor.new tv1, tv2 do |tv1, tv2|
  loop do
    Thread.atomically do
      v1, v2 = tv1.value, tv2.value
      raise if v1 != v2
    end
  end
end
rs = 3.times.map do
  Ractor.new tv1, tv2 do |tv1, tv2|
    N.times do
      Thread.atomically do
        tv1.value += 1
        tv2.value += 1
      end
    end
  end
end
rs.each{|r| r.take}
p [tv1.value, tv2.value] #=> [3000000, 3000000]
In this case,
- all atomically blocks keep the consistency that tv1.value == tv2.value.
- the result [3000000, 3000000] shows the consistency of += 1.
Here is the famous bank-account example:
class Account
  COUNT = Thread::TVar.new 0
  def initialize deposit = 0
    @i = COUNT.increment
    @balance = Thread::TVar.new(deposit)
  end
  def transfer_from acc, n
    Thread::atomically do
      acc.withdraw n
      self.deposit n
    end
  end
  def transfer_to acc, n
    Thread::atomically do
      self.withdraw n
      acc.deposit n
    end
  end
  def withdraw n
    @balance.value -= n
  end
  def deposit n
    @balance.value += n
  end
  def balance
    @balance.value
  end
end
AN = 10_000
N = 10_000_000
RN = 10
iter = 0
accs = AN.times.map{Account.new.freeze}.freeze
require 'benchmark'
# :forward
#   two ractors operate N times: a[i].transfer(a[i+1])
#   R1: a1->a2, a2->a3, ...
#   R2: a1->a2, a2->a3, ...
# :reverse
#   two ractors operate N times: a[i].transfer(a[i+1]),
#   but the order of accounts is reversed.
#   R1: a1->a2, a2->a3, ...
#   R2: a1->aN-1, a2->aN-2, ...
# :shuffle
#   RN ractors operate N times: a[rand].transfer(a[rand])
#   It simulates normal bank operations
mode = :shuffle
loop do
  iter += 1
  btime = Time.now
  case mode
  when :forward
    rs = []
    rs << Ractor.new(accs) do |accs|
      N.times{|i|
        a1, a2 = accs[i%accs.size], accs[(i+1)%accs.size]
        a1.transfer_to(a2, 1)
      }
    end
    rs << Ractor.new(accs) do |accs|
      N.times{|i|
        a1, a2 = accs[i%accs.size], accs[(i+1)%accs.size]
        a1.transfer_from(a2, 1)
      }
    end
    rs.each{|r| r.take}
  when :reverse
    rs = []
    rs << Ractor.new(accs) do |accs|
      N.times{|i|
        a1, a2 = accs[i%accs.size], accs[(i+1)%accs.size]
        a1.transfer_to(a2, 1)
      }
    end
    rs << Ractor.new(accs.reverse.freeze) do |accs|
      N.times{|i|
        a1, a2 = accs[i%accs.size], accs[(i+1)%accs.size]
        a1.transfer_from(a2, 1)
      }
    end
    rs.each{|r| r.take}
  when :shuffle
    RN.times.map{
      Ractor.new(accs) do |accs|
        rnd = Random.new
        N.times{
          a1 = accs.sample random: rnd
          a2 = accs.sample random: rnd
          redo if a1 == a2
          a1.transfer_to(a2, rnd.rand(1000))
        }
      end
    }.each{|r| r.take}
  else
    raise
  end
  sum = accs.inject(0){|r, acc| acc.balance + r}
  if sum != 0
    pp accs
    raise "iter #{iter} sum:#{sum}"
  end
  etime = Time.now
  p time: etime - btime
  # break
end
This program creates AN bank accounts and repeats N transfer operations.
You can observe that a huge AN reduces conflicts, so the execution time is low. A small AN increases conflicts -> many retries, so the execution time is high.
AN       Execution time (s)  Retry counts
100      6.914               958,969
1_000    3.107               186,267
10_000   2.549               26,183
100_000  2.627               2,458
Note that 10x the retries does not mean 10x the execution time. This is because of the current Ractor implementation (it acquires a global lock to raise an exception, and this reduces the retry counts). If we improve the Ractor implementation, the results would get worse.
Consideration
Thread.atomically in ractors
At first, I implemented this feature with Ractor::atomically and Ractor::TVar.
However, this STM feature will also help thread programming.
This is why I moved from Ractor::atomically to Thread::atomically.
One option is to introduce the Concurrent namespace that concurrent-ruby is using. However, there are small differences, so I'm not sure it is feasible.
Another idea is to support aliases: Thread.atomically and Ractor.atomically.
Thread::TVar can refer only to shareable objects
Threads can access all objects, so for threads alone we don't need such a restriction.
However, to support ractors, this restriction is needed.
One idea is to separate Thread::TVar and Ractor::TVar, but that could introduce confusion.
With only shareable objects, thread programs become more thread-safe, so I think keeping the current restriction is a good choice.
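What "shareable" means can be checked with Ractor.shareable? and Ractor.make_shareable (Ruby 3.0 or later). The container below is a made-up stand-in that mimics only this one rule. The class name ShareableOnlyBox and the choice of ArgumentError are assumptions for the sketch, not the behavior of the proposed Thread::TVar.

```ruby
# Hypothetical container that mimics only the "shareable values only" rule.
class ShareableOnlyBox
  attr_reader :value

  def initialize(value)
    self.value = value
  end

  def value=(new_value)
    unless Ractor.shareable?(new_value)
      # The error class is an assumption made for this sketch.
      raise ArgumentError, "unshareable value: #{new_value.inspect}"
    end
    @value = new_value
  end
end

box = ShareableOnlyBox.new(0)        # Integers are shareable
box.value = String.new("ok").freeze  # a frozen String is accepted

begin
  box.value = [String.new("mutable")] # an unfrozen Array is rejected
rescue ArgumentError => e
  puts e.message
end

# Ractor.make_shareable deep-freezes the object so that it can be stored.
box.value = Ractor.make_shareable([String.new("now"), String.new("frozen")])
p box.value.frozen?               #=> true
p box.value.all?(&:frozen?)       #=> true
```

In practice this means that mutable state has to live in TVars themselves, and the values stored in them are frozen or otherwise shareable objects.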
Bug detection
Similar to locking, we can forget to use atomically where it is needed, like this:
class C
  def initialize
    @tv1 = Thread::TVar.new(0)
    @tv2 = Thread::TVar.new(0)
  end
  def tv1() = @tv1.value
  def tv2() = @tv2.value
  def tv1=(v)
    Thread.atomically{ @tv1.value = v }
  end
  def tv2=(v)
    Thread.atomically{ @tv2.value = v }
  end
end
obj = C.new
obj.tv1 += 1
obj.tv2 += 2
It works, but it can introduce inconsistency if tv1 and tv2 are tightly coupled, because tv1 and tv2 are not accessed in the same transaction.
If tv1 and tv2 need to be modified consistently, we need to write it like the following:
Thread.atomically do
  obj.tv1 += 1
  obj.tv2 += 1
end
and the tv1/tv2/tv1=/tv2= methods should not be defined.
I mean we can write bad programs easily.
It is the same situation as with traditional locking (we need to use Mutex appropriately). Using it correctly is the programmer's responsibility.
There are some advantages compared with traditional locking:
- We can concentrate on TVars. In traditional thread programming we need to check all memory state.
- We can introduce a logging mechanism and find wrong usage (for example: tv1 and tv2 are set within independent transactions). I think we can make some checker based on the log. For traditional thread programming there are several similar works, but checking is difficult because the state to track covers most memory operations.
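A log-based checker of this kind could look roughly like the following. Everything here is assumed for illustration: the log format (one array of written TVar names per committed transaction), the list of coupled TVars, and the helper name split_writes. The proposal does not define any of them.

```ruby
# Hypothetical log format: one entry per committed transaction, listing the
# names of the TVars it wrote. coupled lists groups that must change together.
def split_writes(log, coupled)
  coupled.select do |group|
    # Flag a group if some transaction wrote part of it but not all of it.
    log.any? do |written|
      hit = group & written
      !hit.empty? && hit.size < group.size
    end
  end
end

log = [
  [:tv1],       # obj.tv1 += 1 in its own transaction
  [:tv2],       # obj.tv2 += 2 in its own transaction
  [:tv3, :tv4]  # two TVars updated in one transaction
]
coupled = [[:tv1, :tv2], [:tv3, :tv4]]

p split_writes(log, coupled) #=> [[:tv1, :tv2]]
```

Because only TVar writes need to be recorded, such a log stays small compared with tracing every memory operation of a lock-based program.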
Related works
- There are many STM implementation techniques. https://www.morganclaypool.com/doi/abs/10.2200/S00070ED1V01Y200611CAC002
- Concurrent Haskell and Clojure are famous for supporting STM in the language (I think).
  - The model of STM is similar to Clojure.
    - Clojure allows reading a TVar (ref in Clojure) value without atomically (dosync in Clojure).
    - Clojure doesn't allow setting a TVar value without atomically.
  - The API is similar to Concurrent Haskell (TVar and atomically).
- Concurrent-ruby has Concurrent::TVar.
  - But it allows holding an unshareable object.
  - But it allows setting the value without atomically.