Bug #19470
openFrequent small range-reads from and then writes to a large array are very slow
Description
Write to a large array gets very slow when done after range-reading more than 3 items. In such case the original array gets marked as shared which triggers CoW on a small change afterwards. This leads to a significant performance impact and high memory utilization in cases when we need to range-read/write from/to the same array many times. While this issue can be avoided by reading <= 3 elements at a time the main problem is that this behaviour is not obvious and hard to catch on on-trivial projects.
times = []
arr = [0] * 100000
times.push 0
100000.times do
time_start = Time.now
arr[5] = 100 # takes 0.01662315899999512
times[-1] += Time.now - time_start
end
times.push 0
100000.times do
arr[0..2]
time_start = Time.now
arr[5] = 100 # takes 0.01826406799999659
times[-1] += Time.now - time_start
end
times.push 0
100000.times do
arr[0..3]
time_start = Time.now
arr[5] = 100 # takes 7.757753919000069
times[-1] += Time.now - time_start
end
times.push 0
100000.times do
arr.dup
time_start = Time.now
arr[5] = 100 # takes 7.626929300999957
times[-1] += Time.now - time_start
end
times.push 0
100000.times do
arr.clone
time_start = Time.now
arr[5] = 100 # takes 8.216933763000046
times[-1] += Time.now - time_start
end
p times
Updated by mame (Yusuke Endoh) over 1 year ago
We the core team know the potential problem with the memory sharing technique. We could be wrong, but we currently believe its advantage (performance improvement in many typical and major cases) is larger enough than its disadvantage (potential performance degeneration in some rare cases).
The question is whether the potential problem is really rare. Do you have any evidence that this problem actually occurs frequently?
Updated by ioquatix (Samuel Williams) over 1 year ago
It's interesting to me that reading items can mark the array as shared, that seems non-trivial/non-obvious behaviour to me. What's the rational for it?
Updated by ioquatix (Samuel Williams) over 1 year ago
Sorry, I read the example more closely, it seems we are taking a small slice, which marks the original array as shared. So both the original array and the slice point at the same memory allocation, and on mutation, a copy is made. I think @mame (Yusuke Endoh) is right, in general this is a good optimisation. But I see the problem, there is a degenerate case here.
I wonder if we can propose some way to avoid this degenerate case if the mutation is frequent or the slice should definitely be a copy rather than invoke the CoW. Maybe a percentage based approach, or maybe we can detect the CoW frequently triggering a full copy = don't do further CoW.
Updated by Hanmac (Hans Mackowiak) over 1 year ago
i think maybe the GC could optimize something?
100000.times do
arr[0..3] #=> GC notices that the result is unused?
GC.stress # would the GC delete the shared copy?
time_start = Time.now
arr[5] = 100 # takes ?
times[-1] += Time.now - time_start
end
if the GC could recognize that the slice of the array isn't assigned somewhere and would remove it
is the structure of the array object smart enough that there isn't any shared pieces anymore?
or would we need a shared count for this instead?
if the main array can notice that it isn't shared anymore because all the ones that shared it are gone, would it be smart enough to not make a copy?
Updated by giner (Stanislav German-Evtushenko) over 1 year ago
Do you have any evidence that this problem actually occurs frequently?
I cannot say whether this affects many projects. I've encountered this twice so far when optimizing code to reduce memory allocations (by writing to the same array instead of allocating new ones). In both cases it took quite a while to figure out why instead of being faster and using less memory it did the opposite, i.e. became slower and used more memory.
Updated by headius (Charles Nutter) over 1 year ago
This looks like a pretty typical effect of slicing, so I'm not sure there's something to "fix" here. JRuby has similar behavior.
Improving the sharing heuristic might help, but it's hard to know where to draw the line; should a slice of 5 elements trigger copy? Ten elements? Perhaps only share when slice size is greater than some percentage of the total size?
Perhaps we should make COW less hidden? Add something like Array#fresh_slice
to opt into non-COW slicing when you know you're still mutating the original?
Updated by giner (Stanislav German-Evtushenko) over 1 year ago
Perhaps we should make COW less hidden? Add something like Array#fresh_slice to opt into non-COW slicing when you know you're still mutating the original?
Sounds like a good compromise