Bug #1535
closedHash#merge! Inside Iterator Can Cause RuntimeError
Description
=begin
While iterating over a hash I can cause a RuntimeError to be raised when calling Hash#merge! on the same hash under very specific circumstances. Specifically, before the merge! the size of the two hashes combined needs to be about 67 or 68. Much more or less and the code runs correctly. For example:
hash = {1 => 2, 3 => 4, 5 => 6}
big_hash = {}
64.times { |k| big_hash[k.to_s] = k }
hash.each { hash.merge!(big_hash) }
This raises a RuntimeError: "hash modified during iteration" on 1.8.6.368 and 1.8.7.72. It runs correctly on 1.9.1.129. I've tested on a 32-bit Linux box, but it has been confirmed on a Mac, too.
I believe this to be a bug for three principle reasons:
1 - This behaviour is not documented for #each or #merge!. Something similar is documented for #rehash, but as the user does not explicitly call #rehash, they cannot be held responsible for its side effects.
2 - It seemingly operates under very specific conditions such that it may only be visible given such data structures, so the user would likely be unaware of this fatal consequence prior to experiencing it.
3 - It does not occur under 1.9.1, implying that this isn't the desired behaviour.
The ideal resolution is that this code works correctly, regardless of the dimensions of the data structures. If that is not possible, calling Hash#merge inside such an iterator block should be prohibited, so as to notify users of the incompatibility at design time.
=end
Updated by shyouhei (Shyouhei Urabe) over 15 years ago
=begin
(1) Hash data structure changed between 1.8 and 1.9. It is difficult (and too drastic I think) to backport that.
(2) I believe that calling Hash#merge inside a Hash#each block is already prohibited in 1.8. A detailed info on your "very specific conditions" can help us a lot.
=end
Updated by runpaint (Run Paint Run Run) over 15 years ago
=begin
I believe that calling Hash#merge inside a Hash#each block is already prohibited in 1.8.
I'm afraid I do not understand. It is not prohibited in the sense it is made impossible, because the following code works as one would expect on 1.8.7:
>> x={:glark => :quark}
=> {:glark=>:quark}
>> h = {:foo=>:bar}
=> {:foo=>:bar}
>> h.each { h = h.merge(x) }
=> {:foo=>:bar}
>> p h
{:foo=>:bar, :glark=>:quark}
=> nil
Or:
>> h = {:foo=>:bar}
=> {:foo=>:bar}
>> x={:glark => :quark}
=> {:glark=>:quark}
>> h.each { h.merge!(x) }
=> {:foo=>:bar, :glark=>:quark}
It is not prohibited in the sense that it is described as inadvisable in the documentation, either.
A detailed info on your "very specific conditions" can help us a lot.
The information given in the original bug report was really all I could ascertain. I tried running under strace and a debugger, but couldn't crystallize the problem further. Maybe some worked examples under 1.8.7 IRB will illuminate the problem.
Using the example given in my original report:
hash = {1 => 2, 3 => 4, 5 => 6}
=> {5=>6, 1=>2, 3=>4}
big_hash = {}
=> {}
64.times { |k| big_hash[k.to_s] = k }
=> 64
hash.each { hash.merge!(big_hash) }
RuntimeError: hash modified during iteration
from (irb):4:in `each'
from (irb):4
So, with those specific parameters, calling #merge! from with #each raises a RuntimeError. If this usage really was prohibited, that's what you would expect. However, if we modify the example so hash only has 4 elements the code runs as follows:
hash = {1 => 2, 3 => 4}
=> {1=>2, 3=>4}
big_hash = {}
=> {}
64.times { |k| big_hash[k.to_s] = k }
=> 64
hash.each { hash.merge!(big_hash) }
=> {"6"=>6, "11"=>11, "22"=>22, "33"=>33, "44"=>44, "55"=>55, "7"=>7, "12"=>12, "23"=>23, "34"=>34, "45"=>45, "56"=>56, "8"=>8, "13"=>13, "24"=>24, "35"=>35, "46"=>46, "57"=>57, "9"=>9, "14"=>14, "25"=>25, "36"=>36, "47"=>47, "58"=>58, 1=>2, "15"=>15, "26"=>26, "37"=>37, "48"=>48, "59"=>59, "60"=>60, "0"=>0, "16"=>16, "27"=>27, "38"=>38, "49"=>49, "50"=>50, "61"=>61, "1"=>1, "17"=>17, "28"=>28, "39"=>39, "40"=>40, "51"=>51, "62"=>62, "2"=>2, "18"=>18, "29"=>29, "30"=>30, "41"=>41, "52"=>52, "63"=>63, 3=>4, "3"=>3, "19"=>19, "20"=>20, "31"=>31, "42"=>42, "53"=>53, "4"=>4, "10"=>10, "21"=>21, "32"=>32, "43"=>43, "54"=>54, "5"=>5}
Here, no RuntimeError was raised despite the only difference between the two examples being the size of hash.
Similarly, observe the result if we modify the original example by populating big_hash with only 63 entries, keeping everything else constant:
hash = {1 => 2, 3 => 4, 5 => 6}
=> {5=>6, 1=>2, 3=>4}
big_hash = {}
=> {}
63.times { |k| big_hash[k.to_s] = k }
=> 63
hash.each { hash.merge!(big_hash) }
=> {"6"=>6, "11"=>11, "22"=>22, "33"=>33, "44"=>44, "55"=>55, 5=>6, "7"=>7, "12"=>12, "23"=>23, "34"=>34, "45"=>45, "56"=>56, "8"=>8, "13"=>13, "24"=>24, "35"=>35, "46"=>46, "57"=>57, "9"=>9, "14"=>14, "25"=>25, "36"=>36, "47"=>47, "58"=>58, 1=>2, "15"=>15, "26"=>26, "37"=>37, "48"=>48, "59"=>59, "60"=>60, "0"=>0, "16"=>16, "27"=>27, "38"=>38, "49"=>49, "50"=>50, "61"=>61, "1"=>1, "17"=>17, "28"=>28, "39"=>39, "40"=>40, "51"=>51, "62"=>62, "2"=>2, "18"=>18, "29"=>29, "30"=>30, "41"=>41, "52"=>52, 3=>4, "3"=>3, "19"=>19, "20"=>20, "31"=>31, "42"=>42, "53"=>53, "4"=>4, "10"=>10, "21"=>21, "32"=>32, "43"=>43, "54"=>54, "5"=>5}
Again, the code behaves as we expect. However, if we adjust the above example so that big_hash has 65 entries, the RuntimError is raised again:
hash = {1 => 2, 3 => 4, 5=>6}
=> {5=>6, 1=>2, 3=>4}
big_hash = {}
=> {}
65.times { |k| big_hash[k.to_s] = k }
=> 65
hash.each { hash.merge!(big_hash) }
RuntimeError: hash modified during iteration
from (irb):38:in `each'
from (irb):38
from :0
This is what I meant by "very specific conditions". I have no further insight or explanation into their cause. :-(
The bug, then, is that the RuntimeError is unexpected. Code that worked correctly one day, would start crashing another just because the dimensions of the data set changed. If Hash#merge! is generally dangerous inside iterator blocks, its usage should be prohibited consistently such that all of the above examples raise a RuntimeError, and the documentation must make this limitation clear. This is what happens when Hash#rehash is called inside of an iterator block: any and all usages of this combination raise a RuntimeError.
Thank you for looking into this, Shyouhei, I'm sorry I can't be of any more assistance.
=end
Updated by shyouhei (Shyouhei Urabe) over 15 years ago
- Assignee set to shyouhei (Shyouhei Urabe)
=begin
Thank you. I'll take a closer look at the source code.
=end
Updated by marcandre (Marc-Andre Lafortune) about 15 years ago
- Priority changed from Normal to 3
- Target version changed from Ruby 1.8.7 to Ruby 1.8.8
=begin
=end
Updated by mame (Yusuke Endoh) almost 15 years ago
=begin
Hi,
hash = {1 => 2, 3 => 4, 5 => 6}
big_hash = {}
64.times { |k| big_hash[k.to_s] = k }
hash.each { hash.merge!(big_hash) }This raises a RuntimeError: "hash modified during iteration" on 1.8.6.368 and 1.8.7.72. It runs correctly on 1.9.1.129.
It raises a RuntimeError on trunk. I guess it is by accident
for the exception not to occur on 1.9.1.
By hashtable's nature, adding new keys to hash may cause rehash
automatically, and the automatic rehash may cause the exception
during iteration.
For compatibility reason, we cannot prohibit hash modification
during iteration because there are many programs that do so,
(e.g., rbconfig.rb), like this:
hash.each {|k, v| hash[k] = func(v) }
But I agree with Run Paint Run Run's opinion. It may lead to
difficult bug to indeterminately fail to add a new key.
So, I propose to permit only updating value of existing key,
and to always prohibit adding a new key:
hash = { 1=>2, 3=>4, 5=>6 }
hash.each {|k, v| hash[k] = func(v) } #=> OK
hash.each {|k, v| hash[k.to_s] = v } #=> always exception
This does not cause compatibility problem because this just
raises exception that has already been occurred indeterminately.
I'll commit the following patch to trunk unless anyone says an
objection.
diff --git a/hash.c b/hash.c
index d49d0ea..51537e9 100644
--- a/hash.c
+++ b/hash.c
@@ -270,6 +270,14 @@ rb_hash_modify(VALUE hash)
}
static void
+hash_update(VALUE hash, VALUE key)
+{
- if (RHASH(hash)->iter_lev > 0 && !st_lookup(RHASH(hash)->ntbl, key, 0)) {
- rb_raise(rb_eRuntimeError, "can't add a new key into hash during iteration");
- }
+}
+static void
default_proc_arity_check(VALUE proc)
{
int n = rb_proc_arity(proc);
@@ -1036,6 +1044,7 @@ VALUE
rb_hash_aset(VALUE hash, VALUE key, VALUE val)
{
rb_hash_modify(hash);
- hash_update(hash, key);
if (hash == key) {
rb_raise(rb_eArgError, "recursive key for hash");
}
@@ -1630,6 +1639,7 @@ static int
rb_hash_update_i(VALUE key, VALUE value, VALUE hash)
{
if (key == Qundef) return ST_CONTINUE; - hash_update(hash, key);
st_insert(RHASH(hash)->ntbl, key, value);
return ST_CONTINUE;
}
@@ -1641,6 +1651,7 @@ rb_hash_update_block_i(VALUE key, VALUE value, VALUE hash)
if (rb_hash_has_key(hash, key)) {
value = rb_yield_values(3, key, rb_hash_aref(hash, key), value);
} - hash_update(hash, key);
st_insert(RHASH(hash)->ntbl, key, value);
return ST_CONTINUE;
}
--
Yusuke Endoh mame@tsg.ne.jp
=end
Updated by matz (Yukihiro Matsumoto) almost 15 years ago
=begin
Hi,
In message "Re: [ruby-core:28189] [Bug #1535] Hash#merge! Inside Iterator Can Cause RuntimeError"
on Tue, 16 Feb 2010 22:13:51 +0900, Yusuke Endoh redmine@ruby-lang.org writes:
|So, I propose to permit only updating value of existing key,
|and to always prohibit adding a new key:
|
| hash = { 1=>2, 3=>4, 5=>6 }
| hash.each {|k, v| hash[k] = func(v) } #=> OK
| hash.each {|k, v| hash[k.to_s] = v } #=> always exception
|
|This does not cause compatibility problem because this just
|raises exception that has already been occurred indeterminately.
|I'll commit the following patch to trunk unless anyone says an
|objection.
I think it's a good idea. Go ahead.
matz.
=end
Updated by mame (Yusuke Endoh) almost 15 years ago
=begin
Hi,
2010/2/16 Yukihiro Matsumoto matz@ruby-lang.org:
|This does not cause compatibility problem because this just
|raises exception that has already been occurred indeterminately.
|I'll commit the following patch to trunk unless anyone says an
|objection.I think it's a good idea. Go ahead.
Done.
Finally, rubyspec on trunk reports just one error now.
IO#reopen reassociates self with a new stream after some reads FAILED
Expected "Line 3: Three\n"
to equal "Line 1: One\n"
/home/mame/work/ruby/spec/rubyspec/core/io/reopen_spec.rb:125:in
block (2 levels) in <top (required)>' /home/mame/work/ruby/spec/rubyspec/core/io/reopen_spec.rb:4:in
<top
(required)>'
Finished in 115.282141 seconds
2884 files, 13880 examples, 170803 expectations, 1 failure, 0 errors
--
Yusuke ENDOH mame@tsg.ne.jp
=end
Updated by shyouhei (Shyouhei Urabe) about 14 years ago
- Status changed from Open to Closed
=begin
=end