Bug #12699
closed
Crash in the VM - maybe garbage collector bug
Description
Basically, we were investigating this: https://github.com/grpc/grpc/issues/7661
Our investigation led us to realize that this assert in the protobuf code is being triggered, but only if the garbage collector has been exercised enough: https://github.com/google/protobuf/blob/master/ruby/ext/google/protobuf_c/map.c#L74
If the garbage collector is really under heavy stress, we can even produce a VM crash: http://pastebin.com/hzgHPJGq
I have included a zip file with our current reproduction case. Right now, this can crash every version of Ruby I've been able to try it with. The reproduction steps are as follows:
$ bundle install
$ bundle exec gem repro.rb
The idea of the repro is to load a baked binary protobuf from disk and deserialize it in memory enough times to eventually cause a failure. The failure is evidently due to some corruption that happens in the Ruby VM. We have checked that the actual raw memory itself hasn't been altered, and even if it had been, the internal assert shouldn't have been triggered in the first place.
When using a vanilla version of Ruby, the crash is not deterministic. However, compiling a custom build of Ruby with the timer_thread disabled makes the crash fully deterministic. Changing the number of times we deserialize the object while the garbage collector is disabled alters the behavior of the problem.
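As a rough illustration of the loop described above, here is a minimal sketch. It is not the attached repro.rb: the method name `stress_decode`, its parameters, and the iteration counts are hypothetical, and `Marshal` from the standard library stands in for the protobuf decoder and the baked binary file so that the snippet runs on its own.

```ruby
# Hypothetical sketch of the repro's shape; not the attached repro.rb.
# `decode` stands in for the protobuf decode call, which is not shown here.
def stress_decode(payload, iterations_without_gc:, iterations_with_gc:, decode:)
  decoded = 0

  # Phase 1: decode with the collector switched off, so the decoded
  # objects pile up without being reclaimed.
  GC.disable
  begin
    iterations_without_gc.times do
      decode.call(payload)
      decoded += 1
    end
  ensure
    # Always switch the collector back on, even if decoding raises.
    GC.enable
  end

  # Phase 2: keep decoding with the collector on, forcing a full
  # collection after each round to put it under pressure.
  iterations_with_gc.times do
    decode.call(payload)
    decoded += 1
    GC.start
  end

  decoded
end

# Stand-in payload and decoder (assumptions for illustration only).
payload = Marshal.dump({ "key" => "value", "numbers" => (1..100).to_a })
count = stress_decode(payload,
                      iterations_without_gc: 1_000,
                      iterations_with_gc: 50,
                      decode: ->(bytes) { Marshal.load(bytes) })
puts "decoded #{count} times"
```

With the real decoder swapped in for the lambda, `iterations_without_gc` would correspond to the value mentioned above whose changes alter the behavior of the problem.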
It would also be reasonable to suspect that the protobuf C extension is using the Ruby C API in a way that eventually corrupts the VM's memory, but we haven't found anything suspicious in the code, and it is in fact a pretty standard key/value operation that's happening there. I will be cross-filing a similar bug report on the google-protobuf project anyway.
Files