Bug #11799
closedObject allocation during garbage collection phase terminates the Ruby process
Description
Resurrection of Bug #11549 that was apparently inadvertently deleted.
Also note the similarity to Bug #10868.
Summary: Multi-Threaded Ruby apps are often problematic; especially so when utilizing thread pools, and scheduling work to worker threads.
- While the problem isn't easily recreated, it can be reliably recreated given sufficient time and conditions (for garbage collection in conjunction with sibling thread behavior).
- Initially, three trace files have been attached. Each shows the problem in different conditions, and using different client infrastructure.
- If this is a problem due to usage of certain gems that aren't 'well-behaved', it would be good to know which ones to avoid.
RE: attached file ruby_2.2.3_obj_alloc_gc_bug.txt
Following is the section of sap_consumer_control.rb that is presented by the Ruby interpreter as being the current execution context when the problem occurs.
408: loop do
409: @worker_threads.schedule(@work_queue.pop, &@consumer)
410: @sap_packets_consumed += 1
411: endNotes:
- @work_queue is a Ruby Queue (allocated within the main thread) into which a producer thread places work requests.
- @worker_threads is a thread pool (allocated within the main thread).
- @worker_threads schedule method simply puts a work request into the thread pool's internal work queue. One of the worker threads within the thread pool will consume/effect the work request, by executing the specified consumer Proc.
- The main program thread simply loops forever scheduling work to thread pool threads.
- It appears that an object is being allocated by virtue of the @work_queue.pop
RE: attached file ruby_bug_redis_client.txt
- The observed behavior also occurs readily when employing the Redis client gem.
- This is the use case in which the bug is most easily reproduced.
RE: attached file ruby_bug-gdb.txt
- The observed behavior has also been observed when using stretcher, faraday, net-http-persistent gems.
- Connections via stretcher to net-http-persistent are on a per thread basis (maintained within thread local storage).
- In this particular case, the Ruby process was run under gdb, and back traces for each of the process threads are provided.
Files
Updated by charlez (Charles Leu) almost 9 years ago
- File deleted (
ruby_bug-gdb.txt)
Updated by charlez (Charles Leu) almost 9 years ago
- File ruby_bug-gdb.txt ruby_bug-gdb.txt added
Supplied correct edition of file ruby_bug-gdb.txt
Updated by normalperson (Eric Wong) almost 9 years ago
charlez.leu@gmail.com wrote:
- If this is a problem due to usage of certain gems that aren't 'well-behaved', it would be good to know which ones to avoid.
Fwiw, I took a quick look at the dmark/dfree callbacks registered with
*Data_(Wrap|Make)_Struct macros in the following non-standard extension
gems but did not notice anything amiss with the dmark/dfree callbacks
that could trigger allocations:
google_hash 0.8.9 (C++)
msgpack 0.7.0 and 0.7.1
xxhash 0.3.0
yajl-ruby 1.2.1
Perhaps I missed something in those or the bug is elsewhere.
Keep in mind I don't know C++, so I'm least confident about google_hash.
Updated by ko1 (Koichi Sasada) almost 9 years ago
Thank you for your log.
This is a reason.
(gdb) thread 12
[Switching to thread 12 (Thread 0x7fffe5fc4700 (LWP 27808))]#0 0x00007ffff772463c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007ffff772463c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007ffff7b0375b in native_cond_wait (limits_us=<value optimized out>) at thread_pthread.c:338
#2 gvl_acquire_common (limits_us=<value optimized out>) at thread_pthread.c:87
#3 gvl_yield (limits_us=<value optimized out>) at thread_pthread.c:155
#4 rb_thread_schedule_limits (limits_us=<value optimized out>) at thread.c:1209
#5 0x00007ffff7b03b7c in rb_threadptr_execute_interrupts (th=<value optimized out>, blocking_timing=0) at thread.c:2019
#6 0x00007ffff7ae236d in vm_call0_body (th=0x1988bb0, ci=<value optimized out>, argv=0x7fffe5fc10b0) at vm_eval.c:252
#7 0x00007ffff7ae331e in vm_call0 (recv=18167000, mid=152, n=1) at vm_eval.c:59
#8 rb_call0 (recv=18167000, mid=152, n=1) at vm_eval.c:349
#9 rb_call (recv=18167000, mid=152, n=1) at vm_eval.c:616
#10 rb_funcall (recv=18167000, mid=152, n=1) at vm_eval.c:818
#11 0x00007fffef6659ed in eqrb::operator()(unsigned long, unsigned long) const () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#12 0x00007fffef6f6b6f in google::sparse_hashtable<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::KeyInfo::equals(unsigned long const&, unsigned long const&) const ()
from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#13 0x00007fffef6f5739 in google::sparse_hashtable<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::equals(unsigned long const&, unsigned long const&) const ()
---Type <return> to continue, or q <return> to quit---
from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#14 0x00007fffef6f65ab in google::sparse_hashtable<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::test_deleted_key(unsigned long const&) const ()
from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#15 0x00007fffef6f5440 in google::sparse_hashtable<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::test_deleted(google::sparse_hashtable_iterator<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > > const&) const ()
from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#16 0x00007fffef6f4a49 in google::sparse_hashtable_iterator<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::advance_past_deleted() ()
from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#17 0x00007fffef6f45e4 in google::sparse_hashtable_iterator<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsi---Type <return> to continue, or q <return> to quit---
gned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::operator++() () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#18 0x00007fffef6f38e8 in mark_hash_map_values () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#19 0x00007ffff79c91cb in gc_mark_children (objspace=0x6029f0, obj=18167000) at gc.c:4190
#20 0x00007ffff79ca5f4 in rgengc_rememberset_mark (objspace=0x6029f0, full_mark=<value optimized out>, immediate_mark=<value optimized out>,
immediate_sweep=<value optimized out>, reason=<value optimized out>) at gc.c:5376
#21 gc_marks_start (objspace=0x6029f0, full_mark=<value optimized out>, immediate_mark=<value optimized out>, immediate_sweep=<value optimized out>,
reason=<value optimized out>) at gc.c:4951
#22 gc_marks (objspace=0x6029f0, full_mark=<value optimized out>, immediate_mark=<value optimized out>, immediate_sweep=<value optimized out>,
reason=<value optimized out>) at gc.c:5206
#23 gc_start (objspace=0x6029f0, full_mark=<value optimized out>, immediate_mark=<value optimized out>, immediate_sweep=<value optimized out>,
reason=<value optimized out>) at gc.c:5957
#24 0x00007ffff79cc65c in heap_prepare (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1564
#25 heap_get_freeobj_from_next_freepage (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1576
#26 heap_get_freeobj (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1600
#27 newobj_of (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1645
#28 0x00007ffff796ae69 in ary_alloc (capa=3) at array.c:445
(1) Thread 12 runs GC
(2) mark function in /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so is called. #17 (how to read...?)
(3) #13 calls Ruby code from mark function. <- BUG
(4) Ruby code switches to Thread 6.
(5) Thread 6 runs own Ruby code and cause GC
(6) get [BUG] message
This is 3rd party issue.
Updated by ko1 (Koichi Sasada) almost 9 years ago
https://github.com/rdp/google_hash/blob/master/ext/template/google_hash.cpp.erb#L170
I'm not sure the implementation details, but
for(<%= type %>_hash_map< <%= key_type %>, <%= value_type %> <%= extra_hash_params %> >::iterator it = incoming->hash_map->begin(); it != incoming->hash_map->end(); ++it) {
this iteration calls #== method.
Updated by nobu (Nobuyoshi Nakada) almost 9 years ago
- Status changed from Open to Third Party's Issue
Updated by charlez (Charles Leu) almost 9 years ago
Koichi Sasada wrote:
https://github.com/rdp/google_hash/blob/master/ext/template/google_hash.cpp.erb#L170
I'm not sure the implementation details, but
for(<%= type %>_hash_map< <%= key_type %>, <%= value_type %> <%= extra_hash_params %> >::iterator it = incoming->hash_map->begin(); it != incoming->hash_map->end(); ++it) {
this iteration calls #== method.
Thanks for the prompt attention and feedback. It's greatly appreciated.
google_hash will be removed (unfortunately it was being relied upon heavily), and I'll re-test.
If other gems (e.g. yajl-ruby) are also problematic I'll remove their usage as well.