Project

General

Profile

Actions

Bug #11799

closed

Object allocation during garbage collection phase terminates the Ruby process

Added by charlez (Charles Leu) over 8 years ago. Updated over 8 years ago.

Status:
Third Party's Issue
Assignee:
-
Target version:
-
ruby -v:
ruby 2.2.3p173 (2015-08-18 revision 51636) [x86_64-linux]
[ruby-core:72009]

Description

Resurrection of Bug #11549 that was apparently inadvertently deleted.
Also note the similarity to Bug #10868.

Summary: Multi-Threaded Ruby apps are often problematic; especially so when utilizing thread pools, and scheduling work to worker threads.

  • While the problem isn't easily recreated, it can be reliably recreated given sufficient time and conditions (for garbage collection in conjunction with sibling thread behavior).
  • Initially, three trace files have been attached. Each shows the problem in different conditions, and using different client infrastructure.
  • If this is a problem due to usage of certain gems that aren't 'well-behaved', it would be good to know which ones to avoid.

RE: attached file ruby_2.2.3_obj_alloc_gc_bug.txt

Following is the section of sap_consumer_control.rb that is presented by the Ruby interpreter as being the current execution context when the problem occurs.

408: loop do
409: @worker_threads.schedule(@work_queue.pop, &@consumer)
410: @sap_packets_consumed += 1
411: end

Notes:

  • @work_queue is a Ruby Queue (allocated within the main thread) into which a producer thread places work requests.
  • @worker_threads is a thread pool (allocated within the main thread).
  • @worker_threads schedule method simply puts a work request into the thread pool's internal work queue. One of the worker threads within the thread pool will consume/effect the work request, by executing the specified consumer Proc.
  • The main program thread simply loops forever scheduling work to thread pool threads.
  • It appears that an object is being allocated by virtue of the @work_queue.pop

RE: attached file ruby_bug_redis_client.txt

  • The observed behavior also occurs readily when employing the Redis client gem.
  • This is the use case in which the bug is most easily reproduced.

RE: attached file ruby_bug-gdb.txt

  • The observed behavior has also been observed when using stretcher, faraday, net-http-persistent gems.
  • Connections via stretcher to net-http-persistent are on a per thread basis (maintained within thread local storage).
  • In this particular case, the Ruby process was run under gdb, and back traces for each of the process threads are provided.

Files

ruby_bug_redis_client.txt (53.3 KB) ruby_bug_redis_client.txt charlez (Charles Leu), 12/09/2015 09:37 PM
ruby_2.2.3_obj_alloc_gc_bug.txt (49.1 KB) ruby_2.2.3_obj_alloc_gc_bug.txt charlez (Charles Leu), 12/09/2015 09:43 PM
ruby_bug-gdb.txt (151 KB) ruby_bug-gdb.txt charlez (Charles Leu), 12/09/2015 10:13 PM
Actions #1

Updated by charlez (Charles Leu) over 8 years ago

  • File deleted (ruby_bug-gdb.txt)

Updated by charlez (Charles Leu) over 8 years ago

Supplied correct edition of file ruby_bug-gdb.txt

Updated by normalperson (Eric Wong) over 8 years ago

wrote:

  • If this is a problem due to usage of certain gems that aren't 'well-behaved', it would be good to know which ones to avoid.

Fwiw, I took a quick look at the dmark/dfree callbacks registered with
*Data_(Wrap|Make)_Struct macros in the following non-standard extension
gems but did not notice anything amiss with the dmark/dfree callbacks
that could trigger allocations:

google_hash 0.8.9 (C++)
msgpack 0.7.0 and 0.7.1
xxhash 0.3.0
yajl-ruby 1.2.1

Perhaps I missed something in those or the bug is elsewhere.
Keep in mind I don't know C++, so I'm least confident about google_hash.

Updated by ko1 (Koichi Sasada) over 8 years ago

Thank you for your log.

This is a reason.

(gdb) thread 12
[Switching to thread 12 (Thread 0x7fffe5fc4700 (LWP 27808))]#0  0x00007ffff772463c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff772463c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ffff7b0375b in native_cond_wait (limits_us=<value optimized out>) at thread_pthread.c:338
#2  gvl_acquire_common (limits_us=<value optimized out>) at thread_pthread.c:87
#3  gvl_yield (limits_us=<value optimized out>) at thread_pthread.c:155
#4  rb_thread_schedule_limits (limits_us=<value optimized out>) at thread.c:1209
#5  0x00007ffff7b03b7c in rb_threadptr_execute_interrupts (th=<value optimized out>, blocking_timing=0) at thread.c:2019
#6  0x00007ffff7ae236d in vm_call0_body (th=0x1988bb0, ci=<value optimized out>, argv=0x7fffe5fc10b0) at vm_eval.c:252
#7  0x00007ffff7ae331e in vm_call0 (recv=18167000, mid=152, n=1) at vm_eval.c:59
#8  rb_call0 (recv=18167000, mid=152, n=1) at vm_eval.c:349
#9  rb_call (recv=18167000, mid=152, n=1) at vm_eval.c:616
#10 rb_funcall (recv=18167000, mid=152, n=1) at vm_eval.c:818
#11 0x00007fffef6659ed in eqrb::operator()(unsigned long, unsigned long) const () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#12 0x00007fffef6f6b6f in google::sparse_hashtable<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::KeyInfo::equals(unsigned long const&, unsigned long const&) const ()
   from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#13 0x00007fffef6f5739 in google::sparse_hashtable<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::equals(unsigned long const&, unsigned long const&) const ()
---Type <return> to continue, or q <return> to quit---
   from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#14 0x00007fffef6f65ab in google::sparse_hashtable<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::test_deleted_key(unsigned long const&) const ()
   from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#15 0x00007fffef6f5440 in google::sparse_hashtable<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::test_deleted(google::sparse_hashtable_iterator<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > > const&) const ()
   from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#16 0x00007fffef6f4a49 in google::sparse_hashtable_iterator<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::advance_past_deleted() ()
   from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#17 0x00007fffef6f45e4 in google::sparse_hashtable_iterator<std::pair<unsigned long const, unsigned long>, unsigned long, hashrb, google::sparse_hash_map<unsigned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SelectKey, google::sparse_hash_map<unsi---Type <return> to continue, or q <return> to quit---
gned long, unsigned long, hashrb, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::SetKey, eqrb, google::libc_allocator_with_realloc<std::pair<unsigned long const, unsigned long> > >::operator++() () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#18 0x00007fffef6f38e8 in mark_hash_map_values () from /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so
#19 0x00007ffff79c91cb in gc_mark_children (objspace=0x6029f0, obj=18167000) at gc.c:4190
#20 0x00007ffff79ca5f4 in rgengc_rememberset_mark (objspace=0x6029f0, full_mark=<value optimized out>, immediate_mark=<value optimized out>,
    immediate_sweep=<value optimized out>, reason=<value optimized out>) at gc.c:5376
#21 gc_marks_start (objspace=0x6029f0, full_mark=<value optimized out>, immediate_mark=<value optimized out>, immediate_sweep=<value optimized out>,
    reason=<value optimized out>) at gc.c:4951
#22 gc_marks (objspace=0x6029f0, full_mark=<value optimized out>, immediate_mark=<value optimized out>, immediate_sweep=<value optimized out>,
    reason=<value optimized out>) at gc.c:5206
#23 gc_start (objspace=0x6029f0, full_mark=<value optimized out>, immediate_mark=<value optimized out>, immediate_sweep=<value optimized out>,
    reason=<value optimized out>) at gc.c:5957
#24 0x00007ffff79cc65c in heap_prepare (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1564
#25 heap_get_freeobj_from_next_freepage (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1576
#26 heap_get_freeobj (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1600
#27 newobj_of (klass=6687280, flags=8231, v1=0, v2=0, v3=0) at gc.c:1645
#28 0x00007ffff796ae69 in ary_alloc (capa=3) at array.c:445

(1) Thread 12 runs GC
(2) mark function in /usr/local/rvm/gems/ruby-2.2.3/gems/google_hash-0.8.9/lib/google_hash.so is called. #17 (how to read...?)
(3) #13 calls Ruby code from mark function. <- BUG
(4) Ruby code switches to Thread 6.
(5) Thread 6 runs own Ruby code and cause GC
(6) get [BUG] message

This is 3rd party issue.

Updated by ko1 (Koichi Sasada) over 8 years ago

https://github.com/rdp/google_hash/blob/master/ext/template/google_hash.cpp.erb#L170

I'm not sure the implementation details, but

for(<%= type %>_hash_map< <%= key_type %>, <%= value_type %> <%= extra_hash_params %> >::iterator it = incoming->hash_map->begin(); it != incoming->hash_map->end(); ++it) {

this iteration calls #== method.

Updated by nobu (Nobuyoshi Nakada) over 8 years ago

  • Status changed from Open to Third Party's Issue

Updated by charlez (Charles Leu) over 8 years ago

Koichi Sasada wrote:

https://github.com/rdp/google_hash/blob/master/ext/template/google_hash.cpp.erb#L170

I'm not sure the implementation details, but

for(<%= type %>_hash_map< <%= key_type %>, <%= value_type %> <%= extra_hash_params %> >::iterator it = incoming->hash_map->begin(); it != incoming->hash_map->end(); ++it) {

this iteration calls #== method.

Thanks for the prompt attention and feedback. It's greatly appreciated.

google_hash will be removed (unfortunately it was being relied upon heavily), and I'll re-test.
If other gems (e.g. yajl-ruby) are also problematic I'll remove their usage as well.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0Like0Like0Like0