Bug #3999
closed
imap connection thread stuck on wait call
Description
=begin
My Ruby application makes many connections to IMAP servers, and after about a day of running, all 200 worker threads deadlock on @response_arrival.wait in imap.rb:
  def get_tagged_response(tag)
    until @tagged_responses.key?(tag)
      @response_arrival.wait
    end
    return pick_up_tagged_response(tag)
  end
I looked at the code and the following lines look suspicious to me:
  def receive_responses
    while true
      begin
        resp = get_response
      rescue Exception
        @sock.close
        @client_thread.raise($!)
        break
      end
      break unless resp
      begin
        synchronize do
          case resp
          when TaggedResponse
            @tagged_responses[resp.tag] = resp
            @response_arrival.broadcast
            if resp.tag == @logout_command_tag
              return
            end
          when UntaggedResponse
            record_response(resp.name, resp.data)
            if resp.data.instance_of?(ResponseText) &&
                (code = resp.data.code)
              record_response(code.name, code.data)
            end
            if resp.name == "BYE" && @logout_command_tag.nil?
              @sock.close
              raise ByeResponseError, resp.raw_data
            end
          when ContinuationRequest
            @continuation_request = resp
            @response_arrival.broadcast
          end
          @response_handlers.each do |handler|
            handler.call(resp)
          end
        end
      rescue Exception
        @client_thread.raise($!)
      end
    end
  end
It looks like the "when UntaggedResponse" branch never invokes @response_arrival.broadcast. Could this cause the waiting thread to wait forever?
thx
=end
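To make the question above concrete, here is a minimal sketch of the wait/broadcast pattern. It uses a hypothetical TinyClient class and hypothetical method names, not Net::IMAP itself. Because the waiter re-checks its predicate in a loop, a broadcast is only needed when that predicate may have changed. Under that assumption, an untagged response that leaves the tagged-response table untouched should not by itself strand a waiter.

```ruby
require "monitor"

# Hypothetical stand-in for the client; this is NOT Net::IMAP.
class TinyClient
  include MonitorMixin

  def initialize
    super()                 # sets up the monitor provided by MonitorMixin
    @tagged = {}
    @untagged = []
    @arrival = new_cond
  end

  # Client side: block until the reply for +tag+ has been stored.
  def wait_for(tag)
    synchronize do
      @arrival.wait until @tagged.key?(tag)
      @tagged.delete(tag)
    end
  end

  # Receiver side: untagged data does not change the waiter's predicate,
  # so no broadcast is issued here.
  def deliver_untagged(data)
    synchronize { @untagged << data }
  end

  # Receiver side: a tagged reply changes the predicate, so wake the waiters.
  def deliver_tagged(tag, data)
    synchronize do
      @tagged[tag] = data
      @arrival.broadcast
    end
  end
end

client = TinyClient.new
waiter = Thread.new { client.wait_for("A001") }
client.deliver_untagged("* 3 EXISTS")
client.deliver_tagged("A001", "OK done")
result = waiter.value
puts result
```

In this sketch the waiter only hangs if the tagged reply never arrives at all, which points at the receiving side rather than at the missing broadcast.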
Updated by kbaum (Karl Baum) about 14 years ago
=begin
An update on this: I took some thread dumps, and it looks like the code above is stuck waiting on this thread:
"Thread-9802" daemon prio=10 tid=0x0000000003e71000 nid=0x5e78 runnable [0x00007f3f6e9c6000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:83)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00007f3ff95fbe48> (a sun.nio.ch.Util$1)
- locked <0x00007f3ff95fbe30> (a java.util.Collections$UnmodifiableSet)
- locked <0x00007f3ff95fba70> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
at org.jruby.ext.openssl.SSLSocket.waitSelect(SSLSocket.java:240)
at org.jruby.ext.openssl.SSLSocket.sysread(SSLSocket.java:448)
at org.jruby.ext.openssl.SSLSocket$i_method_0_1$RUBYINVOKER$sysread.call(org/jruby/ext/openssl/SSLSocket$i_method_0_1$RUBYINVOKER$sysread.gen:65535)
at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:630)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:186)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:146)
at ruby.jit.fill_rbuff_0CEA65EC1C909A5CF6CB7E8100C08A3F334C6007.rescue_1$RUBY$__rescue___0(buffering.rb:35)
at ruby.jit.fill_rbuff_0CEA65EC1C909A5CF6CB7E8100C08A3F334C6007.file(buffering.rb:34)
at ruby.jit.fill_rbuff_0CEA65EC1C909A5CF6CB7E8100C08A3F334C6007.file(buffering.rb)
at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:119)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:106)
at ruby.jit.gets_78EF17A1CC25A7841D76040F4F007ECC23A22E41.file(buffering.rb:106)
at ruby.jit.gets_78EF17A1CC25A7841D76040F4F007ECC23A22E41.file(buffering.rb)
at org.jruby.ast.executable.AbstractScript.file(AbstractScript.java:39)
at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:153)
at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:309)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:148)
at ruby.jit.get_response_9C1FCDC0EA0182083B570911F4E3FECA09F08611.file(imap.rb:994)
at ruby.jit.get_response_9C1FCDC0EA0182083B570911F4E3FECA09F08611.file(imap.rb)
at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:119)
at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:289)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:108)
at org.jruby.ast.VCallNode.interpret(VCallNode.java:85)
at org.jruby.ast.LocalAsgnNode.interpret(LocalAsgnNode.java:123)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
at org.jruby.ast.RescueNode.executeBody(RescueNode.java:199)
at org.jruby.ast.RescueNode.interpretWithJavaExceptions(RescueNode.java:118)
at org.jruby.ast.RescueNode.interpret(RescueNode.java:110)
at org.jruby.ast.BeginNode.interpret(BeginNode.java:83)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
at org.jruby.ast.WhileNode.interpret(WhileNode.java:131)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:139)
at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:159)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:106)
at org.jruby.ast.VCallNode.interpret(VCallNode.java:85)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
at org.jruby.runtime.InterpretedBlock.evalBlockBody(InterpretedBlock.java:373)
at org.jruby.runtime.InterpretedBlock.yield(InterpretedBlock.java:327)
at org.jruby.runtime.BlockBody.call(BlockBody.java:78)
at org.jruby.runtime.Block.call(Block.java:89)
at org.jruby.RubyProc.call(RubyProc.java:224)
at org.jruby.RubyProc.call(RubyProc.java:207)
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:94)
at java.lang.Thread.run(Thread.java:636)
Half of my threads are hanging on "@response_arrival.wait" and the rest are waiting at imap.rb:994, which is s = @sock.gets(CRLF) in the get_response method.
thx.
=end
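The two states described above fit together: if the receiver thread blocks forever in a read, nothing ever broadcasts, so the client thread waits forever too. The following is only a sketch of that situation, assuming a pipe as a stand-in for a silent socket and a hypothetical wait_for_reply helper. It bounds the wait with the timeout argument of MonitorMixin::ConditionVariable#wait so that the stall becomes visible instead of hanging.

```ruby
require "monitor"

lock = Monitor.new
arrival = lock.new_cond
replies = {}

# Simulate a server that has gone silent: nothing is ever written to the pipe.
reader_io, writer_io = IO.pipe
receiver = Thread.new do
  line = reader_io.gets          # blocks indefinitely, like @sock.gets(CRLF)
  lock.synchronize do
    replies["A001"] = line
    arrival.broadcast
  end
end

# Hypothetical helper: wait for the reply to +tag+, giving up after +limit+ seconds.
def wait_for_reply(lock, arrival, replies, tag, limit)
  deadline = Time.now + limit
  lock.synchronize do
    until replies.key?(tag)
      remaining = deadline - Time.now
      return :timed_out if remaining <= 0
      arrival.wait(remaining)    # returns after a broadcast or when the time is up
    end
    replies[tag]
  end
end

outcome = wait_for_reply(lock, arrival, replies, "A001", 0.2)
puts outcome

# Clean up the simulated connection.
receiver.kill
receiver.join
writer_io.close
reader_io.close
```

A real client would also need to decide what to do with the blocked reader, for example closing the socket; that part is left out here.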
Updated by naruse (Yui NARUSE) about 14 years ago
- Status changed from Open to Third Party's Issue
- ruby -v set to jruby
=begin
Ask JRuby.
=end
Updated by kbaum (Karl Baum) about 14 years ago
=begin
Hi Yui. Just so I know what to tell the JRuby team, why do you think it's a JRuby issue? The reason I came here is that JRuby appears to use exactly the same code as MRI.
Thanks for your help!
-karl
=end
Updated by naruse (Yui NARUSE) about 14 years ago
=begin
Yes, your question is reasonable, but:
- you didn't show reproducible code
- you didn't state the OS or the version of Ruby
- at the very least, such a JRuby crash is not MRI's issue
- I don't want to see JRuby's crash report
- threading problems are very implementation-dependent, so this can be JRuby-specific even if the Ruby code is the same
So please report again with reproducible code after you confirm the reproduction with some MRI version.
=end
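For the missing details listed above, a short script can collect the interpreter and OS information plus a snapshot of live threads when the hang occurs. This is a sketch only; environment_report and thread_report are hypothetical helper names, and the same script could be run under both JRuby and MRI to compare.

```ruby
require "rbconfig"

# Hypothetical helper: the details a maintainer needs to triage a report.
def environment_report
  {
    "ruby"     => RUBY_DESCRIPTION,
    "engine"   => defined?(RUBY_ENGINE) ? RUBY_ENGINE : "ruby",
    "platform" => RUBY_PLATFORM,
    "host_os"  => RbConfig::CONFIG["host_os"],
  }
end

# Hypothetical helper: one entry per live thread with its status and
# the top few frames of its current backtrace.
def thread_report
  Thread.list.map do |t|
    trace = t.backtrace || []
    "#{t.inspect} status=#{t.status.inspect}\n  " + trace.first(5).join("\n  ")
  end
end

report = environment_report
report.each { |key, value| puts "#{key}: #{value}" }
puts thread_report
```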