Bug #6369
closedGServer blocking after shutdown called
Description
Hi,
My investigation of bug #6358 points the finger at GServer triggering bug #5343 on lower power arm-linux platforms, because of the blocking TCPServer#accept call.
I believe there is also an underlying design flaw: the main thread (@tcpServerThread) is blocked in the TCPServer#accept call, so if no further connections are attempted after #shutdown() is called, the main thread remains blocked and will hang until it is forced into the ensure clause by the destruction of the GServer object by GC
This proposed patch to GServer seems to tackle the first issue, and also effectively makes the TCPServer#accept nonblocking by using a Timeout block to ensure that the main thread periodically (every 0.1s) has the chance to shut down gracefully if no further connections are forthcoming.
It also adds a GServer#shutdown! method which will block until the GServer has gracefully shut down
59,60c59,62
< # server.shutdown
< #
server.shutdown #will not wait for the server thread to quit¶
server.shutdown! #blocks until server thread has quit¶
122c124,131
<
def shutdown!
shutdown
while !stopped?
sleep 0.01
end
end
189c198
< @@servicesMutex.synchronize {
@@servicesMutex.synchronize do
197,198c206,208
< }
< @tcpServerThread = Thread.new {
end @tcpServerThread = Thread.new do
202c212
< @connectionsMutex.synchronize {
@connectionsMutex.synchronize do
206,217c216,240
< }
< client = @tcpServer.accept
< @connections << Thread.new(client) { |myClient|
< begin
< myPort = myClient.peeraddr[1]
< serve(myClient) if @audit or connecting(myClient)
< rescue => detail
< error(detail) if @debug
< ensure
< begin
< myClient.close
< rescue
end begin Timeout::timeout(0.1) do client = @tcpServer.accept @connections << Thread.new(client) do |myClient| begin myPort = myClient.peeraddr[1] serve(myClient) if !@audit or connecting(myClient) rescue => detail error(detail) if @debug ensure begin myClient.close sleep 1 rescue end @connectionsMutex.synchronize do @connections.delete(Thread.current) @connectionsCV.signal end disconnecting(myPort) if @audit end
219,223d241
< @connectionsMutex.synchronize {
< @connections.delete(Thread.current)
< @connectionsCV.signal
< }
< disconnecting(myPort) if @audit
225c243,245
< }
rescue Timeout::Error end
226a247
235c256
< @connectionsMutex.synchronize {
@connectionsMutex.synchronize do
239c260
< }
end
244c265
< @@servicesMutex.synchronize {
@@servicesMutex.synchronize do
246c267
< }
end
249c270
< }
end
Updated by mame (Yusuke Endoh) over 12 years ago
- Status changed from Open to Assigned
- Assignee set to JEG2 (James Gray)
Hello,
I believe there is also an underlying design flaw: the main thread (@tcpServerThread) is blocked in the TCPServer#accept call, so if no further connections are attempted after #shutdown() is called, the main thread remains blocked and will hang until it is forced into the ensure clause by the destruction of the GServer object by GC
Any living thread is not GC'ed. If any thread blocks in TCPServer#accept,
the thread has a referrence to the corresponding IO; the IO is not GC'ed.
There is something wrong.
This proposed patch to GServer seems to tackle the first issue
Thanks, assigning this ticket to JEG2 who is a maintainer of GServer.
But I guess it may be difficult to import your patch for some reasons.
- It looks like dirty workaround, not essential fix.
(sleep 1 sec within 0.1 sec timeout?) - It is difficult to check if it affects other platforms.
- In general, using timeout is not a good idea; it is really difficult
to control. - Do not include any change unrelated to the bug fix.
The essential part of your patch is below, right?
diff --git a/lib/gserver.rb b/lib/gserver.rb
index f6f37d3..ec69444 100644
--- a/lib/gserver.rb
+++ b/lib/gserver.rb
@@ -120,6 +120,13 @@ class GServer
}
end
- def shutdown!
- shutdown
-
while !stopped?
-
sleep 0.01
- end
- end
-
Returns true if the server has stopped.¶
def stopped?
@tcpServerThread == nil
@@ -260,6 +267,10 @@ class GServer
@connectionsCV.wait(@connectionsMutex)
end
}
+begin
- Timeout::timeout(0.1) do
-
client = @tcpServer.accept @connections << Thread.new(client) { |myClient| begin
@@ -270,6 +281,7 @@ class GServer
ensure
begin
myClient.close
-
sleep 1 rescue end @connectionsMutex.synchronize {
@@ -279,6 +291,11 @@ class GServer
disconnecting(myPort) if @audit
end
}
+
- end
+rescue Timeout::Error
+end -
end rescue => detail error(detail) if @debug
--
Yusuke Endoh mame@tsg.ne.jp
Updated by naruse (Yui NARUSE) over 12 years ago
When you send a patch,
- don't split header (we don't search the file)
- don't introduce cosmetic changes (follow the coding style of original code)
- use unified diff (diff -u)
- use -p option (-p --show-c-function Show which C function each change is in.)
the option makes readers easy to understand.
Updated by stevegoobermanhill (stephen gooberman-hill) over 12 years ago
Hi Yusuke-san,
all your syntactic points are accepted. This is the first patch I have submitted,so please excuse my lack of knowledge of the accepted submission protocols.
Also, I completely accept that the initial motivation for the proposed changes was the problems I was having with an embedded arm-linux platform (bug #6358 which seems to be a duplication of #5343 after some investigation). It was only by chance, after I had written this patch, that I saw the title of bug #5343 and made the connection to the TCPServer#accept call which can trigger this bug.
However, I do contend that there is a flaw in the GServer design : GServer will not shutdown until it has accepted one connection after the shutdown is called. This should be easily testable on all platforms.
When I wrote that the TCPServer#accept call blocked, what I was meaning was that the graceful shutdown is initiated by the thread leaving the while @shutdown loop starting on line 211 of the original code. But because TCPServer#accept is blocking, this loop is not exited until after the next connection is accepted after @shutdown goes true. So you will always accept one connection after shutdown has been called - and if you don't get another connection then the GServer will never shutdown until the programme is terminated, or the calling thread is killed.
I'm not sure what the alternative to a Timeout is - the proposed patch uses it to ensure that the while loop clause is regularly checked so that the shutdown will actually run if no further connections are forthcoming. I'm not sure how else one might force the TCPServer#accept statement to run in a non-blocking fashion - TCPServer#accept is very different to Socket#accept_nonblock. For safety, we might also wish to check that @shutdown hasn't turned true during the wait for the accept - adding a sleep statement to force the timeout in this case would seem logical.
The sleep(1) at @@ -270,6 +281,7: this should probably have been removed before submission - I am so deeply in embedded ruby mode at the moment that sprinkling sleep statements across the code to keep things happy has become second nature (there are many places I have to wait for the OS!).
Kind regards,
Steve
Updated by mame (Yusuke Endoh) over 6 years ago
- Status changed from Assigned to Closed