Project

General

Profile

Actions

Bug #6369

closed

GServer blocking after shutdown called

Added by stevegoobermanhill (stephen gooberman-hill) almost 12 years ago. Updated about 6 years ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 1.9.2p290 (2011-07-09 revision 32553) [i686-linux]
[ruby-core:44666]

Description

Hi,
My investigation of bug #6358 points the finger at GServer triggering bug #5343 on lower power arm-linux platforms, because of the blocking TCPServer#accept call.

I believe there is also an underlying design flaw: the main thread (@tcpServerThread) is blocked in the TCPServer#accept call, so if no further connections are attempted after #shutdown() is called, the main thread remains blocked and will hang until it is forced into the ensure clause by the destruction of the GServer object by GC

This proposed patch to GServer seems to tackle the first issue, and also effectively makes the TCPServer#accept nonblocking by using a Timeout block to ensure that the main thread periodically (every 0.1s) has the chance to shut down gracefully if no further connections are forthcoming.

It also adds a GServer#shutdown! method which will block until the GServer has gracefully shut down

59,60c59,62
< # server.shutdown
< #

server.shutdown #will not wait for the server thread to quit

server.shutdown! #blocks until server thread has quit

122c124,131
<

def shutdown!
shutdown
while !stopped?
sleep 0.01
end
end

189c198
< @@servicesMutex.synchronize {

@@servicesMutex.synchronize do

197,198c206,208
< }
< @tcpServerThread = Thread.new {

end

@tcpServerThread = Thread.new do

202c212
< @connectionsMutex.synchronize {

      @connectionsMutex.synchronize do

206,217c216,240
< }
< client = @tcpServer.accept
< @connections << Thread.new(client) { |myClient|
< begin
< myPort = myClient.peeraddr[1]
< serve(myClient) if @audit or connecting(myClient)
< rescue => detail
< error(detail) if @debug
< ensure
< begin
< myClient.close
< rescue

      end
      
      
      begin
        Timeout::timeout(0.1) do 
          client = @tcpServer.accept
     
          @connections << Thread.new(client) do |myClient|
            begin
              myPort = myClient.peeraddr[1]
              serve(myClient) if !@audit or connecting(myClient)
            rescue => detail
              error(detail) if @debug
            ensure
              begin
                myClient.close
                sleep 1
              rescue
              end
              @connectionsMutex.synchronize do
                @connections.delete(Thread.current)
                @connectionsCV.signal
              end
              disconnecting(myPort) if @audit
            end

219,223d241
< @connectionsMutex.synchronize {
< @connections.delete(Thread.current)
< @connectionsCV.signal
< }
< disconnecting(myPort) if @audit
225c243,245
< }

     rescue Timeout::Error
     end

226a247

235c256
< @connectionsMutex.synchronize {

      @connectionsMutex.synchronize do

239c260
< }

      end

244c265
< @@servicesMutex.synchronize {

    @@servicesMutex.synchronize do

246c267
< }

    end

249c270
< }

end

Updated by mame (Yusuke Endoh) almost 12 years ago

  • Status changed from Open to Assigned
  • Assignee set to JEG2 (James Gray)

Hello,

I believe there is also an underlying design flaw: the main thread (@tcpServerThread) is blocked in the TCPServer#accept call, so if no further connections are attempted after #shutdown() is called, the main thread remains blocked and will hang until it is forced into the ensure clause by the destruction of the GServer object by GC

Any living thread is not GC'ed. If any thread blocks in TCPServer#accept,
the thread has a referrence to the corresponding IO; the IO is not GC'ed.
There is something wrong.

This proposed patch to GServer seems to tackle the first issue

Thanks, assigning this ticket to JEG2 who is a maintainer of GServer.
But I guess it may be difficult to import your patch for some reasons.

  • It looks like dirty workaround, not essential fix.
    (sleep 1 sec within 0.1 sec timeout?)
  • It is difficult to check if it affects other platforms.
  • In general, using timeout is not a good idea; it is really difficult
    to control.
  • Do not include any change unrelated to the bug fix.

The essential part of your patch is below, right?

diff --git a/lib/gserver.rb b/lib/gserver.rb
index f6f37d3..ec69444 100644
--- a/lib/gserver.rb
+++ b/lib/gserver.rb
@@ -120,6 +120,13 @@ class GServer
}
end

  • def shutdown!
  • shutdown
  • while !stopped?
    
  •  sleep 0.01
    
  • end
  • end
  • Returns true if the server has stopped.

    def stopped?
    @tcpServerThread == nil
    @@ -260,6 +267,10 @@ class GServer
    @connectionsCV.wait(@connectionsMutex)
    end
    }

+begin

  • Timeout::timeout(0.1) do
  •      client = @tcpServer.accept
         @connections << Thread.new(client)  { |myClient|
           begin
    

@@ -270,6 +281,7 @@ class GServer
ensure
begin
myClient.close

  •            sleep 1
             rescue
             end
             @connectionsMutex.synchronize {
    

@@ -279,6 +291,11 @@ class GServer
disconnecting(myPort) if @audit
end
}
+

  • end
    +rescue Timeout::Error
    +end
  •    end
     rescue => detail
       error(detail) if @debug
    

--
Yusuke Endoh

Updated by naruse (Yui NARUSE) almost 12 years ago

When you send a patch,

  • don't split header (we don't search the file)
  • don't introduce cosmetic changes (follow the coding style of original code)
  • use unified diff (diff -u)
  • use -p option (-p --show-c-function Show which C function each change is in.)
    the option makes readers easy to understand.

Updated by stevegoobermanhill (stephen gooberman-hill) almost 12 years ago

Hi Yusuke-san,
all your syntactic points are accepted. This is the first patch I have submitted,so please excuse my lack of knowledge of the accepted submission protocols.

Also, I completely accept that the initial motivation for the proposed changes was the problems I was having with an embedded arm-linux platform (bug #6358 which seems to be a duplication of #5343 after some investigation). It was only by chance, after I had written this patch, that I saw the title of bug #5343 and made the connection to the TCPServer#accept call which can trigger this bug.

However, I do contend that there is a flaw in the GServer design : GServer will not shutdown until it has accepted one connection after the shutdown is called. This should be easily testable on all platforms.

When I wrote that the TCPServer#accept call blocked, what I was meaning was that the graceful shutdown is initiated by the thread leaving the while @shutdown loop starting on line 211 of the original code. But because TCPServer#accept is blocking, this loop is not exited until after the next connection is accepted after @shutdown goes true. So you will always accept one connection after shutdown has been called - and if you don't get another connection then the GServer will never shutdown until the programme is terminated, or the calling thread is killed.

I'm not sure what the alternative to a Timeout is - the proposed patch uses it to ensure that the while loop clause is regularly checked so that the shutdown will actually run if no further connections are forthcoming. I'm not sure how else one might force the TCPServer#accept statement to run in a non-blocking fashion - TCPServer#accept is very different to Socket#accept_nonblock. For safety, we might also wish to check that @shutdown hasn't turned true during the wait for the accept - adding a sleep statement to force the timeout in this case would seem logical.

The sleep(1) at @@ -270,6 +281,7: this should probably have been removed before submission - I am so deeply in embedded ruby mode at the moment that sprinkling sleep statements across the code to keep things happy has become second nature (there are many places I have to wait for the OS!).

Kind regards,

Steve

Updated by mame (Yusuke Endoh) about 6 years ago

  • Status changed from Assigned to Closed

GServer has been already moved to GitHub by #5480. I've created a ticket into its tracker, so I'd like to close this ticket.

Actions

Also available in: Atom PDF

Like0
Like0Like0Like0Like0