Bug #19412
openSocket starts queueing and not responding after a certain amount of requests
Description
Here is a an example code:
require 'socket'
PORT = 8080
BACKLOG = 50
LISTEN = '127.0.0.1'
def handle_connection(connection, _addrinfo)
request = connection.gets
puts request
connection.write "HTTP/1.1 200\r\n"
connection.write "Content-Type: text/html\r\n"
connection.write "\r\n"
connection.write "Hello world! Current time is #{Time.now}"
ensure
connection.close
end
begin
socket = Socket.new(:INET, :STREAM)
socket.bind(Addrinfo.tcp(LISTEN, PORT))
socket.listen(BACKLOG)
puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN
loop do
listening, = IO.select([socket])
io, = listening
connection, addrinfo = io.accept
handle_connection(connection, addrinfo)
end
ensure
socket.close
end
This tries to simulate a TCP server that responds as if it was an HTTP server.
The amount of requests it can sustain seems to depend on the OS. On a Linux machine running ubuntu 20.04 I get something around 7.6K to 7.8K until it stops responding.
$ uname -a
Linux ... 5.17.5-x86_64-linode154 #1 SMP PREEMPT Mon May 2 15:07:22 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
$ ab -n 20000 -c 50 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
apr_pollset_poll: The timeout specified has expired (70007)
Total of 7883 requests completed
on MacOS Ventura I get around 16K:
$ uname -a
Darwin ... 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:04:44 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T8103 arm64
$ ab -n 20000 -c 50 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
apr_socket_recv: Operation timed out (60)
Total of 16375 requests completed
in both cases when that limit reaches, if I abort the program (CTRL-C) and try to run it again it takes a while until the port is released:
...
GET / HTTP/1.0
GET / HTTP/1.0
GET / HTTP/1.0
^Csocketserver.rb:29:in `select': Interrupt
from socketserver.rb:29:in `block in <main>'
from socketserver.rb:28:in `loop'
from socketserver.rb:28:in `<main>'
$ ruby socketserver.rb
socketserver.rb:23:in `bind': Address already in use - bind(2) for 127.0.0.1:8080 (Errno::EADDRINUSE)
from socketserver.rb:23:in `<main>'
After killing the process it seems no process is holding the port:
lsof -wni tcp:8080
Running the command above does not return anything (it does when the program is still running).
I think we may be failing to release something when interacting with the Socket on syscalls/kernel level and we endup filling up some queue/buffer that eventually gets freed.
Files
Updated by jeremyevans0 (Jeremy Evans) over 1 year ago
To try to narrow down the issue, does it still occur if you change the main loop section to?:
begin
socket = TCPServer.new(LISTEN, PORT)
socket.listen(BACKLOG)
puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN
loop do
handle_connection(socket.accept, nil)
end
ensure
socket.close
end
Updated by brodock (Gabriel Mazetto) over 1 year ago
@jeremyevans0 (Jeremy Evans) yes, it behaves the same regarding the max amount of requests it can handle:
$ ab -n 20000 -c 50 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
apr_socket_recv: Operation timed out (60)
Total of 16438 requests completed
the one difference is that I can bind right after I abort the program, but it takes a while for it to run the requests again, and when it does, it get limitar 16k requests...
$ ab -n 20000 -c 50 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
apr_socket_recv: Operation timed out (60)
Total of 16398 requests completed
Updated by jeremyevans0 (Jeremy Evans) over 1 year ago
Using the server given, I cannot reproduce using the following Ruby code:
require 'socket'
PORT = 8080
LISTEN = '127.0.0.1'
20000.times do
socket = TCPSocket.new(LISTEN, PORT)
socket.write("HTTP/1.1 GET /\n")
socket.shutdown(Socket::SHUT_WR)
socket.read
ensure
socket&.close
end
Is there a way you can reproduce this using ruby as the client instead of ApacheBench? That would allow for easier testing. Also, does this reproduce when using ApacheBench with -c 1
?
Updated by brodock (Gabriel Mazetto) over 1 year ago
- ruby -v set to ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22]
running apache benchmark with -c1
does show some of the behavior. It will go from 0 to somethign around 16000 in 4 seconds and get stuck there, see a 3x run log:
ab -n 20000 -c 1 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
apr_socket_recv: Operation timed out (60)
Total of 16346 requests completed
ab -n 20000 -c 1 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
apr_socket_recv: Operation timed out (60)
Total of 16347 requests completed
ab -n 20000 -c 1 http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
apr_socket_recv: Operation timed out (60)
Total of 16349 requests completed
I've also run a version of your client and got to reproduce the error (with similar output):
require 'socket'
PORT = 8080
LISTEN = '127.0.0.1'
NTIMES = 20000
NTIMES.times do |t|
socket = TCPSocket.new(LISTEN, PORT)
socket.write("HTTP/1.1 GET /\n")
socket.shutdown(Socket::SHUT_WR)
socket.read
puts "Completed #{t} requests" if t > 0 && t % (NTIMES / 10) == 0
rescue Exception => e
puts "Total of #{t-1} requests completed"
raise e
ensure
socket&.close
end
ruby --version
Could not locate Gemfile or .bundle/ directory
ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
ruby client.rb
Could not locate Gemfile or .bundle/ directory
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
Total of 16344 requests completed
client.rb:8:in `initialize': Operation timed out - connect(2) for "127.0.0.1" port 8080 (Errno::ETIMEDOUT)
from client.rb:8:in `new'
from client.rb:8:in `block in <main>'
from client.rb:7:in `times'
from client.rb:7:in `<main>'
and with ruby 3.2.1:
rvm use 3.2.1
Using /Users/brodock/.rvm/gems/ruby-3.2.1
ruby --version
Could not locate Gemfile or .bundle/ directory
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [arm64-darwin22]
ruby client.rb
Could not locate Gemfile or .bundle/ directory
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
Total of 16342 requests completed
client.rb:8:in `initialize': connect(2) for "127.0.0.1" port 8080 (IO::TimeoutError)
from client.rb:8:in `new'
from client.rb:8:in `block in <main>'
from client.rb:7:in `times'
from client.rb:7:in `<main>'
Updated by jeremyevans0 (Jeremy Evans) over 1 year ago
brodock (Gabriel Mazetto) wrote in #note-4:
I've also run a version of your client and got to reproduce the error (with similar output):
require 'socket' PORT = 8080 LISTEN = '127.0.0.1' NTIMES = 20000 NTIMES.times do |t| socket = TCPSocket.new(LISTEN, PORT) socket.write("HTTP/1.1 GET /\n") socket.shutdown(Socket::SHUT_WR) socket.read puts "Completed #{t} requests" if t > 0 && t % (NTIMES / 10) == 0 rescue Exception => e puts "Total of #{t-1} requests completed" raise e ensure socket&.close end
I tried that client program in my environment (OpenBSD/amd64), and was able to run it without failure with both Ruby 3.1 and 3.2. In terms of further testing, it looks like you are using rvm. While it seems unlikely that that is the problem, can you try using a system-provided Ruby? Also, did you test the above client program on Linux (the output shows MacOS)?
Updated by brodock (Gabriel Mazetto) over 1 year ago
I've went back to my Linux machine and tried to reproduce the problem.
Using Ruby 3.0 (from the package manager), I got to make it crash only using some decent concurrency (100) and running it for longer time:
brodock@pop-os:~/Projetos/ractor$ neofetch
///////////// brodock@pop-os
///////////////////// --------------
///////*767//////////////// OS: Pop!_OS 22.04 LTS x86_64
//////7676767676*////////////// Kernel: 6.2.0-76060200-generic
/////76767//7676767////////////// Uptime: 13 mins
/////767676///*76767/////////////// Packages: 2031 (dpkg), 25 (flatpak)
///////767676///76767.///7676*/////// Shell: bash 5.1.16
/////////767676//76767///767676//////// Resolution: 3840x2160
//////////76767676767////76767///////// DE: GNOME 42.5
///////////76767676//////7676////////// WM: Mutter
////////////,7676,///////767/////////// WM Theme: Pop
/////////////*7676///////76//////////// Theme: Pop-dark [GTK2/3]
///////////////7676//////////////////// Icons: Pop [GTK2/3]
///////////////7676///767//////////// Terminal: gnome-terminal
//////////////////////'//////////// CPU: AMD Ryzen 9 5900X (24) @ 3.700GHz
//////.7676767676767676767,////// GPU: NVIDIA GeForce GTX 1660 SUPER
/////767676767676767676767///// Memory: 3709MiB / 32002MiB
///////////////////////////
/////////////////////
/////////////
ruby --version
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux-gnu]
ab -n 90000 -c 200 http://localhost:8080/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 9000 requests
Completed 18000 requests
Completed 27000 requests
Completed 36000 requests
Completed 45000 requests
Completed 54000 requests
Completed 63000 requests
Completed 72000 requests
Completed 81000 requests
apr_pollset_poll: The timeout specified has expired (70007)
Total of 89888 requests completed
It seems Linux can handle the issue better than Mac OS, but whatever is breaking is still there.
This is what I got running the server part:
GET / HTTP/1.0
GET / HTTP/1.0
GET / HTTP/1.0
GET / HTTP/1.0
GET / HTTP/1.0
GET / HTTP/1.0
bug-19412.rb:13:in `write': Broken pipe (Errno::EPIPE)
from bug-19412.rb:13:in `handle_connection'
from bug-19412.rb:28:in `block in <main>'
from bug-19412.rb:27:in `loop'
from bug-19412.rb:27:in `<main>'
When I just run the client.rb
it doesn´t break as there is not enough concurrency, so it never reaches a point where it can exhaust whatever resource is getting exhausted.
Weirdly enough I can´t reproduce the problem when using hey: https://github.com/rakyll/hey
hey -n 180000 -c 300 http://localhost:8080/
Summary:
Total: 14.8814 secs
Slowest: 14.5462 secs
Fastest: 0.0002 secs
Average: 0.0112 secs
Requests/sec: 12095.6387
Response time histogram:
0.000 [1] |
1.455 [179742] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
2.909 [36] |
4.364 [77] |
5.819 [16] |
7.273 [88] |
8.728 [23] |
10.182 [0] |
11.637 [0] |
13.092 [0] |
14.546 [17] |
Latency distribution:
10% in 0.0006 secs
25% in 0.0015 secs
50% in 0.0019 secs
75% in 0.0020 secs
90% in 0.0022 secs
95% in 0.0023 secs
99% in 0.0025 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0066 secs, 0.0002 secs, 14.5462 secs
DNS-lookup: 0.0001 secs, 0.0000 secs, 0.0491 secs
req write: 0.0000 secs, 0.0000 secs, 0.0025 secs
resp wait: 0.0045 secs, 0.0000 secs, 13.5324 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0033 secs
Status code distribution:
[200] 180000 responses
Updated by brodock (Gabriel Mazetto) over 1 year ago
I've also tried to build something similar to "hey" in ruby and couldn´t make it crash (this is likely because of the GVL, we can´t get real concurrency):
require 'bundler/inline'
gemfile do
source 'https://rubygems.org'
gem 'thread'
end
require 'socket'
require 'thread/pool'
require 'thread/channel'
PORT = 8080
LISTEN = '127.0.0.1'
NTIMES = 80000
CONCURRENCY = 512
pool = Thread.pool(CONCURRENCY)
channel = Thread.channel
NTIMES.times do |t|
pool.process do
socket = TCPSocket.new(LISTEN, PORT)
socket.write("HTTP/1.1 GET /\n")
socket.shutdown(Socket::SHUT_WR)
socket.read
channel.send([:success])
rescue Exception => e
channel.send([:error, e])
ensure
socket&.close
end
end
output_thread = Thread.new {
success = 0
errors = []
while data = channel.receive
case data[0]
when :success
success += 1
puts "Completed #{success} requests" if success > 0 && (success % (NTIMES / 10) == 0)
when :error
errors << data[1]
when :eof
puts "Finished with #{errors.size} errors"
errors.each { puts _1 }
break
end
end
}
pool.shutdown
channel.send([:eof])
output_thread.join
Updated by brodock (Gabriel Mazetto) over 1 year ago
@jeremyevans0 (Jeremy Evans) I've tested again in MacOS system provided ruby version:
ruby --version
Could not locate Gemfile or .bundle/ directory
ruby 2.6.10p210 (2022-04-12 revision 67958) [universal.arm64e-darwin22]
And that is consistent. It seems the problem is specific to MacOS, or worst when in MacOS.
I've tried running the code above to make sure this is not apache/hey thing.
The behavior is that it runs a couple of thousand requests, then stop for 15-20s then runs another batch and stops and continue doing that...
Because of how the code above handles exceptions, I can see then only at the end:
ruby client-pool.rb 16:54:52
Could not locate Gemfile or .bundle/ directory
Completed 8000 requests
Completed 16000 requests
Completed 24000 requests
Completed 32000 requests
Completed 40000 requests
Completed 48000 requests
Completed 56000 requests
Completed 64000 requests
Completed 72000 requests
Finished with 310 errors
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Connection reset by peer
Socket is not connected - shutdown(2)
Broken pipe
Broken pipe
Broken pipe
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
Connection reset by peer
Connection reset by peer - connect(2) for "127.0.0.1" port 8080
Socket is not connected - shutdown(2)
Socket is not connected - shutdown(2)
Connection reset by peer - connect(2) for "127.0.0.1" port 8080
Socket is not connected - shutdown(2)
Socket is not connected - shutdown(2)
Socket is not connected - shutdown(2)
Connection reset by peer
Connection reset by peer
Connection reset by peer
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
Connection reset by peer
Socket is not connected - shutdown(2)
Socket is not connected - shutdown(2)
Socket is not connected - shutdown(2)
Connection reset by peer
Connection reset by peer
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
connect(2) for "127.0.0.1" port 8080
Updated by jeremyevans0 (Jeremy Evans) over 1 year ago
brodock (Gabriel Mazetto) wrote in #note-9:
@jeremyevans0 (Jeremy Evans) I've tested again in MacOS system provided ruby version:
ruby --version Could not locate Gemfile or .bundle/ directory ruby 2.6.10p210 (2022-04-12 revision 67958) [universal.arm64e-darwin22]
And that is consistent. It seems the problem is specific to MacOS, or worst when in MacOS.
I've tried running the code above to make sure this is not apache/hey thing.
In your pure Ruby hey-example, you have CONCURRENCY = 512
, but your server has BACKLOG = 50
. You would want BACKLOG
to be higher than CONCURRENCY
, otherwise I think at least some failures may be expected. Note that Ruby doesn't deal well with a high number of threads (and 512 threads would definitely be considered high in this case). For the client part, if you want to test concurrency, it would be better to fork/spawn separate processes and not use threads.
I don't have any other ideas about what could be going wrong. You should try to see if either your client or server is leaking file descriptors. You could also try increasing the timeout. Unfortunately, not being able to reproduce the problem, I can only guess as to what the root cause might be.
Updated by brodock (Gabriel Mazetto) over 1 year ago
- File clipboard-202303161214-zhq7o.png clipboard-202303161214-zhq7o.png added
- File clipboard-202303161214-7xx91.png clipboard-202303161214-7xx91.png added
- File clipboard-202303161215-5wjbo.png clipboard-202303161215-5wjbo.png added
@jeremyevans0 (Jeremy Evans) I'm new to debugging C code. Your idea on filedescriptors seems to to be somewhat spot on.
I'm not sure if this is the right way to debug this or not, but I managed to capture "Filesystem activity" through MacOS instrumentor.
As you can see, there is a big gap between a chunk of request it processed which can illustrate what I was observing.
looking at the description of what is happening you can see that the code couldn't get "accept" to return with success after some number of executions.
What is interesting is that I can't see any close
calls, only accept
and writev
which is weird. Could it be that somehow in MacOS we are not issuing close
calls and it is just "timing out"?
Updated by brodock (Gabriel Mazetto) over 1 year ago
when the pause happens it seems the socket is still kept alive, based on netstat:
(there is a huge list of those):
tcp4 0 0 localhost.http-alt localhost.56818 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56819 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56820 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56821 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56822 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56823 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56825 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56824 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56826 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56827 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56828 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56829 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56830 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56831 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56832 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56833 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56834 TIME_WAIT
tcp4 0 0 localhost.http-alt localhost.56835 TIME_WAIT
based on https://stackoverflow.com/questions/2187889/how-can-i-make-a-socket-close-immediately-bypassing-the-timeout-period it seems that there is a way to opt for socket reuse which would bypass this.
It still weird that there is such a small limit on MacOS for how many of those we keep waiting before we can't accept anymore
Updated by brodock (Gabriel Mazetto) over 1 year ago
Some other updates. I've been trying to undestand the TIME_WAIT
and why it does behave the way it doesn on MacOS (but not on Linux).
This response got me ideas to try setting linger
on:
https://stackoverflow.com/questions/3757289/when-is-tcp-option-so-linger-0-required
require 'socket'
PORT = 8080
BACKLOG = 64
LISTEN = '127.0.0.1'
LINGER = 0
@processed = 0
def handle_connection(connection, _addrinfo)
request = connection.gets
connection.write "HTTP/1.1 200\r\n"
connection.write "Content-Type: text/html\r\n"
connection.write "\r\n"
connection.write "Hello world! Current time is #{Time.now}"
@processed += 1
puts "#{request} - #{@processed}"
ensure
connection.setsockopt(Socket::Option.linger(true, LINGER))
connection.close
end
begin
socket = TCPServer.new(LISTEN, PORT)
socket.setsockopt(:SOCKET, :REUSEADDR, true)
socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
socket.listen(BACKLOG)
puts "Requested queue size: #{BACKLOG} bigger then limit: #{Socket::SOMAXCONN}" if BACKLOG > Socket::SOMAXCONN
loop do
handle_connection(socket.accept, nil)
end
ensure
socket.close
end
This modified code now does "release" the socket immediately, which allows me to respond to that huge ammount of requests.
My understanding is that if I set linger to 1
that would mean 1 tick/cycle or 1 second depending on the OS (worst case would be 1 second) but setting that on MacOS brings the old behavior back of having TIME_WAIT
all over the place.