select doesn't handle fd's > FD_SETSIZE very well

The main motivation for reporting this bug is that on win32 [any version], if you pass more than 64 sockets to select, it ignores the 65th onward. Ruby currently does not enforce this limit, so some sockets are never reported as ready.

As a first step, I'd suggest raising an error if more than 64 total sockets are passed [to the most recent thread that called select?]
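A minimal sketch of what such a check could look like at the Ruby level. The names `checked_select` and `WINSOCK_FD_SETSIZE` are hypothetical and not part of Ruby. The sketch assumes the limit applies to each list separately, since each list becomes its own fd_set; sum the three sizes instead for a stricter "total" check.

```ruby
# Hypothetical guard around IO.select; not an existing Ruby API.
WINSOCK_FD_SETSIZE = 64 # the win32 default limit described in this report

def checked_select(reads = nil, writes = nil, errors = nil, timeout = nil,
                   limit: WINSOCK_FD_SETSIZE)
  # Assumption: the limit is per list, because each list fills one fd_set.
  largest = [reads, writes, errors].compact.map(&:size).max || 0
  if largest > limit
    raise ArgumentError,
          "select called with #{largest} descriptors in one list (limit #{limit})"
  end
  IO.select(reads, writes, errors, timeout)
end
```

With this in place, passing 65 sockets in one list raises immediately rather than silently dropping the 65th.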

While examining this, I found that the same issue [too many descriptors, or descriptor numbers that are too high] also causes problems on other platforms.

With the attached test cases, Mac OS X crashes consistently on some and Linux on others, depending on the test.

I'd bet that these problems persist in Ruby 1.9 as well.

How to run tests:
first touch file 'abc' # used for some tests
then run ulimit -n 2000 # if this fails, you may need to run "sudo bash" first
then run the files.

To Ruby's credit, without having first run the ulimit above, most platforms don't segfault [though Windows still has the difficulty mentioned], so this isn't life critical.

Note that in Linux/OS X if a file descriptor's "number" is > FD_SETSIZE then it will be ignored by select. In Windows if the "total number of descriptors" passed to select is > FD_SETSIZE then those past FD_SETSIZE will be ignored.
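The two rules in the note above can be modelled side by side. This is only an illustrative sketch: `ignored_by_select` is a hypothetical helper, it works on plain integers rather than real descriptors, and it treats the Unix boundary as `>=` on my reading that a set of FD_SETSIZE slots holds numbers 0 through FD_SETSIZE - 1.

```ruby
# Hypothetical model of which descriptors select would skip on each platform.
def ignored_by_select(fds, fd_setsize:, platform:)
  case platform
  when :unix
    # Linux/OS X: the descriptor *number* decides.
    fds.select { |fd| fd >= fd_setsize }
  when :windows
    # Windows: the *position* in the list decides.
    fds.drop(fd_setsize)
  else
    raise ArgumentError, "unknown platform: #{platform.inspect}"
  end
end

ignored_by_select([3, 1023, 1024, 1500], fd_setsize: 1024, platform: :unix)
# => [1024, 1500]
ignored_by_select((100...170).to_a, fd_setsize: 64, platform: :windows)
# => [164, 165, 166, 167, 168, 169]
```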

test_crashes_os_x.rb (1020 Bytes) rogerdpack (Roger Pack), 10/21/2008 08:01 AM
test_select_crashes_linux.rb (1003 Bytes) rogerdpack (Roger Pack), 10/21/2008 08:01 AM
test_select_hangs_on_linux.rb (998 Bytes) rogerdpack (Roger Pack), 10/21/2008 08:01 AM
test_select_hangs_windows.rb (1017 Bytes) rogerdpack (Roger Pack), 10/21/2008 08:01 AM
socket.diff (1.86 KB) rogerdpack (Roger Pack), 02/08/2010 11:13 PM

Updated by rogerdpack (Roger Pack) about 12 years ago

The patch helps with the 'hangs_windows' test case.


Updated by shyouhei (Shyouhei Urabe) almost 12 years ago

Updated by usa (Usaku NAKAMURA) almost 12 years ago

When the patch is applied, every select call uses 112KB of stack (about 1.8KB at present).
And if you pass 4097 sockets to select, the same problem occurs again.
So this approach is not preferable.
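One way to reproduce these stack figures, as a back-of-the-envelope sketch. It assumes the 32-bit Winsock fd_set layout (a 4-byte count followed by FD_SETSIZE 4-byte SOCKET handles) and that 7 such sets live on the stack per select call. The count of 7 is my inference from the arithmetic, not something stated in this thread.

```ruby
COUNT_BYTES  = 4 # u_int fd_count (32-bit build)
SOCKET_BYTES = 4 # one SOCKET handle (32-bit build)
ASSUMED_SETS_PER_CALL = 7 # assumption: inferred, not confirmed by the source

# Size in bytes of one Winsock fd_set for a given FD_SETSIZE.
def fd_set_bytes(fd_setsize)
  COUNT_BYTES + SOCKET_BYTES * fd_setsize
end

# Estimated stack use of one select call, in KB.
def stack_kb(fd_setsize)
  (ASSUMED_SETS_PER_CALL * fd_set_bytes(fd_setsize) / 1024.0).round(2)
end

puts stack_kb(64)   # prints 1.78   (default FD_SETSIZE)
puts stack_kb(4096) # prints 112.03 (FD_SETSIZE from the patch)
```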

In 1.9, this problem has already been solved, because we didn't have to consider binary-level compatibility.
But in 1.8, we should give priority to maintaining compatibility over such a rare case.

Of course, it is necessary to correct this problem if we can.
I will leave this issue Open until another effective idea arises.


Updated by rogerdpack (Roger Pack) almost 12 years ago

Two things come to mind, if 4096 is not an acceptable option :)
1) Check the fd count before doing a select. If it's > FD_SETSIZE, raise an error.
2) A compromise, e.g. setting FD_SETSIZE to 512, since that's the limit of MSVCRT 6, which is used by 99% of 1.8.x compilers. I put 4096 as a conservative number.

Thanks for looking into this.



Updated by usa (Usaku NAKAMURA) over 11 years ago

Updated by rogerdpack (Roger Pack) over 11 years ago

Would it be possible to at least set FD_SETSIZE to 256 so that it works with VC6/mingw (which is 99% of ruby on windows distros)?

That would be good. Currently it is hard for extensions to use select correctly, because they overflow the FD set and cannot raise its size, since it is hard-coded at compile time to too small a value on Windows.


Updated by rogerdpack (Roger Pack) almost 11 years ago

Attaching new patch that has a smaller FD_SETSIZE, with test (fails without FD_SETSIZE, succeeds with).
Possible to commit this one, perhaps?


Updated by rogerdpack (Roger Pack) over 10 years ago

Another option might be something like the following (any feedback?)

Index: win32/win32.h
--- win32/win32.h (revision 26621)
+++ win32/win32.h (working copy)
@@ -23,6 +23,9 @@
#define USE_WINSOCK2
+#ifndef FD_SETSIZE
+#if defined __MINGW32__
+# define FD_SETSIZE 256 // larger default for msvcrt 6


Updated by usa (Usaku NAKAMURA) over 3 years ago

1.8.6 is out of date

