Bug #21498
openWindows - Ruby Overrides C Library APIs thus breaking them
Description
I am trying to wrap a simple C++ library, https://github.com/baderouaich/BitmapPlusPlus, as a Ruby extension.
However when I use the extension to write a bitmap to disk the bitmap is corrupted. This is because the library uses std::ofstream which eventually uses the C API fclose to write the final bytes to the bitmap file and then closes it. The problem is that Ruby overrides fclose and replaces it with rb_w32_fclose. It then exports its custom version from from x64-vcruntime140-ruby340.dll. And the exported version is broken (at least from the standpoint of the C standard library).
Note this has been a long standing issue. The first report I see is from 2013:
https://bugs.ruby-lang.org/issues/8569
More recently in 2020 (which explains the issue very well):
https://github.com/NREL/OpenStudio/issues/3942#issuecomment-610673401
I understand that Ruby is trying to provide a platform independent API. But the problem is this solution breaks any third party libraries that rely on these C APIs (which of course are very common). And there is no good workaround (see https://github.com/NREL/OpenStudio/issues/3942#issuecomment-611072774).
So would it be possible for Ruby to stop exporting custom versions of basic C APIs? The code that does it is here:
https://github.com/ruby/ruby/blob/master/win32/mkexports.rb#L41
Ruby of course could still use its custom versions of fclose, read, write etc within ruby.exe and x64-vcruntime140-ruby340.dll. But they should not be exported from x64-vcruntime140-ruby340.dll and thus be off limits to extensions. If a case comes up where an extension really needs access to rb_w32_fclose instead of fclose then an extension developer can use an #ifdef _WIN32 to do so and work across platforms. That at least puts the developer in control versus now where I don't see any way I can wrap the bitmap library as a Ruby extension.
From my experience the biggest problem is the replacing of fclose with rb_w32_fclose.
This is the list of generated overrides:
FD_CLR=rb_w32_fdclr
FD_ISSET=rb_w32_fdisset
Sleep=rb_w32_Sleep
accept=rb_w32_accept
access=rb_w32_uaccess
bind=rb_w32_bind
close=rb_w32_close
connect=rb_w32_connect
dup2=rb_w32_dup2
fclose=rb_w32_fclose
fstat=rb_w32_fstati128
get_osfhandle=rb_w32_get_osfhandle
getcwd=rb_w32_ugetcwd
getenv=rb_w32_ugetenv
gethostbyaddr=rb_w32_gethostbyaddr
gethostbyname=rb_w32_gethostbyname
gethostname=rb_w32_gethostname
getpeername=rb_w32_getpeername
getpid=rb_w32_getpid
getppid=rb_w32_getppid
getprotobyname=rb_w32_getprotobyname
getprotobynumber=rb_w32_getprotobynumber
getservbyname=rb_w32_getservbyname
getservbyport=rb_w32_getservbyport
getsockname=rb_w32_getsockname
getsockopt=rb_w32_getsockopt
inet_ntop=rb_w32_inet_ntop
inet_pton=rb_w32_inet_pton
ioctlsocket=rb_w32_ioctlsocket
isatty=rb_w32_isatty
listen=rb_w32_listen
lseek=rb_w32_lseek
lstat=rb_w32_ulstati128
mkdir=rb_w32_umkdir
mmap=rb_w32_mmap
mprotect=rb_w32_mprotect
munmap=rb_w32_munmap
pipe=rb_w32_pipe
pread=rb_w32_pread
pwrite=rb_w32_pwrite
read=rb_w32_read
recv=rb_w32_recv
recvfrom=rb_w32_recvfrom
rename=rb_w32_urename
rmdir=rb_w32_urmdir
select=rb_w32_select
send=rb_w32_send
sendto=rb_w32_sendto
setsockopt=rb_w32_setsockopt
shutdown=rb_w32_shutdown
socket=rb_w32_socket
stati128=rb_w32_ustati128
strcasecmp=msvcrt.stricmp
strerror=rb_w32_strerror
strncasecmp=msvcrt.strnicmp
times=rb_w32_times
unlink=rb_w32_uunlink
utime=rb_w32_uutime
utimensat=rb_w32_uutimensat
utimes=rb_w32_uutimes
write=rb_w32_write
Files
Updated by cfis (Charlie Savage) 5 months ago
- Description updated (diff)
Updated by cfis (Charlie Savage) 5 months ago
- Description updated (diff)
Updated by cfis (Charlie Savage) 5 months ago
- Description updated (diff)
Updated by alanwu (Alan Wu) 3 months ago
- Assignee set to windows
Updated by alanwu (Alan Wu) 13 days ago
· Edited
Recent C runtimes on Windows have C99 compliant fclose(), and I think overriding it and breaks otherwise standard-compliant extensions. Removing the fclose() override passes CI, at least. https://github.com/ruby/ruby/pull/15073
@usa (Usaku NAKAMURA) What do you think?
Updated by cfis (Charlie Savage) 13 days ago
· Edited
Thanks @alanwu (Alan Wu). I added comments to the commit in Github, but copying here:
This MR will result in the Ruby dll exporting the symbol _fclose instead of fclose. The exporting happens here:
Microsoft tends to like to put _ in front of their c function names. In this case, fclose is ok but _flcoseall would not be. See https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fclose-fcloseall?view=msvc-170.
So perhaps that regex should be updated to not export functions that have __ in their name?
Or, should rb_w32_fclose be entirely removed? If standard fclose works on windows now, then maybe it no longer serves a useful purpose?
Updated by usa (Usaku NAKAMURA) 13 days ago
@alanwu (Alan Wu)
I'm still a bit concerned about compatibility with existing extension libraries, but with Ruby4 about to be released, it might be a good time to make the change. So, I'm not opposed to it this time.
Updated by cfis (Charlie Savage) 13 days ago
· Edited
The universal crt was introduced by Microsoft in 2015 (https://learn.microsoft.com/en-us/cpp/porting/upgrade-your-code-to-the-universal-crt?view=msvc-170) and Ruby for Windows has used it since Ruby 3.1 in 2021 (https://rubyinstaller.org/2021/12/31/rubyinstaller-3.1.0-1-released.html). According to Microsoft it "conforms closely to the ISO C99 standard".
At this point in time, could Ruby drop some of its overrides on Windows. The UCRT function list is here:
The ones the UCRT implements that Ruby overrides are:
access=rb_w32_uaccess
close=rb_w32_close
dup2=rb_w32_dup2
fclose=rb_w32_fclose
fstat=rb_w32_fstati128 (_fstat)
get_osfhandle=rb_w32_get_osfhandle (_get_osfhandle)
getpid=rb_w32_getpid
isatty=rb_w32_isatty
lseek=rb_w32_lseek
mkdir=rb_w32_umkdir
pipe=rb_w32_pipe (_pipe)
read=rb_w32_read
rename=rb_w32_urename
rmdir=rb_w32_urmdir
strerror=rb_w32_strerror
unlink=rb_w32_uunlink
utime=rb_w32_uutime (_utime)
write=rb_w32_write
If there is interest, I can remove these overrides locally and see if Ruby still builds and what happens with tests.
Updated by nobu (Nobuyoshi Nakada) 13 days ago
I can't the exact cause how the problem occurs, maybe passing FILE* opened in other than Ruby to rb_w32_fclose?
usa (Usaku NAKAMURA) wrote in #note-7:
I'm still a bit concerned about compatibility with existing extension libraries, but with Ruby4 about to be released, it might be a good time to make the change. So, I'm not opposed to it this time.
Although it's questionable about that winsock functions and MSVC functions for file descriptors are incompatible, I'll not be against.
diff --git a/win32/mkexports.rb b/win32/mkexports.rb
index 1a9f474be28..734de9b8dd1 100755
--- a/win32/mkexports.rb
+++ b/win32/mkexports.rb
@@ -40,26 +40,11 @@
syms[internal] = export
winapis[$1] = internal if /^_?(rb_w32_\w+)(?:@\d+)?$/ =~ internal
end
- incdir = File.join(File.dirname(File.dirname(__FILE__)), "include/ruby")
- read_substitution(incdir+"/win32.h", syms, winapis)
- read_substitution(incdir+"/subst.h", syms, winapis)
syms["rb_w32_vsnprintf"] ||= "ruby_vsnprintf"
syms["rb_w32_snprintf"] ||= "ruby_snprintf"
@syms = syms
end
- def read_substitution(header, syms, winapis)
- File.foreach(header) do |line|
- if /^#define (\w+)\((.*?)\)\s+(?:\(void\))?(rb_w32_\w+)\((.*?)\)\s*$/ =~ line and
- $2.delete(" ") == $4.delete(" ")
- export, internal = $1, $3
- if syms[internal] or internal = winapis[internal]
- syms[forwarding(internal, export)] = internal
- end
- end
- end
- end
-
def exports(name = $name, library = $library, description = $description)
exports = []
if name
Updated by cfis (Charlie Savage) 13 days ago
In my case I created a Ruby extension based on https://github.com/baderouaich/BitmapPlusPlus. Every time I tried to save a bitmap though it was corrupted. I submitted a bug to the project - https://github.com/baderouaich/BitmapPlusPlus/issues/6.
But I later realized the real problem is Ruby overrides fclose:
- bitmap library uses
std::ofstreamto write the file - ofstream eventually calls
fclose
But instead of calling fclose in UCRT, it calls Ruby's version of fclose. That happens because the Ruby dll exports fclose as a symbol and that overrides the one exported from the C runtime library (UCRT). I know this for a fact because I stepped through the code using a debugger. I can show you a screenshot of that if you would like to see it yourself.
Ruby's version of fclose (rb32_fclose) is not compatible with the UCRT version, and thus the file ends up corrupted.
Updated by cfis (Charlie Savage) 13 days ago
· Edited
I have attached a screenshot showing the problem. Notice in the stack trace on the bottom of the page that the method bmp::Bitmap::save(const std::filesystem::path & filename) calls fclose but ends up calling the Ruby version! Not the UCRT version.
Updated by nobu (Nobuyoshi Nakada) 9 days ago
cfis (Charlie Savage) wrote in #note-11:
Notice in the stack trace on the bottom of the page that the method
bmp::Bitmap::save(const std::filesystem::path & filename)calls fclose but ends up calling the Ruby version! Not the UCRT version.
rb_w32_fclose also calls fclose in the UCRT.
Updated by cfis (Charlie Savage) 8 days ago
Yes, but if the call goes through rb_w32_fclose the saved file is corrupted/invalid. If the call does not go through rb_w32_fclose then it is valid.
By overriding UCRT calls, Ruby breaks them for other libraries that use them.
Updated by mame (Yusuke Endoh) 6 days ago
cfis (Charlie Savage) wrote in #note-13:
if the call goes through
rb_w32_fclosethe saved file is corrupted/invalid.
@nobu (Nobuyoshi Nakada) is curious why it's getting corrupted, given that rb_w32_fclose just delegates to the original fclose when !is_socket(sock) is true. In that case, it shouldn't be corrupted.
Updated by cfis (Charlie Savage) 5 days ago
When I step through with a debugger, this line forces an early return from the function. So fclose is never called.
https://github.com/ruby/ruby/blob/master/win32/win32.c#L6627
if (fflush(fp)) return -1;
Updated by cfis (Charlie Savage) 5 days ago
· Edited
- File primitives.bmp primitives.bmp added
- File primitives_bad.bmp primitives_bad.bmp added
I have attached the "good" file and the "bad" file. The good file is generated by rewriting the extension to not use fclose, and thus not have the call go through rb_w32_fclose. The bad file is generated by using flcose which does end up calling rb_w32_fclose .
The good one is 372,736 bytes and the bad one is 368,640 bytes. Also interesting is your browser can probably render both. But if you save the file to disk and try to read it using paint.net, windows photos, etc it fails. Maybe browser are more lenient somehow?
Updated by cfis (Charlie Savage) 5 days ago
· Edited
I gave a try to @nobu's code above (https://bugs.ruby-lang.org/issues/21498#note-9).
That results in a linker error:
linking shared-object -test-/gvl/call_without_gvl.so
Creating library call_without_gvl-x64-mswin64_140.lib and object call_without_gvl-x64-mswin64_140.exp
call_without_gvl.obj : error LNK2019: unresolved external symbol __imp_select referenced in function do_loop
..\..\..\..\.ext\x64-mswin64_140\-test-\gvl\call_without_gvl.so : fatal error LNK1120: 1 unresolved externals
NMAKE : fatal error U1077: '@ cl -nologo -LD -Fe../../../../.ext/x64-mswin64_140/-test-/gvl/call_without_gvl.so call_without_gvl.obj x64-vcruntime140-ruby340.lib user32.lib -link -incremental:no -debug -opt:ref -opt:icf -manifest:embed,ID=2 -incremental:no -debug -opt:ref -opt:icf -manifest:embed,ID=2 -dll -libpath:. -libpath:../../../.. -implib:call_without_gvl-x64-mswin64_140.lib -pdb:call_without_gvl-x64-mswin64_140.pdb -def:call_without_gvl-x64-mswin64_140.def' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '@cd ext/-test-/gvl/call_without_gvl && "C:\Program Files\Microsoft Visual Studio\18\Insiders\VC\Tools\MSVC\14.50.35717\bin\HostX64\x64\nmake.EXE" -L V=0 all' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '@nmake -f exts.mk -l libdir="c:\msys64\usr\local\ruby-3.4.7-mswin/lib" LIBRUBY_EXTS=./.libruby-with-ext.time EXTENCS="dmyenc.obj" BASERUBY="C:\msys64\usr\local\ruby\bin\ruby.exe" MINIRUBY=".\miniruby.exe -I./lib -I. " ' : return code '0x2'
Stop.
Which is due to select not being part of the UCRT.
What if Ruby did not export only the functions that overlap with UCRT on Windows - the ones listed in this comment -
https://bugs.ruby-lang.org/issues/21498#note-8
Updated by mame (Yusuke Endoh) 4 days ago
cfis (Charlie Savage) wrote in #note-15:
When I step through with a debugger, this line forces an early return from the function. So fclose is never called.
https://github.com/ruby/ruby/blob/master/win32/win32.c#L6627
if (fflush(fp)) return -1;
Thank you for debugging. The early return (not calling fclose when fflush fails) seems problematic in terms of resource leaks. However, I don't think the leak itself is related to your issue. The real question is why fflush is failing.
@nobu (Nobuyoshi Nakada) said in the dev meeting that he's not opposed to removing the override, but he doesn't want to do it just because 'the problem went away.' He wants to understand the mechanism why the override doesn't work well.
cfis (Charlie Savage) wrote in #note-16:
The good one is 372,736 bytes and the bad one is 368,640 bytes. Also interesting is your browser can probably render both.
Looking closely, the bad one is missing data of the last line (i.e., the topmost line of the bmp). It does indeed seem to have failed flushing.
By the way, calling fclose while buffered data remains is somewhat risky. fclose flushes the buffer before closing, but it doesn't specify what happens to the data if the flush fails. It's likely discarded.
Updated by cfis (Charlie Savage) 4 days ago
· Edited
Thank you @mame (Yusuke Endoh) an @nobu (Nobuyoshi Nakada) for looking into this.
@mame (Yusuke Endoh) I don't know why the override doesn't work. The calling C++ code is from Microsoft Visual Studio\18\Insiders\VC\Tools\MSVC\14.50.35717\include\__msvc_filebuf.hpp:
_EXPORT_STD template <class _Elem, class _Traits>
class basic_filebuf : public basic_streambuf<_Elem, _Traits> { // stream buffer associated with a C stream
public:
...
basic_filebuf* close() {
basic_filebuf* _Ans;
if (_Myfile) { // put any homing sequence and close file
_Reset_back(); // revert from _Mychar buffer
_Ans = this;
if (!_Endwrite()) {
_Ans = nullptr;
}
if (_CSTD fclose(_Myfile) != 0) {
_Ans = nullptr;
}
} else {
_Ans = nullptr;
}
_Init(nullptr, _Closefl);
return _Ans;
}
}
When ruby overrides fclose, then (_CSTD fclose(_Myfile) != 0) is -1. When Ruby does not override it is 0.
As an experiment I created this MR - https://github.com/ruby/ruby/pull/15205. It removes overrides for UCRT functions. I then built Ruby 3.4.7 with it. With this patch, and as expected, the generated bitmap is correct. Note I have not run the full Ruby test suite yet (although it seems github is doing that?), and I have not tested a mingw64 build. I am happy to do so.
I do think this proves beyond any doubt that Ruby's override of fclose breaks it. It seems likely the same is true for other overrides.
Another thing to think about - Ruby and my extension are using the dynamic library version of ucrt. If either were to use the static library version, then I believe the override would fail because Ruby and the extension would have their own copies of the C runtime library. Thus a file handle opened by the extension could not be closed by Ruby, which is what I think would happen.
Updated by YO4 (Yoshinao Muramatsu) 4 days ago
If rb_win32_fclose calls ucrt's fflush/fclose, it's questionable why it doesn't work. Is it truly calling the same ucrts?
The screenshot suggests the C++ code is built in Debug build. From the C++ code, the the ucrt debug DLL might be linked.
On the other hand, if ruby isn't specially built, ruby's DLL will likely call the release version of the ucrt.
Could you verify which DLL's fopen/_wfopen (probably called from std::ofstream's constructor) and fflush/fclose (called from rb_win32_fclose) are being used?
Of course, if ucrt debug and release builds are compatible, this wouldn't matter for this issue.
I apologize, but I have no knowledge regarding their compatibility.
Updated by cfis (Charlie Savage) 3 days ago
· Edited
Ah yes, you are correct @YO4.
I used MSVC 2026 to build both Ruby and the extension (a dll). By default Ruby builds with /Md. This is the generated Makefile.sub in the win32 directory:
!if !defined(RUNTIMEFLAG)
RUNTIMEFLAG = -MD
!endif
When the extension is built in debug mode it will link with /MDd and in release mode /MD. I tested using a debug build and release build for the extension. The debug build results in a bad bitmap file and release build in a good one.
So this is the same issue I brought up about mixing runtime libraries (I used the example of static libraries). By overriding fclose, Ruby is a causing a file descriptor created by the debug UCRT in the extension to be closed by the release URCT in Ruby.
Microsoft specifically discusses mixing UCRT versions here and here:
They even have example bad code showing the use of fopen from one UCRT and fclose in another (this is essentially what Ruby is causing):
Those links say it is ok for executables and different extensions to use different versions of the UCRT but you have to be very careful to not pass memory allocations or CRT objects across UCRT boundaries. This is similar to building Ruby extensions before UCRT came out in 2015 (if you used Ruby to allocate memory you would have to call xfree, etc).
By overriding fclose and other ucrt functions, Ruby causes UCRT objects to be incorrectly passed between UCRT versions. Which gets back to the bug - Ruby should not do this. I believe my MR fixes this - see https://github.com/ruby/ruby/pull/15205.
As to your other question, the file is opened to by a call to _Fiopen
basic_filebuf* open(const wchar_t* _Filename, ios_base::openmode _Mode, int _Prot = ios_base::_Default_open_prot) {
// in standard as const std::filesystem::path::value_type *; _Prot is an extension
if (_Myfile) {
return nullptr;
}
const auto _File = _Fiopen(_Filename, _Mode, _Prot);
if (!_File) {
return nullptr; // open failed
}
_Init(_File, _Openfl);
_Initcvt(_STD use_facet<_Cvt>(_Mysb::getloc()));
return this; // open succeeded
}
Where _fiopen is referenced like this:
extern "C++" _CRTIMP2_PURE FILE* __CLRCALL_PURE_OR_CDECL _Fiopen(const wchar_t*, ios_base::openmode, int);
Summary - don't pass memory/crt objects across boundaries. By overriding fclose and other UCRT functions Ruby causes this to happen.
Updated by nobu (Nobuyoshi Nakada) 3 days ago
· Edited
cfis (Charlie Savage) wrote in #note-21:
Summary - don't pass memory/crt objects across boundaries. By overriding
fcloseand other UCRT functions Ruby causes this to happen.
Yes for the former half.
It is the reason we embed the CRT DLL version in the ruby DLL.
And "the CRT DLL version" includes release/debug, and the command line option is stored in RbConfig::CFLAGS.
I think we should check the CRT DLL version at loading time too.
Updated by cfis (Charlie Savage) 3 days ago
@nobu (Nobuyoshi Nakada) - When developing an extension it can be very helpful to use a debug build. A runtime check would prohibit that, unless you also make a debug version of ruby. But that seems like a lot of unnecessary overhead.
In addition, it would not allow a developer to statically link an extension to the UCRT (maybe there is no reason to ever do that?).
From what I can see, not exporting UCRT functions from the Ruby dll fixes the issue. Also I see that tests passed on github.
Do you worry that the proposed change might break 3rd party extensions? Or do you think it would result in the opposite problem, UCRT objects created in Ruby but then handled in an extension? Or is there another issue that worries you?
Thanks - Charlie
Updated by mame (Yusuke Endoh) 3 days ago
I commend YO4-san for their insight into the debug/release build discrepancy.
I'm now convinced the root cause is loading multiple runtimes. Therefore, I agree with @nobu's fix: we should prohibit loading multiple runtime versions, not just stop the override.
Mixing build types (e.g., debug extensions on a release core) can cause subtle failures in C APIs like FILE *rb_io_stdio_file(rb_io_t *fptr);. This risk alone justifies the prohibition.
Furthermore, Microsoft's documentation implies using a single CRT version is safer:
It's also possible to avoid some of these issues if all of the images in your process use the same dynamically loaded version of the CRT.
This doesn't read like a recommendation to load multiple versions.
While not mixing builds may seem inconvenient, this issue itself demonstrates the trouble it causes. For your case, building the Ruby core in debug mode is a necessary overhead, not an "unnecessary overhead" you said, I think.
Updated by cfis (Charlie Savage) 3 days ago
· Edited
I agree having one UCRT is best. But the very first sentence in the paragraph I linked to says:
Every executable image (EXE or DLL) can have its own statically linked CRT, or can dynamically link to a CRT. The version of the CRT statically included in or dynamically loaded by a particular image depends on the version of the tools and libraries it was built with. A single process may load multiple EXE and DLL images, each with its own CRT.
The point is that it is possible for this to happen.
So let me ask a different question. What purpose does overriding UCRT functions serve today (by overriding I mean exporting them via a DEF file).
Remember, if you go back to one of the earlier reports about this, a C library developer from Microsoft said what Ruby is doing causes undefined behavior - see https://github.com/NREL/OpenStudio/issues/3942#issuecomment-611072774. (fyi this is his github page - https://github.com/billyoneal).
I interpret this as Microsoft has clearly stated not to do what Ruby is doing. So why continue doing it?
Updated by mame (Yusuke Endoh) 3 days ago
I feel your communication on this ticket suffers from what is known as the XY problem. You are focusing on your proposed solution ("remove the override"), without first detailing the original problem and providing steps to reproduce it. This is evident even from the ticket's title. Unfortunately, this style of discussion is well-known to be unproductive.
To be clear, I (and probably @nobu (Nobuyoshi Nakada)) am not necessarily opposed to removing the override. In fact, I was initially so sympathetic to the idea that I was even lightly trying to persuade Nobu to move in that direction. However, now that it's clear the override wasn't the direct cause of the problem you are facing, I honestly find it difficult to take that argument seriously. I certainly no longer feel motivated to persuade Nobu about this.
Updated by cfis (Charlie Savage) 2 days ago
· Edited
Hi @mame (Yusuke Endoh) - Thank you for your response and I am sorry that you feel that way, that was not my intention. I very much appreciate your help and our discussion. Maybe we can try again?
I see the problem is that Ruby exports overridden C runtime calls. This is a problem because there is no guarantee an extension will have the same UCRT as Ruby. That causes UCRT objects to be shared incorrectly between UCRT libraries.
I am not sure why Ruby does this. I would find it very helpful if someone could explain that. Perhaps it is used to be necessary in the past, but no longer is?
I know that a Microsoft employee has stated in the past its a bad idea for Ruby to do this. I also know that other people have ran into this issue in the past - I provided links in the original post.
Does that help? Do we agree so far? Is there more information I can share? I am happy to push up my example code that I have been testing to github if you would like to see it.
I am of course open to any solution. I personally think the runtime check is a band-aid that doesn't fix the underlying problem.
Anyway, I very much would like to get this solved in some manner since it has been a longstanding issue.
Thank you for your help!
Updated by YO4 (Yoshinao Muramatsu) 2 days ago
Ruby has made significant efforts to achieve cross-platform support at the C level.
This includes compensating for the lack of C runtime libraries.
The statement you quoted is taken out of context and that seems unfair.
The override of fclose also stems from Microsoft's past decision not to integrate the socket library into low-level I/O.
While Microsoft has recently begun focusing on the CLI environment, it has historically prioritized the Win32 API over investing in the C library.
For example, time zone names were influenced by the OS language settings rather than the C locale, failing to behave as expected.
This required additional support in Ruby. #21144
A cooperative approach is preferable to criticism.
I like the process at https://github.com/oneclick/rubyinstaller2/issues/308.
The root cause of this issue lies in the coexistence of C runtime DLLs. Will your proposal solve it?
Even if the override of fclose were removed, who could guarantee the integrity of globally shared resources?
errno, locale, and Runtime support for _beginthread()—would these function correctly? Can someone guarantee it?
Alternatively, if a mechanism like LD_PRELOAD is available (side-by-side/manifest?), forcing the loading of a C runtime DLL might work around the issue.
However, the static-linked startup is present, so it's questionable whether this would work.
Therefore, its possibility depends on Microsoft's ecosystem.
In any case, the responsibility for ensuring the singleness of the runtime DLL lies with the user.
It seems that performing the runtime check on Ruby's side is the maximum that can be done.
Are you mistaking someone else's opinion for your own?
Even if it's only a band-aid, we should have been able to identify the root cause of this problem earlier.
Your cooperation with the investigation paved the way for that. That is undoubtedly an achievement of this issue.
One more thing...
Next time you encounter a similar situation, you can say it might be due to mixed C runtime DLLs.
Cheers! :wink:
Updated by cfis (Charlie Savage) 1 day ago
Hi @YO4 - thank you for your reply. I absolutely agree a cooperative approach is best. I am saddened / quite surprised to hear that you think I have criticized someone. I certainly had no intention to do that. If I did then I apologize.
I have used Ruby on Windows for almost 20 years now and would like to help make Ruby even better. In that spirit, let me move onto your technical questions.
I agree with your root cause statement. The global shared resources are global per UCRT instance. They are safe as long as objects don't pass between different UCRTs. Can I guarantee removing the overrides will fix all problems - no. But it does at least fix the problems introduced by the UCRT functions that are overridden.
I am guessing the reason to export these overridden functions from the Ruby library (versus just keeping them internal) is to use them in Ruby extensions. Thus if a UCRT object, like a file handle, is created by the Ruby executable then an extension could use it. But does that actually happen? Wouldn't a Ruby extension instead use the offical rb_* APIs thus avoiding any issues?
One idea - I can make a debug build of Ruby and a release build of my extension to see if the debug version of UCRT recognizes the file handle from fclose is from a different UCRT version. If it can, then we could run the Ruby test suite with a debug build of Ruby and release builds of built-in extensions and see what errors get generated. Not sure this will work, but I will give it a try.
I that works, then perhaps we can feel confident this change won't break anything. I do think removing the overrides is the right thing to do, especially if they are not actually needed (I am encouraged that the github tests passed, that at least seems to indicate they are not). I do agree that having a runtime check as an additional safety measure is a good idea too.
I prefer is this is not the user's problem though. This is a difficult problem to recognize (excellent work that you did), and the user doesn't have a lot of recourse but go rebuild everything themselves which is much to hard for the average user.
Updated by YO4 (Yoshinao Muramatsu) 1 day ago
The discussion you cited seemed to me to be ignoring the actual problem by pinning the cause solely on ruby's behavior.
You used it to solidify your argument within the ruby community, and even tried to borrow Microsoft's authority to do so.
That's how I felt about it. No need to apologize, but we must always keep learning.
I still don't understand how calls to overridden functions that are using the original routine could cause problems in a healthy setup.
What do you mean 'the problems introduced by the ucrt functions that are overridden'?
I don't know what the difficulties faced by the OpenStudio people actually stemmed from, since they haven't provided that information.
However, I can examine the OpenStudio-3.1 binaries of that time.
> objdump -p openstudio.so | grep "DLL Name"
DLL Name: openstudiolib.dll
DLL Name: x64-msvcrt-ruby250.dll
DLL Name: USER32.dll
DLL Name: KERNEL32.dll
DLL Name: MSVCP140.dll
DLL Name: VCRUNTIME140.dll
DLL Name: VCRUNTIME140_1.dll
DLL Name: api-ms-win-crt-runtime-l1-1-0.dll
DLL Name: api-ms-win-crt-stdio-l1-1-0.dll
DLL Name: api-ms-win-crt-heap-l1-1-0.dll
DLL Name: api-ms-win-crt-string-l1-1-0.dll
DLL Name: api-ms-win-crt-time-l1-1-0.dll
I should note that even back then, it was possible to build ruby using ucrt with msvc.
Lets move on to the technical details.
I've already mentioned that ruby has C-level cross-platform support.
For example, winsock library lacks support for errno, but ruby's wrapper makes it usable.
File and socket fds can be handled uniformly on other operating systems, but on windows, you must use _close() and closesocket() appropriately depending on the target.
This is what ruby is currently tackling.
Should we allow selecting whether to include overrides during build time?
The moment we do, we'll be flooded with reports of mysterious errors here and to gem developers. Maybe Windows-specific error reports will flood useful gem developers using other OSes, even though their code is correct.
Furthermore, setup differences could make troubleshooting difficult. I don't want to see a future where gem developers come to hate the Windows platform.
If you continue to propose overriding deletions, I believe you need to provide an alternative solution.
If removing the override made the test pass, then the test is likely insufficient.
As mentioned above, it works fine on other operating systems. They don't need that test.
It's no secret that the Windows version has platform-specific issues and relies on community contributions. We can contribute as well.
The debug version of the Microsoft C runtime library is not redistributable. That is to say, the user is the developer.
What I meant by "user" is that you are responsible for matching the C runtime DLL in this case.
I don't want to burden the ruby core team any further.
My apologies, but the basis for your argument in this issue has been lost, and I believe continuing discussion here would be unproductive.
If you have a better solutions to what ruby is currently doing, or if you have made progress in your own efforts, they would be welcome.
I apologize for the lengthy message.
Regards
Updated by YO4 (Yoshinao Muramatsu) about 7 hours ago
Looking back now, I realize my recent comments were not good communication.
My comments were overly aggressive.
Particularly noteworthy the final statement was made in a manner that seemed to represent the community, cutting off @cifs' subsequent posts. This was an attempt to borrow authority and exclude cooperation.
This is precisely the kind of poor communication I pointed out.
I apologize to Mr. Savage and the community for the manner of my recent comments.
I will be mindful of my imperfections and strive to provide better communication.
Updated by cfis (Charlie Savage) about 4 hours ago
Hi YO4 - thank you for your last comment - I appreciate it. I too need to strive for better communication, because clearly I am not communicating very well. I hope we can all assume the best intentions in others, and realize communication, especially online, in tickets and with cultural differences is difficult.
I think your comment shows where we are not communicating:
I still don't understand how calls to overridden functions that are using the original routine could cause problems in a healthy setup.
If the Ruby shared library on windows did not include this line in its DEF file - fclose=rb_w32_fclose - then the extension would use fclose from its debug version of the UCRT and not Ruby's release build of UCRT. Thus the problem is solved, at least in this one case.
Second, by "healthy setup" I think you mean that Ruby and all extensions share the same UCRT. I think that is the best case, and likely true 99% of the time, but I am trying to explore if that restriction is absolutely necessary. Because in real life sometimes UCRT's are mismatched (static builds with UCRT, debug builds, before UCRT different C runtimes per MSVC version).
Which is why I keep coming back to the same question - does Ruby still need to export (or as I have said above overrides) these function today?
Ruby has done amazing work providing a consistent API across platforms, and windows is quite different that other operating systems. Your example of networking and winsock are excellent ones - those clearly are needed to be standardized across operating systems and it makes sense for Ruby to for example export select. So I am only discussing the exports that override URCT:
access=rb_w32_uaccess
close=rb_w32_close
dup2=rb_w32_dup2
fclose=rb_w32_fclose
fstat=rb_w32_fstati128 (_fstat)
get_osfhandle=rb_w32_get_osfhandle (_get_osfhandle)
getpid=rb_w32_getpid
isatty=rb_w32_isatty
lseek=rb_w32_lseek
mkdir=rb_w32_umkdir
pipe=rb_w32_pipe (_pipe)
read=rb_w32_read
rename=rb_w32_urename
rmdir=rb_w32_urmdir
strerror=rb_w32_strerror
unlink=rb_w32_uunlink
I hope that clarifies why I think this is an issue worth of discussion and resolution? I did propose a concrete path forward that I hoped would address your concerns (see if we can use a debug build to catch these errors). Do you think that is worth pursuing?