Bug #20009
Marshal.load raises an exception when loading a dumped class whose name includes non-ASCII characters
Description
Reproduction code
class Cクラス; end
Marshal.load(Marshal.dump(Cクラス))
Actual result
<internal:marshal>:34:in `load': undefined class/module C\xE3\x82\xAF\xE3\x83\xA9\xE3\x82\xB9 (ArgumentError)
        from marshal.rb:2:in `<main>'
Expected result
Marshal.load returns the class Cクラス.
Impacted area
In Rails, the exception is raised when all of the following conditions hold:
- minitest is used with its default settings
- tests run in parallel via parallelize
- test class names contain non-ASCII characters
The default parallelization uses DRb, and DRb uses Marshal internally.
Other
After trying various things, I suspect this could be fixed by making rb_path_to_class
support strings containing non-ASCII characters, but I couldn't get any further than that.
Updated by byroot (Jean Boussier) about 1 year ago
I dug into this bug, and I'm not sure if it's possible to fix it.
Classes are serialized this way:
case T_CLASS:
    w_byte(TYPE_CLASS, arg);
    {
        VALUE path = class2path(obj);
        w_bytes(RSTRING_PTR(path), RSTRING_LEN(path), arg);
        RB_GC_GUARD(path);
    }
    break;
We write the TYPE_CLASS prefix, then the raw bytes of the class name, without any indication of their encoding.
Then, on load, we just read the bytes and try to look up the class:
case TYPE_CLASS:
    {
        VALUE str = r_bytes(arg);
        v = path2class(str);
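So on load we're looking for "Cクラス".b.to_sym, which doesn't match :"Cクラス". The bytes come back as a binary (ASCII-8BIT) string, and a symbol built from non-ASCII bytes in that encoding is a different symbol from the UTF-8 one. ASCII-only class names are unaffected, which is why this only shows up with non-ASCII names.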
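A small sketch of that mismatch, assuming a UTF-8 source file; `name.b` stands in for the binary string that r_bytes hands back, as the `.b` above implies.

```ruby
# Assumes a UTF-8 source file.
name = "Cクラス"

# Stand-in for the result of r_bytes: same bytes, binary encoding.
loaded = name.b
p loaded.encoding.name           # => "ASCII-8BIT"

# Same bytes, incompatible encodings: these intern to different symbols.
p loaded.to_sym == name.to_sym   # => false

# ASCII-only names intern to the same symbol whatever the string's encoding.
p "Foo".b.to_sym == :Foo         # => true
```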
To fix this we'd need to include the encoding in the format, but that would break both backward and forward compatibility, which is a huge deal.
Half-way solution
A possible half-way solution would be:
- Assume non-ASCII class names are UTF-8
- Raise on dump for class names that aren't UTF-8 compatible.
It's far from ideal though.
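To make the idea concrete, here is a Ruby-level sketch of that half-way behaviour. The helper names (`utf8_compatible_path?`, `dump_class_path`, `load_class_path`) are hypothetical; an actual change would live in marshal.c, and this only models the two rules while keeping the wire format unchanged.

```ruby
# Hypothetical helpers modelling the proposed half-way solution.
# The dumped bytes are the same as today; only the checks are new.

# "UTF-8 compatible": ASCII-only, or already UTF-8.
def utf8_compatible_path?(path)
  path.ascii_only? || path.encoding == Encoding::UTF_8
end

# Dump side: refuse names whose bytes couldn't be read back as UTF-8.
def dump_class_path(klass)
  path = klass.name or raise TypeError, "can't dump anonymous class #{klass}"
  unless utf8_compatible_path?(path)
    raise TypeError, "can't dump class with #{path.encoding} name"
  end
  path.b # written as raw bytes, as Marshal does now
end

# Load side: assume non-ASCII bytes are UTF-8 before looking up the constant.
def load_class_path(bytes)
  path = bytes.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "invalid UTF-8 class path" unless path.valid_encoding?
  Object.const_get(path) # also resolves nested paths like "Foo::Bar"
end

class Cクラス; end
p load_class_path(dump_class_path(Cクラス)).equal?(Cクラス)   # => true
p utf8_compatible_path?("Cクラス".encode(Encoding::EUC_JP))   # => false
```

The weakness is visible in the sketch: a class whose name is in another encoding (EUC-JP here) can no longer be dumped at all.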