Bug #21267
openrespond_to check in Class#allocate is inconsistent
Description
Class#allocate
has an additional rb_obj_respond_to(klass, rb_intern("allocate"))
check to forbid allocate being called on a class where it has been made private or undefined, even if used via ex. bind_call
.
>> Rational.allocate
(irb):1:in '<main>': undefined method 'allocate' for class Rational (NoMethodError)
>> Class.instance_method(:allocate).bind_call(Rational)
(irb):1:in 'Class#allocate': calling Rational.allocate is prohibited (TypeError)
However I don't think this provides any additional protection from users accessing an uninitialized object, as the user can redefine allocate to anything to bypass the check:
>> Class.instance_method(:allocate).bind_call(Class.new(Rational) {def self.allocate; end})
=> (0/1)
Or even override respond_to_missing?
>> Class.instance_method(:allocate).bind_call(Class.new(Rational) {def self.respond_to_missing? *; true; end})
=> (0/1)
So I think we should remove this check. For classes that we need to forbid allocation we should use rb_undef_alloc_func
.
The classes I see this used for are:
- MatchData
- Refinement
- Module
- Complex
- Rational
My main motivation is that this check makes Class#allocate
slow. There are ways we could improve that, but I don't think the check as-is is useful. If there's an alternative, more robust, check we'd like to do instead of simply removing this I'd be happy to implement it.
This makes allocate
~75% faster.
|allocate_no_params | 19.009M| 20.087M|
| | -| 1.06x|
|allocate_allocate | 20.587M| 35.882M|
| | -| 1.74x|
Updated by jhawthorn (John Hawthorn) 4 days ago
At RubyKaigi @byroot (Jean Boussier) and I looked at this and came up with a more efficient and resilient way to do the check (https://github.com/ruby/ruby/commit/a6365cef1084191f992ee55fa661cb9aa2196c39). But coming back to this I again don't think it's effective enough to prevent uninitialized values and we should just remove the check.
Again, this check is trying to avoid users being able to make an uninitialized value of these classes, even if they are trying very hard
>> Class.instance_method(:allocate).bind_call(MatchData)
(irb):6:in 'Class#allocate': calling MatchData.allocate is prohibited (TypeError)
However we aren't protecting new
with this check, and can get uninitialized values that way (even though it has been undefined in the same way as allocate
):
>> Class.instance_method(:new).bind_call(MatchData)
An error occurred when inspecting the object: #<TypeError: uninitialized MatchData>
Result of Kernel#inspect: #<MatchData:0x00007387517150b0>
We thought the reason for this special check rather than using rb_undef_alloc_func
is so that dup
/clone
can still work, but that itself is problematic and a way users can get an uninitialized value.
>> match = "a".match(/./)
=> #<MatchData "a">
>> match.clone # wanting this to work is why we don't rb_undef_alloc_func (I think)
=> #<MatchData "a">
>> def match.initialize_copy(x); end
=> :initialize_copy
>> match.clone # now this makes an uninitialized object
An error occurred when inspecting the object: #<TypeError: uninitialized MatchData>
Result of Kernel#inspect: #<MatchData:0x0000738752cfb370>
So I am back to believing we should remove the rb_obj_respond_to
check, because it doesn't provide any safety. It seems like our only options are to tolerate uninitialized objects or use rb_undef_alloc_func.
Updated by Eregon (Benoit Daloze) 3 days ago
I think it makes sense to replace or remove that "incorrect" check given it doesn't apply to new
.
The proper fix seems to also do the correct check in new
.
I think it's best to keep the possibility for 3rd party C extensions to undefine allocate
(to prevent creating uninitialized objects) and still allow dup/clone (which should work fine if people don't monkey-patch initialize_{copy,clone,dup}
, if they do the segfault/error is their responsibility then).
With this change there is no protection in place left, so Class.instance_method(:allocate).bind_call(MyType)
could segfault, even if MyType undefines allocate
.
Looking at the PR, how about setting alloc_prohibited = true
when undefining allocate
? (and set it back to false if it's defined again later)
Another way would be to have separate allocation paths for "new object" and "copy" but this requires much more thought.
Or prevent this case of "no new/allocate but dup/clone", but then for core types too for consistency (maybe too incompatible? maybe not?).
Or core types could actually define dup & clone as overridden methods, and not use the alloc function at all and so use rb_undef_alloc_func()
.
That seems the cleanest actually, WDYT?
it doesn't provide any safety
Well, using bind_call is not rare and might even do it accidentally, OTOH defining a singleton initialize_copy
/allocate
method is pretty hardcore, and that should be no surprise it blows things up.