I think this is too hard for a human to read and parse, and 5 arguments seems like far too many for a core method.
It feels like a full memcpy/arraycopy which I don't think in general is a good idea for String.
The implementation complexity in []= and similar already hurts Ruby too much.
This is probably at least the 3rd workaround I have seen for the lack of proper lazy substrings in CRuby, i.e., "abcdef"[1..3] must not copy bytes.
That is what needs to be solved (it already works in TruffleRuby).
Yes, it means RSTRING_PTR() might need to allocate in order to \0-terminate, but so be it; it's worth it.
So I am strongly against this: it's yet another workaround for a problem that is simpler to solve directly, and solving it would be much more helpful in general.
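To make the readability concern concrete, here is a minimal sketch of a memcpy-like, five-argument splice. It assumes the proposed form takes a destination index and length plus a source string with its own index and length; `splice_bytes` is a hypothetical helper written only with `String#byteslice` and `String#replace`, not the actual C implementation.

```ruby
# Hypothetical helper: replaces dst[dst_index, dst_length] with
# src[src_index, src_length], working in bytes. Assumes all ranges are
# in bounds and fall on character boundaries.
def splice_bytes(dst, dst_index, dst_length, src, src_index, src_length)
  piece = src.byteslice(src_index, src_length)
  head  = dst.byteslice(0, dst_index)
  tail  = dst.byteslice(dst_index + dst_length, dst.bytesize) || ""
  dst.replace(head + piece + tail) # String#replace returns the receiver
end

dst = +"hello world"
src = "HELLO there"

# Five positional integers and a string: it is easy to mix up which
# index/length pair belongs to which string.
splice_bytes(dst, 0, 5, src, 0, 5)
puts dst
```

Note that this sketch builds intermediate strings, whereas the point of the proposal is to avoid them; it only illustrates the call shape being criticized.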
I agree that this is a workaround and a VM should solve this as an optimization.
But your proposal, lazy substrings, is not a solution, because it also creates an object, which matters especially for small strings that are embedded in the RVALUE.
I agree that this is memcpy/arraycopy.
Therefore this proposal should describe how much this workaround improves performance in memcpy-like use cases in Ruby.
But your proposal, lazy substrings, is not a solution, because it also creates an object, which matters especially for small strings that are embedded in the RVALUE.
Yes, it creates a String instance reusing the same buffer.
That shouldn't cost much compared to copying many bytes.
It should be insignificant in a benchmark with a long string to copy/move. For a short string, performance shouldn't matter much anyway (it won't be the bottleneck of the program).
If it's still too much overhead, it sounds like allocations in CRuby need to be better optimized, or escape analysis should be implemented.
Again, those 2 are more general, and their benefits are much wider than this one method change, which would be used by very few Ruby programs and only handles one specific case.
Ah, something I missed though is that with lazy substrings, there would still need to be a copy of the bytes to "unshare" the string when writing to it.
That copy would also be needed if the string was already shared before (e.g. with .dup), but that's unknown in our case.
This does depend on how sharing is implemented. Maybe CRuby can see that only String instances share that buffer and that both strings are involved in this operation, so that only the bytes of the substring need to be copied.
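The lazy-substring and "unshare on write" idea can be sketched in plain Ruby. `LazySlice` is a hypothetical class for illustration only; it is not how CRuby or TruffleRuby represent strings internally.

```ruby
# Hypothetical model of a lazy substring: it records a window into the
# parent's buffer and copies bytes only when it is first written to.
class LazySlice
  def initialize(buffer, offset, length)
    @buffer = buffer # shared with the parent; not mutated while shared
    @offset = offset
    @length = length
    @shared = true
  end

  def shared?
    @shared
  end

  def to_s
    @buffer.byteslice(@offset, @length)
  end

  def setbyte(index, byte)
    unshare!
    @buffer.setbyte(@offset + index, byte)
  end

  private

  # The one copy: only the bytes of the substring, not the whole parent.
  def unshare!
    return unless @shared
    @buffer = @buffer.byteslice(@offset, @length)
    @offset = 0
    @shared = false
  end
end

parent = "abcdef"
slice  = LazySlice.new(parent, 1, 3) # models "abcdef"[1..3] without copying
shared_before = slice.shared?
slice.setbyte(0, "X".ord)            # first write triggers the copy
puts slice.to_s
puts parent
```

Creating the slice allocates one small object but copies no bytes; the copy is deferred to the first write and is limited to the substring's own bytes.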
It feels like a full memcpy/arraycopy which I don't think in general is a good idea for String.
To expand on that, I dislike that because it's using String as a byte array.
If anything, such an operation should be supported on Array before String.
Therefore I think there is no need to change String#bytesplice (there is not even a need for String#bytesplice itself, which I think we shouldn't have added).
And IO::Buffer seems better suited for byte-buffer-like operations.
In Feature #19314, we concluded that the return value of String#bytesplice
should be changed from the source string to the receiver, because the source
string is useless and confusing when extra arguments are added.
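A minimal sketch of how to observe the return value in question. It is guarded because String#bytesplice is not available on older Rubies, and which object is returned depends on the Ruby version in use, so no particular output is claimed here.

```ruby
receiver = +"hello world"
source   = "HELLO"

if receiver.respond_to?(:bytesplice)
  result = receiver.bytesplice(0, 5, source)
  # Before the change the source string was returned; after it, the receiver.
  returns_receiver = result.equal?(receiver)
  puts "receiver is now: #{receiver}"
  puts "returns the receiver? #{returns_receiver}"
else
  puts "String#bytesplice is not available on this Ruby"
end
```

Returning the receiver also allows chaining further mutating calls on the modified string.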
String#bytesplice should return self
In Feature #19314, we concluded that the return value of String#bytesplice
should be changed from the source string to the receiver, because the source
string is useless and confusing when extra arguments are added.
This change should be included in Ruby 3.2.1.
---
string.c | 4 ++--
test/ruby/test_string.rb | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)