New independent string without memcpy

Added by puchuu (Andrew Aladjev) about 1 month ago. Updated 6 days ago.

Hello. I've just tried to implement an extension for Ruby that provides large binary strings.

I've inspected the latest Ruby source code and found 2 functions: rb_str_new and rb_str_new_static.

  • rb_str_new allocates new memory and uses memcpy to copy the source string into it.
  • rb_str_new_static uses the existing source string as is, but adds the STR_NOFREE flag.

Is it possible to create independent string from source string without memcpy that will be freed automatically? Thank you.


Updated by shyouhei (Shyouhei Urabe) about 1 month ago

puchuu (Andrew Aladjev) wrote:

Is it possible to create independent string from source string without memcpy that will be freed automatically?

In C there are several ways to free a memory region, depending on how that memory was allocated.
"Every string must be able to be freed using free()" is simply a wrong assertion.

So no, there is no way for Ruby to automatically free memory allocated by others.
C is not made that way.
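
To make the point above concrete, here is a minimal sketch in plain C (not Ruby's API) of why a generic "take ownership" call cannot guess how to free a pointer: the owner needs to be told which release function matches the allocation. The names owned_buf, owned_buf_from_malloc, owned_buf_borrow and owned_buf_release are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: the function that knows how to give the memory back. */
typedef void (*release_fn)(void *ptr);

/* Hypothetical buffer handle that carries its own deallocator. */
typedef struct owned_buf {
    void      *ptr;
    size_t     len;
    release_fn release; /* NULL means "borrowed": never freed by us */
} owned_buf;

/* Memory obtained from malloc must be returned with free. */
owned_buf owned_buf_from_malloc(size_t len)
{
    owned_buf b = { malloc(len), len, free };
    if (b.ptr == NULL) {
        b.len = 0;
        b.release = NULL;
    }
    return b;
}

/* Static or caller-owned memory: record it, but never free it. */
owned_buf owned_buf_borrow(void *ptr, size_t len)
{
    assert(ptr != NULL || len == 0);
    owned_buf b = { ptr, len, NULL };
    return b;
}

/* Release with whatever function was recorded at creation time. */
void owned_buf_release(owned_buf *b)
{
    if (b->ptr != NULL && b->release != NULL) {
        b->release(b->ptr);
    }
    b->ptr = NULL;
    b->len = 0;
    b->release = NULL;
}
```

In this sketch the borrowed case plays the role that STR_NOFREE plays for rb_str_new_static as described in the question: the pointer is used as is and never freed. A buffer from some other allocator would need its own release function stored alongside it, which is exactly the information a bare pointer does not carry.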

Updated by luke-gru (Luke Gruber) about 1 month ago

I think what puchuu is asking is if he can pass a malloc'd string to a ruby function that will create a new string object that frees the given underlying buffer when the string object is destructed. Having read the code, I didn't come upon such a case but I imagine it's possible with a slight hack (untested by me, however):

VALUE str = rb_str_new_static(buffer, buflen); /* no malloc or memcpy done here, just ownership change of buffer */
FL_UNSET(str, STR_NOFREE); /* STR_NOFREE isn't actually defined in internal.h unfortunately, it's currently the same as FL_USER18, but could change. */

Perhaps a new ruby string creation function would be useful? Something like rb_str_new_take(). Just a thought.

Of course the allocator used to allocate the buffer would have to be the same as Ruby's allocator or bad things will happen...

Updated by nobu (Nobuyoshi Nakada) 30 days ago

ruby_xfree != free.
Using the former on a malloc'ed buffer can cause a crash.

Updated by luke-gru (Luke Gruber) 29 days ago

Thank you Nobu, I thought that might be the case but was unaware as I'm not familiar with the GC subsystem. Also I think shyouhei was saying the same thing, I was just too dense to understand the specifics of what he was saying :)

Having taken a cursory look, it seems Ruby adds some bookkeeping information at the start of every memory buffer allocated by ruby_xmalloc and family. It returns a pointer to the memory just after this bookkeeping information (the buffer of the size actually asked for), and when this buffer is given to ruby_xfree, Ruby calculates the real starting address by stepping back one bookkeeping structure, then passes that address to free.
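
The scheme described above can be modelled in a few lines of plain C. This is only a toy sketch of the idea, not Ruby's actual implementation: toy_header, toy_xmalloc, toy_xfree, toy_block_size and toy_live are hypothetical names, and the header layout is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping record stored in front of every block.
 * The union pads it to the strictest alignment, so the pointer
 * handed to the caller stays suitably aligned. */
typedef union toy_header {
    size_t      size;
    max_align_t align;
} toy_header;

static size_t toy_live_bytes = 0; /* total bytes currently handed out */

/* Allocate header + payload, return a pointer just past the header. */
void *toy_xmalloc(size_t size)
{
    toy_header *hdr = malloc(sizeof(toy_header) + size);
    if (hdr == NULL) {
        return NULL;
    }
    hdr->size = size;
    toy_live_bytes += size;
    return hdr + 1;
}

/* Read the recorded size by stepping back one header. */
size_t toy_block_size(const void *ptr)
{
    return ((const toy_header *)ptr - 1)->size;
}

/* Step back one header to find the real start, then call free on that. */
void toy_xfree(void *ptr)
{
    if (ptr == NULL) {
        return;
    }
    toy_header *hdr = (toy_header *)ptr - 1;
    assert(toy_live_bytes >= hdr->size);
    toy_live_bytes -= hdr->size;
    free(hdr);
}

size_t toy_live(void)
{
    return toy_live_bytes;
}
```

If a pointer that came straight from malloc were passed to toy_xfree, the function would read a header that was never written and hand free an address malloc never returned. That is undefined behaviour, and it is the kind of crash the ruby_xfree != free remark warns about.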

So, you would have to allocate using ruby_xmalloc and friends anyway, in which case a function like rb_str_new_take seems pointless.

Updated by alanwu (Alan Wu) 6 days ago

Instead of working on a separate buffer then asking Ruby to take ownership, you could make changes to the buffer of a string:

VALUE new_string = rb_str_new("", 0);
rb_str_resize(new_string, size_you_want);
do_work(RSTRING_PTR(new_string), RSTRING_LEN(new_string));

Would this be good enough?

Updated by nobu (Nobuyoshi Nakada) 6 days ago

It should be OK when the caller passes the buffer in, but it doesn't work with a library that returns a buffer it allocated internally.

FYI: you can allocate the buffer in one step with rb_str_new(NULL, size_you_want).

