Feature #15923

New independent string without memcpy

Added by puchuu (Andrew Aladjev) 6 months ago. Updated 2 months ago.

Status: Rejected
Priority: Normal
Assignee: -
Target version: -
[ruby-core:93142]

Description

Hello. I've just tried to implement an extension for Ruby that provides large binary strings.

I've inspected the latest Ruby source code and found two functions: rb_str_new and rb_str_new_static.

  • rb_str_new allocates new memory and uses memcpy to copy the source string into it.
  • rb_str_new_static uses the existing source string as it is, but sets the STR_NOFREE flag.

Is it possible to create an independent string from a source string, without memcpy, that will be freed automatically? Thank you.

History

Updated by shyouhei (Shyouhei Urabe) 6 months ago

puchuu (Andrew Aladjev) wrote:

Is it possible to create an independent string from a source string, without memcpy, that will be freed automatically?

In C there are several ways to free a memory region, depending on how that region was allocated.
"Every string must be able to be freed using free()" is simply a wrong assertion.

So no, there is no way for Ruby to automatically free memory allocated by others.
C is not made that way.

Updated by luke-gru (Luke Gruber) 6 months ago

I think what puchuu is asking is if he can pass a malloc'd string to a ruby function that will create a new string object that frees the given underlying buffer when the string object is destructed. Having read the code, I didn't come upon such a case but I imagine it's possible with a slight hack (untested by me, however):

VALUE str = rb_str_new_static(buffer, buflen); /* no malloc or memcpy done here, just ownership change of buffer */
FL_UNSET(str, STR_NOFREE); /* STR_NOFREE isn't actually defined in internal.h unfortunately; it's currently the same as FL_USER18, but that could change. */

Perhaps a new ruby string creation function would be useful? Something like rb_str_new_take(). Just a thought.

Of course the allocator used to allocate the buffer would have to be the same as Ruby's allocator or bad things will happen...

Updated by nobu (Nobuyoshi Nakada) 6 months ago

ruby_xfree != free.
Using the former on a malloc'ed buffer can cause a crash.

Updated by luke-gru (Luke Gruber) 6 months ago

Thank you Nobu, I thought that might be the case but was unaware as I'm not familiar with the GC subsystem. Also I think shyouhei was saying the same thing, I was just too dense to understand the specifics of what he was saying :)

Having taken a cursory look, it seems Ruby adds some bookkeeping information at the start of every memory buffer allocated by ruby_xmalloc and family. It returns a pointer to the memory just after this bookkeeping information (a region of the size actually asked for), and when that pointer is given to ruby_xfree, Ruby computes the real start of the allocation by stepping back one bookkeeping structure, then passes that address to free.

So, you would have to allocate using ruby_xmalloc and friends anyway, in which case a function like rb_str_new_take seems useless.
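
The bookkeeping scheme described above can be modelled in a few lines of plain C. This is only a toy sketch: toy_header, toy_xmalloc, toy_xsize, toy_xfree and toy_roundtrip are hypothetical names, and the real layout and behaviour inside Ruby's allocator may differ. It is meant to show why a pointer obtained from plain malloc cannot safely be handed to such a free routine: there is no header in front of it to step back over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping header (an assumption for illustration only).
   The extra union members keep the payload suitably aligned on common platforms. */
typedef union {
    size_t size;        /* number of bytes the caller asked for */
    long double pad_ld;
    long long pad_ll;
    void *pad_ptr;
} toy_header;

/* Allocate header + payload, record the size, return the address after the header. */
void *toy_xmalloc(size_t size)
{
    toy_header *h = malloc(sizeof(toy_header) + size);
    if (h == NULL) return NULL;
    h->size = size;
    return h + 1;
}

/* Read the recorded size by stepping back one header from the payload pointer. */
size_t toy_xsize(const void *ptr)
{
    return ((const toy_header *)ptr - 1)->size;
}

/* Step back one header to recover the address malloc returned, then free that. */
void toy_xfree(void *ptr)
{
    if (ptr == NULL) return;
    free((toy_header *)ptr - 1);
}

/* Small self-check: allocate, inspect the recorded size, release. Returns 1 on success. */
int toy_roundtrip(size_t size)
{
    void *p = toy_xmalloc(size);
    if (p == NULL) return 0;
    assert(toy_xsize(p) == size);
    toy_xfree(p);
    return 1;
}
```

Calling toy_xfree on a pointer that came straight from malloc would pass free an address that malloc never returned, which is undefined behaviour and can crash.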

Updated by alanwu (Alan Wu) 5 months ago

Instead of working on a separate buffer and then asking Ruby to take ownership, you could work directly in the buffer of a Ruby string:

VALUE new_string = rb_str_new("", 0);
rb_str_resize(new_string, size_you_want);
do_work(RSTRING_PTR(new_string), RSTRING_LEN(new_string));

Would this be good enough?

Updated by nobu (Nobuyoshi Nakada) 5 months ago

That should be OK when the caller passes the buffer in, but it doesn't work with a library that returns a buffer it allocated internally.

FYI: you can allocate the buffer in one step with rb_str_new(NULL, size_you_want).

Updated by puchuu (Andrew Aladjev) 5 months ago

nobu (Nobuyoshi Nakada) wrote:

That should be OK when the caller passes the buffer in, but it doesn't work with a library that returns a buffer it allocated internally.

FYI: you can allocate the buffer in one step with rb_str_new(NULL, size_you_want).

Thanks all, I see. Ruby has its own internal memory allocation mechanism, and using strings allocated outside of it is not recommended.

Integrating rb_str_resize into the buffer growth mechanism is a good but complex solution. I will keep the string copy.
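
For readers unfamiliar with the "buffer growth" idea, here is a minimal sketch of such a loop in plain C, with realloc standing in for rb_str_resize. Everything prefixed toy_ is hypothetical and exists only for illustration; it is not the code of any particular extension. In a real extension the destination would be a Ruby string, and its data pointer would need to be re-read with RSTRING_PTR after every resize because the buffer may move.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical data source that hands out a fixed payload piece by piece. */
typedef struct {
    const char *data;
    size_t len;
    size_t offset;
} toy_source;

/* Copy up to cap bytes into dst. Returns 1 if more output remains, 0 when done. */
int toy_produce(toy_source *src, char *dst, size_t cap, size_t *written)
{
    size_t remaining = src->len - src->offset;
    size_t n = remaining < cap ? remaining : cap;
    memcpy(dst, src->data + src->offset, n);
    src->offset += n;
    *written = n;
    return src->offset < src->len;
}

/* Drain the source into one growing buffer.
   Returns a malloc'ed buffer the caller must free, or NULL on allocation failure. */
char *toy_drain(toy_source *src, size_t initial_cap, size_t *out_len)
{
    assert(src != NULL && out_len != NULL);

    size_t cap = initial_cap ? initial_cap : 1;
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) return NULL;

    for (;;) {
        size_t written = 0;
        int more = toy_produce(src, buf + len, cap - len, &written);
        len += written;
        if (!more) break;

        /* The buffer is full and output remains: double the capacity.
           (Overflow of cap * 2 is ignored in this sketch.) */
        size_t new_cap = cap * 2;
        char *grown = realloc(buf, new_cap);
        if (grown == NULL) {
            free(buf);
            return NULL;
        }
        buf = grown; /* the block may have moved, so keep only the new pointer */
        cap = new_cap;
    }

    *out_len = len;
    return buf;
}
```

The data is written once, directly into its final destination, so no separate copy is needed at the end; the cost is the extra bookkeeping around each resize.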

Updated by ko1 (Koichi Sasada) 5 months ago

  • Status changed from Open to Rejected

I didn't read all the comments, but it seems solved.
Please reopen it if I am mistaken.

Updated by puchuu (Andrew Aladjev) 2 months ago

I've implemented string bindings using a growing Ruby string. It was a bit tricky: I had to use rb_protect. I will leave a link here so everyone can see an example. https://github.com/andrew-aladev/ruby-lzws/blob/master/ext/lzws_ext/string.c
