Feature #15923
New independent string without memcpy (Closed)
Description
Hello. I've just tried to implement an extension for Ruby that provides large binary strings.
I've inspected the latest Ruby source code and found two functions: rb_str_new and rb_str_new_static.
- rb_str_new allocates new memory and uses memcpy to copy the source string into it.
- rb_str_new_static uses the existing source buffer as is, but sets the STR_NOFREE flag, so Ruby never frees that buffer.
Is it possible to create an independent string from a source buffer, without memcpy, whose memory Ruby will free automatically? Thank you.
        
           Updated by shyouhei (Shyouhei Urabe) over 6 years ago
      puchuu (Andrew Aladjev) wrote:
Is it possible to create independent string from source string without memcpy that will be freed automatically?
In C there are several ways to free a memory region, depending on how it was allocated.
"Every string can be freed with free()" is simply a wrong assumption.
So no, Ruby has no way to automatically free memory allocated by someone else.
C is not made that way.
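To illustrate with a toy example: the hypothetical arena below hands out buffers that all live inside one malloc'd block. Each returned pointer is perfectly usable, yet passing it to free() is undefined behaviour, because only arena_destroy knows how the memory was obtained. All names here are made up for the sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump arena: many buffers carved out of one malloc'd block.
   Individual buffers must never be passed to free(); only the whole
   arena is released, via arena_destroy(). */
typedef struct {
    char  *base;
    size_t used;
    size_t cap;
} arena;

int arena_init(arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap = a->base ? cap : 0;
    return a->base != NULL;
}

/* Returns a pointer into the middle of the block, or NULL when full. */
char *arena_alloc(arena *a, size_t n)
{
    if (n > a->cap - a->used)
        return NULL;
    char *p = a->base + a->used;
    a->used += n;
    return p;
}

void arena_destroy(arena *a)
{
    free(a->base);   /* the only pointer malloc() actually returned */
    a->base = NULL;
    a->used = a->cap = 0;
}
```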
        
           Updated by luke-gru (Luke Gruber) over 6 years ago
I think puchuu is asking whether he can pass a malloc'd string to a Ruby function that creates a new String object and frees the given underlying buffer when that object is destroyed. Having read the code, I didn't come across such a function, but I imagine it's possible with a slight hack (untested by me, however):
VALUE str = rb_str_new_static(buffer, buflen); /* no buffer allocation or memcpy here; str simply points at buffer */
RUBY_FL_UNSET(str, STR_NOFREE); /* let the GC free buffer with str. STR_NOFREE isn't defined in internal.h, unfortunately; it's currently the same as FL_USER18, but that could change. */
Perhaps a new Ruby string creation function would be useful? Something like rb_str_new_take(). Just a thought.
Of course, the buffer would have to be allocated with the same allocator Ruby uses, or bad things will happen...
        
           Updated by nobu (Nobuyoshi Nakada) over 6 years ago
ruby_xfree != free.
Using the former on a malloc'ed buffer can cause a crash.
        
           Updated by luke-gru (Luke Gruber) over 6 years ago
Thank you, Nobu. I suspected that might be the case but wasn't sure, as I'm not familiar with the GC subsystem. I think shyouhei was saying the same thing; I was just too dense to understand the specifics of what he was saying :)
Having taken a cursory look, it seems Ruby adds some bookkeeping information at the start of every memory block allocated by ruby_xmalloc and family. It returns the memory after this bookkeeping information (the buffer of the size actually requested), and when that buffer is given to ruby_xfree, Ruby calculates the real starting point by stepping back one bookkeeping structure, then passes that pointer to free.
So you would have to allocate with ruby_xmalloc and friends anyway, in which case a function like rb_str_new_take seems pointless.
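A simplified, self-contained model of the header-prefix scheme described above might look like this. The demo_* names are hypothetical and this is not CRuby's gc.c code; whether the real ruby_xmalloc adds such a header may depend on how Ruby was built. It shows why a plain malloc() pointer must not reach the matching "free" function: demo_xfree would step back into memory that is not the start of any allocation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Bookkeeping stored in front of every block. The union with max_align_t
   keeps the pointer handed to the caller suitably aligned. */
typedef union {
    size_t      size;
    max_align_t align;
} demo_header;

void *demo_xmalloc(size_t size)
{
    demo_header *h = malloc(sizeof(demo_header) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                 /* caller's memory starts after the header */
}

size_t demo_xsize(const void *p)
{
    return ((const demo_header *)p - 1)->size;
}

void demo_xfree(void *p)
{
    if (p == NULL)
        return;
    /* Step back to the pointer malloc() returned. Given a pointer that came
       straight from malloc(), this would free a bogus address. */
    free((demo_header *)p - 1);
}
```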
        
           Updated by alanwu (Alan Wu) over 6 years ago
Instead of working on a separate buffer and then asking Ruby to take ownership, you could work directly on the buffer of a string:
VALUE new_string = rb_str_new("", 0);
rb_str_resize(new_string, size_you_want);
do_work(RSTRING_PTR(new_string), RSTRING_LEN(new_string));
Would this be good enough?
        
           Updated by nobu (Nobuyoshi Nakada) over 6 years ago
It should be OK when the caller supplies the buffer, but it doesn't work with a library that returns a buffer it allocated internally.
FYI: you can allocate the buffer in one step with rb_str_new(NULL, size_you_want).
        
           Updated by puchuu (Andrew Aladjev) over 6 years ago
      nobu (Nobuyoshi Nakada) wrote:
It should be OK when passing the buffer from callers, but doesn't work with a library which returns a buffer allocated inside.
FYI: you can allocate the buffer by rb_str_new(NULL, size_you_want) at once.
Thanks all, I see. Ruby has its own internal memory allocation mechanism, so using strings allocated outside of it is not recommended.
Integrating rb_str_resize into my buffer growth mechanism is a good but complex solution. I will keep copying the string.
        
           Updated by ko1 (Koichi Sasada) about 6 years ago
      - Status changed from Open to Rejected
I didn't read all the comments, but it seems solved.
Please reopen it if I'm mistaken.
        
           Updated by puchuu (Andrew Aladjev) about 6 years ago
I've implemented string bindings using a growing Ruby string. It was a bit tricky: I had to use rb_protect. I'll leave a link here so everyone can see an example: https://github.com/andrew-aladev/ruby-lzws/blob/master/ext/lzws_ext/string.c