Bug #20998
closedrb_str_locktmp() changes flags of frozen strings and string literals
Description
# frozen_string_literal: true
# BOILERPLATE START
require 'tmpdir'
require 'rbconfig'
def inline_c_extension(c_code)
  Dir.mktmpdir('inline_c_extension') do |dir|
    File.write("#{dir}/cext.c", c_code)
    File.write("#{dir}/extconf.rb", <<~RUBY)
    require 'mkmf'
    create_makefile('cext')
    RUBY
    out = IO.popen([RbConfig.ruby, 'extconf.rb'], chdir: dir, &:read)
    raise "ruby extconf.rb failed: #{$?.inspect}\n#{out}" unless $?.success?
    out = IO.popen(['make'], chdir: dir, &:read)
    raise "make failed: #{$?.inspect}\n#{out}" unless $?.success?
    require "#{dir}/cext.#{RbConfig::CONFIG['DLEXT']}"
  end
end
inline_c_extension <<~C
#include "ruby.h"
static VALUE foo(VALUE self, VALUE str) {
  rb_str_locktmp(str);
  return str;
}
void Init_cext(void) {
  VALUE c = rb_define_class("Foo", rb_cObject);
  rb_define_singleton_method(c, "foo", foo, 1);
}
C
# BOILERPLATE END
a = "str"
Foo.foo(a)
# imagine a million lines of code in between
b = "str"
b << "."
# can't modify string; temporarily locked (RuntimeError)
# What? Who "locked" that immutable frozen string literal?
# It should be: can't modify frozen String: "str" (FrozenError)
Same problem with:
Foo.foo("abc")
# imagine a million lines of code in between
Foo.foo("abc") # temporal locking already locked string (RuntimeError)
Related: https://github.com/oracle/truffleruby/issues/3752
It seems a clear bug to mutate a frozen string (with visible side effects), even more so for shared frozen string literals.
I think rb_str_locktmp() should raise (a FrozenError, as it's effectively attempting to mutate it, same as calling rb_str_modify()) if called on a frozen string, because it makes little sense, I think "locking a string" only makes sense for mutable strings.
The alternative would be to noop on frozen strings for rb_str_locktmp() and rb_str_unlocktmp(), I think that's susprising, and potentially hiding more bugs, e.g. if one by mistake tries to mutate the RSTRING_PTR() or so, and also rb_str_locktmp() wouldn't imply the flag is set after it returns.
Any attempt to mutate a frozen string should fail, so might as well fail early.
        
           Updated by Eregon (Benoit Daloze) 10 months ago
          Updated by Eregon (Benoit Daloze) 10 months ago
          
          
        
        
      
      - Description updated (diff)
        
           Updated by Eregon (Benoit Daloze) 10 months ago
          Updated by Eregon (Benoit Daloze) 10 months ago
          
          
        
        
      
      - Description updated (diff)
        
           Updated by Eregon (Benoit Daloze) 10 months ago
          Updated by Eregon (Benoit Daloze) 10 months ago
          
          
        
        
      
      - Description updated (diff)
        
           Updated by Eregon (Benoit Daloze) 10 months ago
          Updated by Eregon (Benoit Daloze) 10 months ago
          
          
        
        
      
      - Description updated (diff)
        
           Updated by nobu (Nobuyoshi Nakada) 10 months ago
          Updated by nobu (Nobuyoshi Nakada) 10 months ago
          
          
        
        
      
      There isn’t a case that locking a string just to avoid being modified or moved, but not to mutate it?
        
           Updated by Eregon (Benoit Daloze) 10 months ago
          Updated by Eregon (Benoit Daloze) 10 months ago
          
          
        
        
      
      I'm not aware of any such case, and .freeze seems much better for that use case.
        
           Updated by byroot (Jean Boussier) 10 months ago
          Updated by byroot (Jean Boussier) 10 months ago
          
          
        
        
      
      Actually yes, I think it's not an uncommon use case?
e.g. if you need to write a given string into a socket or something like that, you may want to lock it to ensure a concurrent thread won't modify it.
I think zlic.c uses it this way?
        
           Updated by Eregon (Benoit Daloze) 10 months ago
          
          · Edited
          Updated by Eregon (Benoit Daloze) 10 months ago
          
          · Edited
        
        
      
      @byroot (Jean Boussier) but that's mutating the string, @nobu (Nobuyoshi Nakada) is asking for a case not mutating the string, which I think there is none.
EDIT: I read too fast, "write a given string into a socket" does not imply mutating the string.
        
           Updated by Eregon (Benoit Daloze) 10 months ago
          Updated by Eregon (Benoit Daloze) 10 months ago
          
          
        
        
      
      https://github.com/ruby/ruby/blob/975461bf885b8fb791cf048ad39c79924d28e4e9/ext/zlib/zlib.c#L1056 but I'm not quite sure how that String is used.
It seems clearly wrong that two threads using Zlib with a frozen String literal (which happens to have the same contents/characters for both threads) would fail with temporal locking already locked string (RuntimeError) though.
So what should we do then when rb_str_locktmp() is called on a frozen string?
        
           Updated by byroot (Jean Boussier) 10 months ago
          Updated by byroot (Jean Boussier) 10 months ago
          
          
        
        
      
      It seems clearly wrong that two threads using Zlib with a frozen String literal (which happens to have the same contents/characters for both threads) would fail with temporal locking already locked string (RuntimeError) though.
That I totally agree.
        
           Updated by Eregon (Benoit Daloze) 10 months ago
          Updated by Eregon (Benoit Daloze) 10 months ago
          
          
        
        
      
      @nobu (Nobuyoshi Nakada) @byroot (Jean Boussier) So do you think rb_str_locktmp() should noop on frozen strings (since they can't change anyway), or it should raise because the usage seems somewhat dubious?
I'm fine with either, I'd just like a decision so we can do the same in TruffleRuby.
(the current status is broken so that's worse than either fix)
        
           Updated by byroot (Jean Boussier) 10 months ago
          Updated by byroot (Jean Boussier) 10 months ago
          
          
        
        
      
      I prefer the noop solution, as it makes it a much easier to use API. But no strong opinion.
        
           Updated by mame (Yusuke Endoh) 9 months ago
          Updated by mame (Yusuke Endoh) 9 months ago
          
          
        
        
      
      Discussed at the dev meeting. @akr (Akira Tanaka) preferred raising an Exception. The current purpose of rb_str_locktmp is to prevent the RSTRING_PTR from being moved (by realloc, compaction, etc.) when passing the pointer to a C function that blocks and writes to the buffer (typically, read(2) syscall). In other words, rb_str_locktmp should be performed on a String that is about to be written. We couldn't find a use case for rb_str_locktmp on a frozen String. Unless there is a valid use case for that, it would be good to raise an Exception, instead of no-op.
        
           Updated by Eregon (Benoit Daloze) 5 months ago
          Updated by Eregon (Benoit Daloze) 5 months ago
          
          
        
        
      
      mame (Yusuke Endoh) wrote in #note-13:
Unless there is a valid use case for that, it would be good to raise an Exception, instead of no-op.
FYI TruffleRuby implemented FrozenError when rb_str_locktmp() is used on a frozen String in https://github.com/oracle/truffleruby/pull/3867.
It would be great if CRuby can do the same for 3.5.
        
           Updated by Eregon (Benoit Daloze) 4 months ago
          Updated by Eregon (Benoit Daloze) 4 months ago
          
          
        
        
      
      - Assignee set to Eregon (Benoit Daloze)
        
           Updated by ioquatix (Samuel Williams) 4 months ago
          Updated by ioquatix (Samuel Williams) 4 months ago
          
          
        
        
      
      My usage of rb_str_locktmp isn't about mutation, it's about making sure the pointer doesn't change. Could compacting GC cause embedded frozen strings to be moved?
        
           Updated by Eregon (Benoit Daloze) 4 months ago
          Updated by Eregon (Benoit Daloze) 4 months ago
          
          
        
        
      
      - Status changed from Open to Closed
Applied in changeset git|83fb07fb2c97b9922450979fa4a56f43324317a9.
[Bug #20998] Check if the string is frozen in rb_str_locktmp() & rb_str_unlocktmp()
        
           Updated by Eregon (Benoit Daloze) 4 months ago
          Updated by Eregon (Benoit Daloze) 4 months ago
          
          
        
        
      
      ioquatix (Samuel Williams) wrote in #note-16:
My usage of
rb_str_locktmpisn't about mutation, it's about making sure the pointer doesn't change. Could compacting GC cause embedded frozen strings to be moved?
rb_str_locktmp() doesn't prevent GC compaction to move the String, more details about that on the PR.
        
           Updated by Eregon (Benoit Daloze) 4 months ago
          Updated by Eregon (Benoit Daloze) 4 months ago
          
          
        
        
      
      - Target version set to 3.5
        
           Updated by hanazuki (Kasumi Hanazuki) 4 months ago
          Updated by hanazuki (Kasumi Hanazuki) 4 months ago
          
          
        
        
      
      FYI: IO::Buffer is now broken (by a recent change) because the embedded Strings are not pinned, and I'm proposing a fix at #21210
        
           Updated by Eregon (Benoit Daloze) 4 months ago
          
          · Edited
          Updated by Eregon (Benoit Daloze) 4 months ago
          
          · Edited
        
        
      
      @hanazuki Thanks for the link, I wish I found that earlier. It confirms that it was already broken which is the same conclusion I came to.
hanazuki (Kasumi Hanazuki) wrote in #note-20:
FYI: IO::Buffer is now broken (by a recent change)
Which change do you mean? git|6012145299cfa4ab561360c78710c7f2941a7e9d ?
EDIT: reply from hanazuki somehow not showing here: Yes. the commit "Mark rb_io_buffer_type references declaratively"
 Updated by Anonymous
          Updated by Anonymous