Project

General

Profile

Feature #13381

[PATCH] Expose rb_fstring and its family to C extensions

Added by eagletmt (Kohei Suzuki) over 3 years ago. Updated 8 days ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Target version:
-
[ruby-core:80447]


Related issues

Related to Ruby master - Feature #17147: New method to get frozen strings from String objectsFeedbackActions
Has duplicate Ruby master - Feature #16029: Expose fstring related APIs to C-extensionsClosedActions
#1

Updated by ko1 (Koichi Sasada) over 3 years ago

  • Status changed from Open to Feedback

I can understand use cases but we shouldn't expose the name "fstring".

Updated by eagletmt (Kohei Suzuki) over 3 years ago

OK, I've read comments of #13077.

What do you think of renaming fstring to "deduped" string? "Deduped" strings are implicitly frozen.

  • Rename rb_fstring to rb_str_deduped
  • Rename rb_fstring_new to rb_str_deduped_new
  • Rename rb_fstring_cstr to rb_str_deduped_cstr
  • Rename rb_fstring_enc_new to rb_enc_str_deduped_new
  • Rename rb_fstring_enc_cstr to rb_enc_str_deduped_cstr
    • I think enc should come first for consistency

Updated by ko1 (Koichi Sasada) over 3 years ago

  • Assignee set to ko1 (Koichi Sasada)
  • Status changed from Feedback to Assigned

Updated by nobu (Nobuyoshi Nakada) over 3 years ago

  • Assignee deleted (ko1 (Koichi Sasada))

How about fixed_str?

#5

Updated by k0kubun (Takashi Kokubun) 11 months ago

  • Has duplicate Feature #16029: Expose fstring related APIs to C-extensions added

Updated by ko1 (Koichi Sasada) 5 months ago

Nobu and I discussed the name, we want to choose rb_str_pool terminology.

  • rb_str_pool prefix

    • rb_str_pool(VALUE)
    • rb_str_pool_buff(const char *ptr, long len)
    • rb_str_pool_cstr(const char *ptr)
    • rb_enc_str_pool_buff(const char *ptr, long len, rb_encoding *enc)
    • rb_enc_str_pool_cstr(const char *ptr, rb_encoding *enc)
  • FYI: rb_str_new series:

    • rb_str_new(const char *ptr, long len)
    • rb_str_new_cstr(const char *ptr)
    • rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
    • rb_enc_str_new_cstr(const char *ptr, rb_encoding *enc)

If there is no comments, we will merge it soon.

Updated by Eregon (Benoit Daloze) 5 months ago

I'd suggest buff => buffer, I don't think it's worth abbreviating that.
BTW, there is rb_alloc_tmp_buffer so buff would be inconsistent.

Is rb_str_pool(VALUE) useful?
I think it could be rb_str_uminus() (or just rb_funcall(str, rb_intern("-@"))).
And then we could be consistent with rb_str_new*, and drop confusing _buff suffixes (not really buffer (=> sounds mutable), just a C string with known length).

So I think this is better:

rb_str_uminus(VALUE) // or just rb_funcall(str, rb_intern("-@"))
rb_str_pool(const char *ptr, long len)
rb_str_pool_cstr(const char *ptr)
rb_enc_str_pool(const char *ptr, long len, rb_encoding *enc)
rb_enc_str_pool_cstr(const char *ptr, rb_encoding *enc)

Updated by nobu (Nobuyoshi Nakada) 5 months ago

ko1 (Koichi Sasada) suggests rb_str_pool_value(VALUE) for the first type.

As these may not create a new object, I thought of using pool as a verb instead of new however, it doesn't feel fit much.
Does anyone have other ideas?

Updated by byroot (Jean Boussier) 5 months ago

Does anyone have other ideas?

A common term for this in intern / interning / interned. However that terminology is already used for symbols (rb_str_intern).

This would look like:

rb_interned_str(VALUE)
rb_interned_str(const char *ptr, long len)
rb_interned_str_cstr(const char *ptr)
rb_enc_interned_str(const char *ptr, long len, rb_encoding *enc)
rb_enc_interned_str_cstr(const char *ptr, rb_encoding *enc)

Updated by ko1 (Koichi Sasada) 4 months ago

However that terminology is already used for symbols (rb_str_intern).

Yes. This is why we didn't propose it.

#11

Updated by naruse (Yui NARUSE) 3 months ago

  • Related to Feature #17147: New method to get frozen strings from String objects added

Updated by ko1 (Koichi Sasada) 2 months ago

I'm now positive "interned_str" instead of "interned".

Updated by byroot (Jean Boussier) 2 months ago

Would you like a patch?

Updated by Dan0042 (Daniel DeLorme) 2 months ago

Any of the earlier suggestions were good, such as "pool" or "deduped". But while "interned" is technically correct, it will lead to confusion with symbols. It's better to keep a separate terminology imho.

#15

Updated by byroot (Jean Boussier) 17 days ago

  • Status changed from Assigned to Closed

Applied in changeset git|ef19fb111a8c8bf1a71d46e6fcf34b227e086845.


Expose the rb_interned_str_* family of functions

Fixes [Feature #13381]

Updated by byroot (Jean Boussier) 16 days ago

After trying to use the new functions in json and messagepack I realized we overlooked something entirely.

The internal rb_fstring_* will build new strings with str_new_static, so will directly re-use the string pointer that was provided. That's pretty much unusable, and unsafe.

I submitted another pull request to fix this: https://github.com/ruby/ruby/pull/3786

Updated by nobu (Nobuyoshi Nakada) 8 days ago

It sounds like that fstring doesn't match that purpose.
Is it really better to divert fstring than separated string pools?

Updated by byroot (Jean Boussier) 8 days ago

It sounds like that fstring doesn't match that purpose.

I'm not sure why it wouldn't. Ultimately the prupose is the same than String#-@, but from the C API and by passing a char *.

Is it really better to divert fstring than separated string pools?

It would be way less efficient, consider the following case:

objects = JSON.load_file('path/to.json') # [{"field": 1}, {"field": 2}, ...]
objects.map { |o| o['field'] }

Here some_field since it is a literal is part of the fstring table. As of Ruby 2.7 the json extension has to rb_str_new() many times, before calling hash_aset which will then deduplicate these strings.

In some use cases like ours, this generates a huge amount of GC pressure that could be avoided if the json extension (and some others) could directly lookup interned strings from a char pointer.

Updated by byroot (Jean Boussier) 8 days ago

alanwu (Alan Wu) just pointed to me that the confusion might come from the fact that this ticket need isn't quite the same than the one I originally opened: https://bugs.ruby-lang.org/issues/16029

They are close but not quite duplicates. eagletmt (Kohei Suzuki) wanted a fast API to convert static C strings into Ruby strings, I wanted an API for basically rb_str_new_frozen() that would deduplicate at the same time without having to allocate.

Updated by byroot (Jean Boussier) 8 days ago

Also if that helps, here's how it would be used by json: https://github.com/flori/json/pull/451

Also available in: Atom PDF