Feature #13381
closed[PATCH] Expose rb_fstring and its family to C extensions
Added by eagletmt (Kohei Suzuki) over 8 years ago. Updated almost 5 years ago.
Description
https://github.com/ruby/ruby/pull/1559
Currently, C extensions cannot use fstrings. I'd like to use
rb_fstring_cstr
instead of rb_str_new_cstr
for static strings in C
extensions to avoid excess allocation.
I think there's several use cases.
- https://github.com/k0kubun/hamlit/blob/v2.8.0/ext/hamlit/hamlit.c#L508-L512
- https://bitbucket.org/ged/ruby-pg/src/e5eb92cca97abc0c6fc168acfad993c2ad314589/ext/pg_connection.c?at=v0.20.0&fileviewer=file-view-default#pg_connection.c-3679
- https://bitbucket.org/ged/ruby-pg/src/e5eb92cca97abc0c6fc168acfad993c2ad314589/ext/pg_copy_coder.c?at=v0.20.0&fileviewer=file-view-default#pg_copy_coder.c-38
Updated by ko1 (Koichi Sasada) over 8 years ago
Actions
#1
- Status changed from Open to Feedback
I can understand use cases but we shouldn't expose the name "fstring".
Updated by eagletmt (Kohei Suzuki) over 8 years ago
Actions
#2
[ruby-core:80451]
OK, I've read comments of #13077.
What do you think of renaming fstring to "deduped" string? "Deduped" strings are implicitly frozen.
- Rename
rb_fstring
torb_str_deduped
- Rename
rb_fstring_new
torb_str_deduped_new
- Rename
rb_fstring_cstr
torb_str_deduped_cstr
- Rename
rb_fstring_enc_new
torb_enc_str_deduped_new
- Rename
rb_fstring_enc_cstr
torb_enc_str_deduped_cstr
- I think
enc
should come first for consistency
- I think
Updated by ko1 (Koichi Sasada) over 8 years ago
Actions
#3
[ruby-core:81703]
- Status changed from Feedback to Assigned
- Assignee set to ko1 (Koichi Sasada)
Updated by nobu (Nobuyoshi Nakada) over 8 years ago
Actions
#4
[ruby-core:81704]
- Assignee deleted (
ko1 (Koichi Sasada))
How about fixed_str
?
Updated by k0kubun (Takashi Kokubun) almost 6 years ago
Actions
#5
- Has duplicate Feature #16029: Expose fstring related APIs to C-extensions added
Updated by ko1 (Koichi Sasada) over 5 years ago
Actions
#6
[ruby-core:99199]
Nobu and I discussed the name, we want to choose rb_str_pool
terminology.
-
rb_str_pool
prefixrb_str_pool(VALUE)
rb_str_pool_buff(const char *ptr, long len)
rb_str_pool_cstr(const char *ptr)
rb_enc_str_pool_buff(const char *ptr, long len, rb_encoding *enc)
rb_enc_str_pool_cstr(const char *ptr, rb_encoding *enc)
-
FYI:
rb_str_new
series:rb_str_new(const char *ptr, long len)
rb_str_new_cstr(const char *ptr)
rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
rb_enc_str_new_cstr(const char *ptr, rb_encoding *enc)
If there is no comments, we will merge it soon.
Updated by Eregon (Benoit Daloze) over 5 years ago
Actions
#7
[ruby-core:99200]
I'd suggest buff
=> buffer
, I don't think it's worth abbreviating that.
BTW, there is rb_alloc_tmp_buffer
so buff
would be inconsistent.
Is rb_str_pool(VALUE)
useful?
I think it could be rb_str_uminus()
(or just rb_funcall(str, rb_intern("-@"))
).
And then we could be consistent with rb_str_new*
, and drop confusing _buff
suffixes (not really buffer (=> sounds mutable), just a C string with known length).
So I think this is better:
rb_str_uminus(VALUE) // or just rb_funcall(str, rb_intern("-@"))
rb_str_pool(const char *ptr, long len)
rb_str_pool_cstr(const char *ptr)
rb_enc_str_pool(const char *ptr, long len, rb_encoding *enc)
rb_enc_str_pool_cstr(const char *ptr, rb_encoding *enc)
Updated by nobu (Nobuyoshi Nakada) over 5 years ago
Actions
#8
[ruby-core:99236]
@ko1 (Koichi Sasada) suggests rb_str_pool_value(VALUE)
for the first type.
As these may not create a new object, I thought of using pool
as a verb instead of new
however, it doesn't feel fit much.
Does anyone have other ideas?
Updated by byroot (Jean Boussier) over 5 years ago
Actions
#9
[ruby-core:99237]
Does anyone have other ideas?
A common term for this in intern
/ interning
/ interned
. However that terminology is already used for symbols (rb_str_intern
).
This would look like:
rb_interned_str(VALUE)
rb_interned_str(const char *ptr, long len)
rb_interned_str_cstr(const char *ptr)
rb_enc_interned_str(const char *ptr, long len, rb_encoding *enc)
rb_enc_interned_str_cstr(const char *ptr, rb_encoding *enc)
Updated by ko1 (Koichi Sasada) about 5 years ago
Actions
#10
[ruby-core:99564]
However that terminology is already used for symbols (rb_str_intern).
Yes. This is why we didn't propose it.
Updated by naruse (Yui NARUSE) about 5 years ago
Actions
#11
- Related to Feature #17147: New method to get frozen strings from String objects added
Updated by ko1 (Koichi Sasada) about 5 years ago
Actions
#12
[ruby-core:100083]
I'm now positive "interned_str" instead of "interned".
Updated by byroot (Jean Boussier) about 5 years ago
Actions
#13
[ruby-core:100084]
Would you like a patch?
Updated by Dan0042 (Daniel DeLorme) about 5 years ago
Actions
#14
[ruby-core:100114]
Any of the earlier suggestions were good, such as "pool" or "deduped". But while "interned" is technically correct, it will lead to confusion with symbols. It's better to keep a separate terminology imho.
Updated by byroot (Jean Boussier) almost 5 years ago
Actions
#15
- Status changed from Assigned to Closed
Applied in changeset git|ef19fb111a8c8bf1a71d46e6fcf34b227e086845.
Expose the rb_interned_str_* family of functions
Fixes [Feature #13381]
Updated by byroot (Jean Boussier) almost 5 years ago
Actions
#16
[ruby-core:100935]
After trying to use the new functions in json
and messagepack
I realized we overlooked something entirely.
The internal rb_fstring_*
will build new strings with str_new_static
, so will directly re-use the string pointer that was provided. That's pretty much unusable, and unsafe.
I submitted another pull request to fix this: https://github.com/ruby/ruby/pull/3786
Updated by nobu (Nobuyoshi Nakada) almost 5 years ago
Actions
#17
[ruby-core:101082]
It sounds like that fstring
doesn't match that purpose.
Is it really better to divert fstring
than separated string pools?
Updated by byroot (Jean Boussier) almost 5 years ago
Actions
#18
[ruby-core:101102]
It sounds like that fstring doesn't match that purpose.
I'm not sure why it wouldn't. Ultimately the prupose is the same than String#-@
, but from the C API and by passing a char *
.
Is it really better to divert fstring than separated string pools?
It would be way less efficient, consider the following case:
objects = JSON.load_file('path/to.json') # [{"field": 1}, {"field": 2}, ...]
objects.map { |o| o['field'] }
Here some_field
since it is a literal is part of the fstring table. As of Ruby 2.7 the json
extension has to rb_str_new()
many times, before calling hash_aset
which will then deduplicate these strings.
In some use cases like ours, this generates a huge amount of GC pressure that could be avoided if the json
extension (and some others) could directly lookup interned strings from a char
pointer.
Updated by byroot (Jean Boussier) almost 5 years ago
Actions
#19
[ruby-core:101103]
@alanwu (Alan Wu) just pointed to me that the confusion might come from the fact that this ticket need isn't quite the same than the one I originally opened: https://bugs.ruby-lang.org/issues/16029
They are close but not quite duplicates. @eagletmt wanted a fast API to convert static C strings into Ruby strings, I wanted an API for basically rb_str_new_frozen()
that would deduplicate at the same time without having to allocate.
Updated by byroot (Jean Boussier) almost 5 years ago
Actions
#20
[ruby-core:101104]
Also if that helps, here's how it would be used by json
: https://github.com/flori/json/pull/451