Feature #17184
openNo stdlib function to perform simple string replacement
Added by sheerun (Adam Stankiewicz) about 4 years ago. Updated 8 months ago.
Description
I have following simple build.rb
:
template = File.read('template.vim')
script = File.read('script.vim')
File.write('app.vim', template.gsub("SCRIPT", script))
And then following template.vim
:
" some header
SCRIPT
Plus following script.vim
:
if g:something =~ "\s\+"
echo 'g:something is empty'
endif
I'd expect that the script above produces app.vim
with following contents:
" some header
if g:something =~ "\s\+"
echo 'g:something is empty'
endif
Unfortunately it produces following:
" some header
if g:something =~ "\s"
echo 'g:something is empty'
endif
It's probably because gsub interprets \+
in script as back-reference.
I tried to find replacement function in ruby that just replaces one string with something else, without interpreting replacement in any way, but surprisingly I haven't found any.. Am I mistaken?
Updated by briankung (Brian Kung) about 4 years ago
The hash replacement form of gsub doesn't have this problem, strangely enough:
'mislocated cat, vindicating'.gsub('cat', 'dog\+')
#=> "mislodoged dog, vindidoging"
'mislocated cat, vindicating'.gsub('cat', 'cat' => 'dog\+')
#=> "mislodog\\+ed dog\\+, vindidog\\+ing"
Updated by byroot (Jean Boussier) about 4 years ago
The intended API for that is String[]=
, e.g.
string["search"] = "replace"
Updated by sheerun (Adam Stankiewicz) about 4 years ago
byroot (Jean Boussier) wrote in #note-2:
The intended API for that is
String[]=
, e.g.
Is there a non-mutable version of this?
Updated by sheerun (Adam Stankiewicz) about 4 years ago
For now I'm using self-implemented function, but I wish there was something built-in:
class String
# Simple substitution, ignores backreferences in sub
# https://bugs.ruby-lang.org/issues/17184
def ssub(str, sub)
new_str = String.new(self, capacity: self.size - str.size + sub.size)
new_str[str] = sub
new_str
end
end
Updated by sheerun (Adam Stankiewicz) about 4 years ago
Little more advanced, but common usecase, which also doesn't have built-in function is replacing all strings in simple way (i.e. gsub replacement, not sub replacement).
Updated by Eregon (Benoit Daloze) about 4 years ago
@chrisseaton (Chris Seaton) noticed the same issue a while ago.
String#[]=
only replaces the first occurrence (and mutates the receiver).
I think we should remove special treatment of \+
, etc in the replacement string for sub/gsub(String, String)
.
There is no Regexp involved, so I think there is no reason to have those Regexp backreferences.
sub/gsub(String, String)
only has a single "group", which is 0 and exactly the same as the replacement String, so those backreferences seem useless.
Here is the list of those special \
sequences:
https://github.com/oracle/truffleruby/blob/6bb23e236f83416fc4a000b6f3cf976947117ed7/src/main/ruby/truffleruby/core/truffle/string_operations.rb#L187-L246
Potentially incompatible, but who would use those sequences for sub/gsub(String, String)
?
I think it's always unintentional, and a source of subtle bugs.
And worse than that, I can imagine it could become a security issue if the pattern String is fixed and the replacement String comes from user input.
Then the user would expect the pattern String cannot be seen in the result, but it can if the replacement String is \&
.
@sheerun (Adam Stankiewicz) I'd suggest to not bother with a capacity computation, that's very unidiomatic. You can use new_str = str.dup
instead.
Updated by sheerun (Adam Stankiewicz) about 4 years ago
String doesn't have dup method
Updated by znz (Kazuhiro NISHIYAMA) about 4 years ago
You can use gsub("SCRIPT") { script }
.
If you do not want the special treatment of replacement, you should use gsub
with block instead of gsub(pat, replacement)
.
Updated by sheerun (Adam Stankiewicz) about 4 years ago
Nice, it actually suits me.
I'll leave this issue open though, because I agree with what Benoit said above
Updated by Eregon (Benoit Daloze) about 4 years ago
sheerun (Adam Stankiewicz) wrote in #note-7:
String doesn't have dup method
What do you mean? "foo".dup # => "foo"
.
Updated by Eregon (Benoit Daloze) about 4 years ago
znz (Kazuhiro NISHIYAMA) wrote in #note-8:
You can use
gsub("SCRIPT") { "replacement" }
.
I forgot about this form for this purpose. I find it really not intuitive for replacing strings.
A block is typically useful for more custom/complicated logic, but here ironically it has more simple semantics than the replacement String argument.
Updated by Dan0042 (Daniel DeLorme) about 4 years ago
I think we should remove special treatment of
\+
, etc in the replacement string forsub/gsub(String, String)
.
There is no Regexp involved, so I think there is no reason to have those Regexp backreferences.
I agree that would be the sensible behavior, but I'm not sure the incompatibility is worth it. Imagine you try "foo".gsub("o", '\+')
expecting to get f\+\+
as a result. Instead you get just f
, and after a lot of searching and debugging you end up with "foo".gsub("o", '\\\\+')
which produces the correct result. If we were to change the behavior, the result will change to f\\+\\+
. I'm sure there's a lot of ruby code out there that use a replacement string with a ton of backslashes to avoid this issue. Breaking all that code is not a good idea imho.
Updated by byroot (Jean Boussier) about 4 years ago
IMHO, rather than change the behavior of gsub
, introducing a much simpler String#replace(search, replace)
would make more sense, and would be more discoverable.
Updated by Eregon (Benoit Daloze) about 4 years ago
- Assignee set to matz (Yukihiro Matsumoto)
Dan0042 (Daniel DeLorme) wrote in #note-12:
Instead you get just
f
, and after a lot of searching and debugging you end up with"foo".gsub("o", '\\\\+')
which produces the correct result. If we were to change the behavior.
Are there examples of such code?
It seems hard to read and a worse version of "foo".gsub("o") { '\+' }
which is more general.
I would guess it's so rare that this is definitely worth the tiny corner-case incompatibility.
And I would think it would reveal more unintended strange replacement cases than actually breaking such manually-escaped constant string cases.
byroot (Jean Boussier) wrote in #note-13:
IMHO, rather than change the behavior of
gsub
, introducing a much simplerString#replace(search, replace)
would make more sense, and would be more discoverable.
But then we need to clarify if that replaces the first or all occurrences (and String#[]= is hardly intuitive or related by name).
sub
/gsub
already have optimizations for (String, String) cases (notably, they don't use a Regexp internally).
It seems sad to not simply reuse them when the intended semantics are very clear.
It seems a huge pitfall to me that sub/gsub(String, String)
will not use the replacement String as-is. A new method does not address that.
@matz (Yukihiro Matsumoto) Can we experiment with that change? I expect extremely few incompatibilities.
I expect it will fix many usages which do not expect any magic backreferences replacement.
Does anyone know why this behavior exists in the first place?
I guess maybe because sub/gsub(String, String) were once implemented by making a Regexp for the pattern and that scanned the replacement String for backreferences, causing this strange behavior?
Updated by byroot (Jean Boussier) about 4 years ago
But then we need to clarify if that replaces the first or all occurrences
As far as I know, String replace
commonly means replace all occurrences.
- Python: replaces all occurrences, but has a
maxreplace
optional parameters. https://docs.python.org/2/library/string.html#string.replace - PHP: replaces all occurrences https://www.php.net/manual/en/function.str-replace.php
- Javascript, is a bit weird, it's all if the search is a regexp with the
/g
modifier, for strings, it's the first occurence, and it hasreplaceAll
alongside it. - Java: replaces all occurrences https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(char,%20char)
- C#: replaces all occurrences: https://docs.microsoft.com/en-us/dotnet/api/system.string.replace?view=netcore-3.1
- Go has replace and replaceAll like Javascript: https://golang.org/pkg/strings/#Replace
- Rust replaces all occurrences: https://doc.rust-lang.org/std/string/struct.String.html#method.replace
So all in all I don't think it's a huge problem, the least surprising IMHO would be replace all occurrence, and I find Python's optional maxreplace
quite a good idea.
Updated by Eregon (Benoit Daloze) about 4 years ago
There is already String#replace (consistent with {Array,Hash,Set,...}#replace), so I don't think it's a good fit:
https://docs.ruby-lang.org/en/master/String.html#method-i-replace
Updated by byroot (Jean Boussier) about 4 years ago
There is already String#replace
Oh wow, I somehow missed that...
Updated by jeremyevans0 (Jeremy Evans) almost 4 years ago
- Tracker changed from Bug to Feature
- ruby -v deleted (
ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-darwin19]) - Backport deleted (
2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN)
Updated by hsbt (Hiroshi SHIBATA) 8 months ago
- Status changed from Open to Assigned