Feature #14417

closed

Proposal: allow passing a Hash with Symbol keys to String#sub / String#gsub

Added by osyo (manga osyo) about 6 years ago. Updated about 6 years ago.

Status:
Feedback
Assignee:
-
Target version:
-
[ruby-dev:50445]

Description

Overview

When a Hash with Symbol keys is passed to String#sub / String#gsub, perform the replacement the same way as for a Hash with String keys.

Current behavior

hash = {'b'=>'B', 'c'=>'C'}
p "abcabc".gsub(/[bc]/, hash)     #=> "aBCaBC"

# A Hash with Symbol keys does not produce the replacements
hash = { b: 'B', c: 'C' }
p "abcabc".gsub(/[bc]/, hash)     #=> "aa"

# The Symbol keys of the Hash have to be converted to Strings first
p "abcabc".gsub(/[bc]/, hash.transform_keys(&:to_s))     #=> "aBCaBC"
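
The "aa" result follows from how the Hash form works: each matched substring is a String, it is used as the lookup key, and the value found is converted with to_s. A minimal sketch of that lookup, using only the hash from the example above:

```ruby
hash = { b: 'B', c: 'C' }

# The matched text is a String, so the lookup uses a String key.
p hash['b']   #=> nil  ('b' and :b are different keys)
p hash[:b]    #=> "B"

# A missing key yields nil, and nil.to_s is the empty string,
# so every match is replaced with "".
p nil.to_s                      #=> ""
p "abcabc".gsub(/[bc]/, hash)   #=> "aa"
```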

Proposed behavior

# With String keys, the current behavior is kept
hash = {'b'=>'B', 'c'=>'C'}
p "abcabc".gsub(/[bc]/, hash)     #=> "aBCaBC"

hash = { b: 'B', c: 'C' }

# $& should stay dynamic, so it remains a String
p "abcabc".gsub(/[bc]/){hash[$&]} #=> "aa"

# The block argument should stay dynamic, so it remains a String
p "abcabc".gsub(/[bc]/){ |s| hash[s] } #=> "aa"

# Symbol keys are accepted only when the Hash is passed directly
p "abcabc".gsub(/[bc]/, hash)     #=> "aBCaBC"

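Benefits

Open issues

  • What to do when both a String key and a Symbol key are present
    • "abcabc".sub(/[bc]/, { "b" => "A", b: "C" }) # => ???
    • Currently the String key takes priority
    • Before that, isn't a Hash that mixes String and Symbol keys questionable in the first place?
    • Emit a warning, perhaps?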
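
One way to picture the "String key takes priority" rule is a pure-Ruby sketch. The helper name gsub_allowing_symbol_keys is hypothetical; it is neither part of Ruby nor taken from the attached patch, and it only illustrates the lookup order under that assumption.

```ruby
# Hypothetical helper (an assumption for illustration only).
# Looks up the matched String first, then falls back to the Symbol key.
def gsub_allowing_symbol_keys(string, pattern, hash)
  string.gsub(pattern) do |matched|
    if hash.key?(matched)
      hash[matched]          # String key wins when both exist
    else
      hash[matched.to_sym]   # nil (no key at all) becomes "" via to_s
    end
  end
end

p gsub_allowing_symbol_keys("abcabc", /[bc]/, { b: 'B', c: 'C' })      #=> "aBCaBC"
p gsub_allowing_symbol_keys("abcabc", /[bc]/, { "b" => "A", b: "C" })  #=> "aAaA"
```

Note that this sketch calls to_sym on every match that has no String key, so it creates Symbols from the input string.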

Use cases for String#gsub, etc.

# http://batsov.com/articles/2013/10/03/using-rubys-gsub-with-a-hash/
def geekify(string)
  string.gsub(/[leto]/, l: '1', e: '3', t: '7', o: '0')
end

p geekify('leet') # => '1337'
p geekify('noob') # => 'n00b'


def doctorize(string)
  string.gsub(/M(iste)?r/, Mister: 'Doctor', Mr: 'Dr')
end

p doctorize('Mister Freeze') # => 'Doctor Freeze'
p doctorize('Mr Smith')   # => 'Dr Smith'
# https://coderwall.com/p/t4y7cw/ruby-gsub-with-a-hash-or-block
amino_acid_hash = { A: 'Ala', R: 'Arg', N: 'Asn' }

p "R232A".gsub(/[A-Z]/, amino_acid_hash)
# => "Arg232Ala"
# https://qiita.com/scivola/items/416155c307ec29a37b8f
hash = {
  '&': "&amp",
  '<': "&lt",
  '>': "&gt",
}

p "<Q&A>".gsub(/[&<>]/, hash)
# => "&ltQ&ampA&gt"
# https://qiita.com/pocari/items/34855a9b07ea5006fe80
hash = {
  '#to#': "taro",
  '#from#': "jiro",
}

template = <<EOS
hello, #to#.
message from #from#.
EOS

puts template.gsub(/#.*#/, hash)
# => hello, taro.
# message from jiro.

If anyone can think of other concrete use cases, comments would be appreciated.


Files

string_sub_with_symbol_key.patch (2.71 KB) osyo (manga osyo), 01/29/2018 09:34 AM

Updated by Hanmac (Hans Mackowiak) about 6 years ago

Even though Ruby Symbols are freed now, I still have a problem with creating that many Symbols from possibly tainted string data.

Would it perhaps be better if sub/gsub called hash.transform_keys(&:to_s) internally on the hash when a hash is given?

If so, then this would work too:

"12345".gsub(/\d/,{"1" => "A", "2" => "B", "3" => "C", "4" => "D", "5" => "E"}) #=> "ABCDE"
"12345".gsub(/\d/,{1 => "A", 2 => "B", 3 => "C", 4 => "D", 5 => "E"}) #=> "ABCDE"
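
For comparison, here is what current Ruby does with the Integer-keyed hash, and the explicit conversion that the suggestion would move inside sub/gsub. This is a small sketch; the variable name int_keyed is only illustrative.

```ruby
int_keyed = {1 => "A", 2 => "B", 3 => "C", 4 => "D", 5 => "E"}

# Today the matched "1" is not the key 1, so every digit is replaced with "".
p "12345".gsub(/\d/, int_keyed)                         #=> ""

# Converting the keys explicitly (Hash#transform_keys, Ruby 2.5+) gives the
# result the suggestion is after.
p "12345".gsub(/\d/, int_keyed.transform_keys(&:to_s))  #=> "ABCDE"
```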

Updated by shyouhei (Shyouhei Urabe) about 6 years ago

  • Status changed from Open to Feedback

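The proposed benefit is too weak for me to support (if it is only a matter of taste).
That said, I am not against the feature itself, so I think more concrete use cases would make it easier to support.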

Updated by osyo (manga osyo) about 6 years ago

Thanks for the reply!

Would it perhaps be better if sub/gsub called hash.transform_keys(&:to_s) internally on the hash when a hash is given?

Hmm... thanks for the idea :)

That said, I am not against the feature itself, so I think more concrete use cases would make it easier to support.

I see. I will try to think of some more concrete use cases.
Thank you for the comment.

Updated by Hanmac (Hans Mackowiak) about 6 years ago

I looked at the gsub code in string.c;
https://github.com/ruby/ruby/blob/trunk/string.c#L5094
seems to be the line where we could add a transform_keys call.

But I don't currently know the best way to call hash.transform_keys(&:to_s) there,
probably something with rb_funcall_with_block?

Updated by duerst (Martin Dürst) about 6 years ago

gsub with Hash is used in some contexts where high performance is of interest. An example is lib/unicode_normalize/normalize.rb. This proposal would make these cases less efficient, for the benefit of people who can't keep Symbols and Strings apart.

As discussed in another issue, b: 'B', c: 'C' is not a shortcut for 'b'=>'B', 'c'=>'C'. We already have methods to change Hash keys (or values), and we probably need more of them, but I think we don't need more methods that accept strings and symbols indeterminately.
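
The distinction can be checked directly in irb. A short sketch (the variable names are illustrative); note that the quoted-label form, as used in some of the use-case examples above, also produces Symbol keys:

```ruby
symbol_keyed = { b: 'B', c: 'C' }
string_keyed = { 'b' => 'B', 'c' => 'C' }
quoted_label = { 'b': 'B', 'c': 'C' }   # quoted label syntax

p symbol_keyed == string_keyed   #=> false
p symbol_keyed.keys              #=> [:b, :c]
p string_keyed.keys              #=> ["b", "c"]
p quoted_label.keys              #=> [:b, :c]  (still Symbols)
```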

Updated by Hanmac (Hans Mackowiak) about 6 years ago

@duerst (Martin Dürst): What about my example, where it transforms the keys internally for the given Hash?
Or is that a no-no too?
It might be possible to do it only if the given hash has a non-String key?

Updated by naruse (Yui NARUSE) about 6 years ago

Hanmac (Hans Mackowiak) wrote:

@duerst (Martin Dürst): What about my example, where it transforms the keys internally for the given Hash?
Or is that a no-no too?
It might be possible to do it only if the given hash has a non-String key?

If the hash is used many times from gsub, those integers should be converted to Strings before calling gsub,
because such a conversion needs many object allocations and causes many GC runs.

I think

h = {1 => "A", 2 => "B", 3 => "C", 4 => "D", 5 => "E"}
"12345".gsub(/\d/){ h[$&.to_i] }

is faster than such code.
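
Anyone who wants to check this on their own machine could use a rough benchmark along these lines. It is only a sketch: the iteration count and the labels are arbitrary assumptions, the per-call transform_keys stands in for what an internal conversion would cost, and timings will vary by Ruby version and hardware.

```ruby
require 'benchmark'

h = {1 => "A", 2 => "B", 3 => "C", 4 => "D", 5 => "E"}
n = 50_000  # arbitrary iteration count

# Both approaches produce the same string.
block_result     = "12345".gsub(/\d/) { h[$&.to_i] }
converted_result = "12345".gsub(/\d/, h.transform_keys(&:to_s))

Benchmark.bm(30) do |x|
  # Block form: convert the matched text to an Integer for each lookup.
  x.report("block with to_i") do
    n.times { "12345".gsub(/\d/) { h[$&.to_i] } }
  end

  # Keys converted on every call, as an internal conversion would have to do.
  x.report("transform_keys on every call") do
    n.times { "12345".gsub(/\d/, h.transform_keys(&:to_s)) }
  end

  # Keys converted once, before the loop.
  prepared = h.transform_keys(&:to_s)
  x.report("keys converted once") do
    n.times { "12345".gsub(/\d/, prepared) }
  end
end
```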
