Feature #21932
closed `MatchData#get_int`
Description
This was suggested by @akr (Akira Tanaka) today: $~.get_int(1) is equivalent to $1.to_i but does not create the intermediate String object.
Updated by nobu (Nobuyoshi Nakada) about 1 month ago
- Description updated (diff)
Updated by zenspider (Ryan Davis) about 1 month ago
Tried to add a comment to your commit but github is being very sketchy today.
In the method comment on the impl side, you have examples for parsing a date... but IDGI... 1/2/10 are supposed to be the base arg, right? Base 1?
Updated by nobu (Nobuyoshi Nakada) about 1 month ago
zenspider (Ryan Davis) wrote in #note-2:
In the method comment on the impl side, you have examples for parsing a date... but IDGI... 1/2/10 are supposed to be the base arg, right? Base 1?
I can't tell where that example comes from.
Do you mean something like this?
/\d+/.match("1/2/10").get_int(0) # => 1
/\d+/.match("1/2/10").get_int(0, 1) # invalid radix 1 (ArgumentError)
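For comparison, here is a small sketch of how String#to_i treats its argument today, on the assumption that get_int passes the second argument through as the radix in the same way. The helper name `radix_error_for` is hypothetical and exists only for this illustration.

```ruby
# Hypothetical helper for illustration: returns the error message raised by
# String#to_i for the given radix, or nil when the radix is accepted.
def radix_error_for(str, radix)
  str.to_i(radix)
  nil
rescue ArgumentError => e
  e.message
end

first_number = "1/2/10"[/\d+/]     # "1", the same text that group 0 captures above
first_number.to_i                  # => 1
radix_error_for(first_number, 10)  # => nil
radix_error_for(first_number, 1)   # => "invalid radix 1"
```

The "1/2/10" in the example is the input string, and the radix is a separate, optional argument.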
Updated by kou (Kouhei Sutou) 29 days ago
FYI: strscan will use integer_at not get_int: https://github.com/ruby/strscan/pull/192#issuecomment-4002582149
Updated by Eregon (Benoit Daloze) 29 days ago
- Related to Feature #21943: Add StringScanner#get_int to extract capture group as Integer without intermediate String added
Updated by matz (Yukihiro Matsumoto) 18 days ago
I agree with adding integer_at(n) to MatchData, and StringScanner too (#21943).
Matz.
Updated by mame (Yusuke Endoh) 17 days ago
Here is a supplement to Matz's decision.
This method will basically follow the behavior of String#to_i.
The base can be specified as the second argument:
"2024" =~ /(\d+)/
$~.integer_at(1) # => 2024 (default: base 10)
$~.integer_at(1, 8) # => 1044 (interprets "2024" as base 8)
$~.integer_at(1, 16) # => 8228 (interprets "2024" as base 16)
When it encounters non-numeric characters or an empty string, it behaves the same as String#to_i:
# integer_at should behave as String#to_i
"foo" =~ /(...)/
$~.integer_at(1) # => 0 (== "foo".to_i)
"0xF" =~ /(...)/
$~.integer_at(1) # => 0 (== "0xF".to_i, not 15)
"" =~ /(\d*)/
$~.integer_at(1) # => 0 (== "".to_i)
"1_0_0" =~ /(\d+(?:_\d+)*)/
$~.integer_at(1) # => 100 (== "1_0_0".to_i)
If the base is set to 0, it respects prefixes like 0x (the same as String#to_i(0)):
"0xF" =~ /(...)/
$~.integer_at(1, 0) # => 15 (== "0xF".to_i(0))
If there is no match for the group, it returns nil:
"b" =~ /(a)|(b)/
$~.integer_at(1) # => nil
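The return values described above can be emulated on a Ruby that does not have the method yet. This is only a sketch: the helper name `integer_at_sketch` is hypothetical, and unlike the proposed method it does allocate the intermediate String, so it illustrates the results and not the allocation savings.

```ruby
# Hypothetical helper, not part of the proposal: emulates the described
# return values of MatchData#integer_at by way of String#to_i.
# Unlike the proposed method, this allocates the intermediate String.
def integer_at_sketch(match, group, base = 10)
  match[group]&.to_i(base)   # nil when the group did not participate
end

m = /(\d+)/.match("2024")
integer_at_sketch(m, 1)      # => 2024
integer_at_sketch(m, 1, 8)   # => 1044
integer_at_sketch(m, 1, 16)  # => 8228

integer_at_sketch(/(...)/.match("foo"), 1)     # => 0
integer_at_sketch(/(...)/.match("0xF"), 1, 0)  # => 15
integer_at_sketch(/(a)|(b)/.match("b"), 1)     # => nil
```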
Updated by Eregon (Benoit Daloze) 17 days ago
I think returning 0 when the group isn't parseable as a number seems bad behavior.
At least if I would use this method, I would expect two things of it:
- It returns the Integer value of that group, without needing Integer($N)
- It fails if the capture isn't a number, like Kernel#Integer
Does anyone have a use case for returning 0 when the group isn't a number?
It just seems like a "broken data" situation for no reason when e.g. using the wrong group number.
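To make the strict alternative concrete, here is a rough sketch of what a Kernel#Integer-based variant could look like. The name `strict_integer_at_sketch` is hypothetical, and returning nil for an unmatched group is an assumption of this sketch, not something this comment specifies.

```ruby
# Hypothetical strict variant, for illustration only: raises ArgumentError
# when the capture is not a valid base-10 integer, as Kernel#Integer does.
def strict_integer_at_sketch(match, group)
  capture = match[group]
  return nil if capture.nil?   # assumption: an unmatched group yields nil
  Integer(capture, 10)
end

strict_integer_at_sketch(/(\d+)/.match("2024"), 1)  # => 2024

begin
  strict_integer_at_sketch(/(...)/.match("foo"), 1)
rescue ArgumentError => e
  e.class                      # the wrong-group mistake surfaces here
end
```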
Updated by naruse (Yui NARUSE) 17 days ago
Eregon (Benoit Daloze) wrote in #note-8:
I think returning 0 when the group isn't parseable as a number seems bad behavior.
At least if I would use this method, I would expect two things of it:
- It returns the Integer value of that group, without needing Integer($N)
- It fails if the capture isn't a number, like Kernel#Integer
Does anyone have a use case for returning 0 when the group isn't a number?
It just seems like a "broken data" situation for no reason when e.g. using the wrong group number.
There are two reasons:
- There are two major methods to parse an integer in Ruby: to_i and Integer().
- to_i is loose, and its default base is 10
- Integer() is strict, and its default base is 0; it interprets the "0o" and "0x" prefixes
In this use case, interpreting the "0x" prefix is not useful. If this behavior matches to_i, it is easy to explain.
In other words, match_data.get_int(n) behaves as match_data[n]&.to_i
- Distinguishing the case where the group is not matched
Consider /(a)|(\d+)/ =~ "a"; $~.get_int(2).
The current proposal says it returns nil. Another option for this case is an exception, but I think that is not useful.
With nil, I can distinguish this case from a group that matched "0", because that returns 0.
Other minor reasons are...
- for an empty string, it returns 0.
- if you want to reject non-integers, you can write a strict regexp pattern.
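The nil-versus-0 distinction and the strict-pattern point can be sketched with the `match_data[n]&.to_i` equivalence stated above. The helper name `group_to_i` is a hypothetical stand-in for `match_data.get_int(n)`, used only so the sketch runs on a Ruby without the new method.

```ruby
# Hypothetical stand-in for match_data.get_int(n), built on the
# match_data[n]&.to_i equivalence; for illustration only.
def group_to_i(match, group)
  match[group]&.to_i
end

pattern = /(a)|(\d+)/
group_to_i(pattern.match("a"), 2)  # => nil (group 2 did not participate)
group_to_i(pattern.match("0"), 2)  # => 0   (group 2 matched "0")

# A strict pattern rejects non-integer input before any conversion happens.
strict = /\A(\d+)\z/
strict.match("12abc")              # => nil (no MatchData at all)
group_to_i(strict.match("12"), 1)  # => 12
```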
Updated by Eregon (Benoit Daloze) 17 days ago
· Edited
Thanks for the explanations.
naruse (Yui NARUSE) wrote in #note-9:
In this use case, interpreting "0x" prefix is not useful
It could be useful, but one could work around that with /0x(\h+)/ instead of /(0x\h+)/.
Leading 0 (octal) is likely more dangerous than 0x though (Integer("011") => 9).
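A quick sketch of both points, using only String#to_i and Kernel#Integer as they exist today. The helper name `hex_digits_to_i` is hypothetical, and it assumes the base would be passed explicitly as 16 once the prefix is left out of the capture.

```ruby
# Hypothetical helper for illustration: capture only the digits after "0x"
# and convert them with an explicit base of 16.
def hex_digits_to_i(str)
  m = /0x(\h+)/.match(str)
  m && m[1].to_i(16)
end

hex_digits_to_i("0xFF")  # => 255

# A leading 0 means octal when Integer() is left to detect the base itself.
Integer("011")           # => 9
Integer("011", 10)       # => 11
"011".to_i               # => 11
```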
If this behavior is to_i, it is easy to explain the behavior.
It wouldn't be hard to explain it's the same as Integer($N, 10).
Distinguishing the case where the group is not matched
Yes, agreed returning nil for group not matched is good.
for an empty string, it returns 0.
Could easily be handled as a special case but yeah not as simple as Integer($N, 10) then.
Still fairly easy to explain/document.
if you want to reject non-integers, you can write a strict regexp pattern.
This reason convinces me. It's not bulletproof, but it should be enough of a guarantee in most cases that 0 is only returned for actual 0's in the input (or an empty string).
BTW, given the method name is MatchData#integer_at(n), people might expect it uses Integer() as that's very similar to the method name.
Updated by nobu (Nobuyoshi Nakada) 11 days ago
- Status changed from Open to Closed
Applied in changeset git|72eb59d0b23522508300896bbbe73716fe82349e.
[Feature #21932] Add MatchData#get_int