Feature #13890
open
Allow a regexp as an argument to 'count', to count more interesting things than single characters
Added by duerst (Martin Dürst) about 7 years ago.
Updated almost 2 years ago.
Description
Currently, String#count only accepts strings, and it treats each argument as a set of characters to count, not as a substring.
However, I have repeatedly met the situation where I wanted to count more interesting things in strings.
These 'interesting things' can easily be expressed with regular expressions.
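A short illustration of the current behavior may help. String#count interprets its argument as a character set (the same rules String#delete and String#squeeze use), so it cannot count a multi-character substring or a pattern. The example string is arbitrary.

```ruby
# String#count treats its argument as a *set* of characters,
# not as a substring.
s = "banana"

s.count("an")       # => 5 (every "a" plus every "n": 3 + 2)
s.scan("an").size   # => 2 (occurrences of the substring "an")
s.scan(/an/).size   # => 2 (same, via a regexp: today's workaround)
```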
Here is a quick-and-dirty Ruby-level implementation:
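    class String
      alias old_count count

      # Regexp argument: count (possibly overlapping) matches.
      # Anything else: defer to the built-in character-set counting.
      def count(*args)
        what = args.first
        return old_count(*args) unless args.size == 1 && what.is_a?(Regexp)

        pos = -1
        count = 0
        # Restart the search one character after the start of the previous
        # match, so overlapping matches are all counted.
        count += 1 while (pos = index(what, pos + 1))
        count
      end
    end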
Please note that this implementation counts overlapping occurrences; maybe there is room for an option such as overlap: :no.
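To make "overlapping" concrete, here is a minimal standalone version of the same search loop as the quick-and-dirty implementation, written as a plain method so it does not patch String (the name `count_overlapping` is hypothetical). Because the search restarts one character after each match start, `/aa/` is found three times in `"aaaa"`, whereas `scan` finds it only twice.

```ruby
# Hypothetical helper using the same loop as the quick-and-dirty
# implementation: restart the search one character after each match
# start, so matches may overlap.
def count_overlapping(str, re)
  pos = -1
  n = 0
  n += 1 while (pos = str.index(re, pos + 1))
  n
end

count_overlapping("aaaa", /aa/)  # => 3 (at offsets 0, 1 and 2)
"aaaa".scan(/aa/).size           # => 2 (scan never overlaps)
```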
Should it behave the same as str.scan(regexp).size?
I think the default should be no overlap, and increment the position by the length of the match.
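A minimal sketch of the non-overlapping behavior suggested here, assuming the search simply resumes at the end of each match (the helper name `count_non_overlapping` is hypothetical). One detail the suggestion leaves open is an empty match, whose length is zero; this sketch advances one character in that case so the loop terminates, which is also what String#scan does.

```ruby
# Hypothetical helper: after a match, resume the search at the end of
# that match. An empty match advances by one character so the loop
# always terminates.
def count_non_overlapping(str, re)
  pos = 0
  n = 0
  while pos <= str.length && (m = str.match(re, pos))
    n += 1
    start, stop = m.begin(0), m.end(0)
    pos = stop > start ? stop : start + 1
  end
  n
end

count_non_overlapping("aaaa", /aa/)         # => 2
count_non_overlapping("abcdefghij", /.{3}/) # => 3
```

With this rule the result agrees with str.scan(regexp).size on the examples above.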
Eregon (Benoit Daloze) wrote:
I think the default should be no overlap, and increment the position by the length of the match.
That would be fine by me, too.
Python allows counting substrings, as follows:
str.count(sub[, start[, end]])
Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
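For comparison, a rough Ruby approximation of that Python signature (the name `count_substring` is hypothetical, and Ruby's out-of-range slicing rules differ slightly from Python's, so edge cases are not identical). String#scan with a String argument already counts non-overlapping substring occurrences; the range only restricts which part of the string is searched. The endless range `start...nil` assumes Ruby 2.6 or later.

```ruby
# Hypothetical approximation of Python's str.count(sub, start, end):
# count non-overlapping occurrences of +sub+ within str[start...stop].
def count_substring(str, sub, start = 0, stop = nil)
  slice = str[start...stop] || "" # nil when start is past the end
  slice.scan(sub).size
end

count_substring("aaaa", "aa")          # => 2, like Python's "aaaa".count("aa")
count_substring("abcabc", "abc", 1)    # => 1
count_substring("abcabc", "abc", 0, 3) # => 1
```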
- Related to Feature #12698: Method to delete a substring by regex match added
I'd love to have this feature. str.count(regexp) is something I see people try fairly often. It would also avoid the intermediate Array of str.scan(regexp).size, or the contortion of str.enum_for(:scan, regexp).count.
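The two existing workarounds can be compared on a small example (the string and pattern are arbitrary). Both return the same count; the enumerator form only avoids materializing the Array of matches, since each matched string is still created.

```ruby
s = "abcdefghij"
re = /.{3}/

# Builds an intermediate Array of matched strings, then takes its size.
s.scan(re).size              # => 3

# No Array: scan yields each match to the enumerator, and
# Enumerator#count just tallies the yields.
s.enum_for(:scan, re).count  # => 3
```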
If str.count(re) works like str.scan(re).size (apart from efficiency), it's acceptable. But if someone needs overlapping matches, they need to explain their use case.
Matz.
Overlapping matches can be counted by wrapping the original regexp in a lookahead.
s = "abcdefghij"
re = /.{3}/
Non-overlapping count:
s.scan(re).count # => 3
s.count(re) # => Expect 3
Overlapping count:
s.scan(/(?=#{re})/).count # => 8
s.count(/(?=#{re})/) # => Expect 8
So I do not think overlapping needs to be implemented as a separate feature of this method.
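The lookahead trick can be checked with today's String#scan, using the same string and pattern. A zero-width lookahead succeeds at every position where the original pattern could start, and scan advances one character after each empty match. Adding a capture group inside the lookahead (not part of the comment above) also returns the overlapping substrings themselves.

```ruby
s = "abcdefghij"
re = /.{3}/

# Zero-width lookahead: one (empty) match per start position of +re+.
s.scan(/(?=#{re})/).size       # => 8

# A capture group inside the lookahead exposes each overlapping match.
s.scan(/(?=(#{re}))/).flatten
# => ["abc", "bcd", "cde", "def", "efg", "fgh", "ghi", "hij"]
```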