Feature #5749
Open
new method String#match_all needed
Description
The String class should have an instance method 'match_all', which is a mixture of 'match' and 'scan'.
The method 'scan' is not a very powerful tool: its result (the thing it yields) is just a matched string or an array of captured strings.
p 'a1bc2de3f'.scan(/(.)\d(.)/) # [["a", "b"], ["c", "d"], ["e", "f"]]
If the regex argument contains groups, I cannot even get the whole matched string, and there is no information about the match offsets.
So a 'match_all' is very much needed. It scans the string, finds every match, and yields a MatchData instance for each one to the given block.
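For comparison, a small sketch of what 'scan' drops when the regex has groups and what a single MatchData carries. It uses only core String#scan, String#match and MatchData methods; the variable names are illustrative.

```ruby
str = 'a1bc2de3f'
re  = /(.)\d(.)/

# scan with groups returns only the captures of each match.
captures_only = str.scan(re)

# match returns a MatchData for the first match only, but it carries
# the whole matched text, the captures and the character offsets.
m = str.match(re)
whole  = m[0]         # whole matched substring
groups = m.captures   # captured groups of that match
span   = m.offset(0)  # [start, end] character offsets of the match

p captures_only, whole, groups, span
```

A 'match_all' as proposed here would hand out one such MatchData per match instead of only the first.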
Here's a simple implementation in Ruby:
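class String
  # Yield (or, without a block, collect) a MatchData for each successive
  # match of re, starting the search at character offset i.
  def match_all(re, i = 0)
    if block_given?
      while m = self.match(re, i)
        yield m
        # Step past an empty match so the loop cannot stall.
        i = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
      end
      return self
    else
      ary = []
      while m = self.match(re, i)
        ary << m
        i = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
      end
      return ary
    end
  end
end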
However, calling 'match' with a start index in a 'while' loop is not efficient, because every call has to locate the start position in the string again. If the string is UTF-8 encoded and contains non-ASCII characters, I'm afraid that finding the position for a given character index is expensive.
So I think a built-in 'match_all' method, which behaves just like 'scan' but yields MatchData objects, is needed.
Please consider it, thank you!
Updated by naruse (Yui NARUSE) about 13 years ago
Why don't you use $~, $&, $`, $', $+, $1, $2, .. in the block passed to scan?
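A minimal sketch of that suggestion, assuming the goal is to collect one MatchData per match. Regexp.last_match is the method form of $~; the array name 'matches' is only illustrative.

```ruby
str = 'a1bc2de3f'
matches = []

# Inside the block, $~ (Regexp.last_match) is the MatchData of the
# match that scan is currently yielding.
str.scan(/(.)\d(.)/) do
  matches << Regexp.last_match
end

matches.each do |m|
  puts "#{m[0]} at #{m.begin(0)}...#{m.end(0)}, groups #{m.captures.inspect}"
end
```

This walks the string once with scan while still giving access to the whole match, the captures and the offsets.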
Updated by yimutang (Joey Zhou) about 13 years ago
Yui NARUSE wrote:
Why don't you use $~, $&, $`, $', $+, $1, $2, .. in the block passed to scan?
Thanks for the reminder! Yes, what I want can be done in this tricky way. Thank you!
However, I think relying on these special global variables is just an expedient.
If there were an explicit method, it would be much more user-friendly and readable.
When I wanted this function, what I did was consult the API docs and try to find a suitable method, not think about how to play with those magic punctuation variables. Maybe most people are just like me...
Updated by trans (Thomas Sawyer) about 13 years ago
If memory serves, Facets has an #mscan method.
Updated by tomoakin (Tomoaki Nishiyama) about 13 years ago
I proposed a similar method, each_match:
http://bugs.ruby-lang.org/issues/5606
One difference is that the next search starts at m.begin(0)+1 rather than at m.end(0), so
"AKASATANA".each_match(/A.A/)
will recognize AKA ASA ATA ANA.
(This, I think, cannot be done with scan. Can it?)
Such different behavior might be controlled with an optional argument.
I think we might merge the discussion into this issue
rather than keeping two separate issues.
Anyway, I'm glad to hear of a similar demand for a way to get the MatchData
objects directly, rather than through the scan() trick.
Updated by mame (Yusuke Endoh) almost 13 years ago
- Status changed from Open to Assigned
- Assignee set to matz (Yukihiro Matsumoto)
Updated by shyouhei (Shyouhei Urabe) about 6 years ago
- Related to Feature #12745: String#(g)sub(!) should pass a MatchData to the block, not a String added