Bug #11014
Closed
String#partition doesn't return correct result on zero-width match
Description
First, to see how String#match works on my example:
match = "foo".match(/^=*/)
match.pre_match #=> ""
match[0] #=> ""
match.post_match #=> "foo"
Now, if I used String#partition instead of match, I'd expect to get ["", "", "foo"] (pre_match, match, post_match). However:
"foo".partition(/^=*/) #=> ["foo", "", ""]
String#rpartition returns the correct result (with the same regex).
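The expectation above can be written down as a small sketch. The helper name partition_via_match is hypothetical (it is not part of Ruby); it only builds the (pre_match, match, post_match) triple from String#match, which is the result the report expects String#partition to return.

```ruby
# Hypothetical helper, for illustration only (not part of Ruby's API):
# builds the triple that the report expects from String#partition
# by going through String#match.
def partition_via_match(str, pattern)
  m = str.match(pattern)
  # With no match, mirror String#partition's documented no-match result.
  return [str, "", ""] if m.nil?
  [m.pre_match, m[0], m.post_match]
end

p partition_via_match("foo", /^=*/)  # => ["", "", "foo"]
p partition_via_match("foo", /o/)    # => ["f", "o", "o"]
p partition_via_match("foo", /x/)    # => ["foo", "", ""]
```

Comparing this helper's output with "foo".partition(/^=*/) on a given Ruby build shows whether that build still has the behavior reported here.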
Updated by nobu (Nobuyoshi Nakada) almost 10 years ago
- Description updated (diff)
- Status changed from Open to Assigned
- Assignee set to matz (Yukihiro Matsumoto)
These methods were taken from Python, and the behavior seems to be the same in Python.
I'm not sure what the rationale for this behavior is.
Updated by sawa (Tsuyoshi Sawada) about 5 years ago
The problem is not just with partition, but also involves split and scan.
I think your regex /^=*/ is unnecessarily complex. Your point can be made with /\A/, which is simpler.
I tried four regex patterns, /\A/, /\A.*/, /\z/, and /.*\z/, and compared the methods split, partition, and scan. In each group below, the result of the first example (match) equals the second (split) and the third (partition), and the fourth (scan) equals the middle element. So far, so good.
"foo".match(/\z/).then{[_1.pre_match, _1[0], _1.post_match]} # => ["foo", "", ""]
"foo".split(/(\z)/, -1) # => ["foo", "", ""]
"foo".partition(/\z/) # => ["foo", "", ""]
"foo".scan(/\z/) # => [""]
"foo".match(/\A.*/).then{[_1.pre_match, _1[0], _1.post_match]} # => ["", "foo", ""]
"foo".split(/(\A.*)/, -1) # => ["", "foo", ""]
"foo".partition(/\A.*/) # => ["", "foo", ""]
"foo".scan(/\A.*/) # => ["foo"]
In the following, we see inconsistency:
"foo".match(/\A/).then{[_1.pre_match, _1[0], _1.post_match]} # => ["", "", "foo"]
"foo".split(/(\A)/, -1) # => ["foo"]
"foo".partition(/\A/) # => ["foo", "", ""]
"foo".scan(/\A/) # => [""]
"foo".match(/.*\z/).then{[_1.pre_match, _1[0], _1.post_match]} # => ["", "foo", ""]
"foo".split(/(.*\z)/, -1) # => ["", "foo", ""]
"foo".partition(/.*\z/) # => ["", "foo", ""]
"foo".scan(/.*\z/) # => ["foo", ""]
The problematic cases and their expected values (in terms of consistency) are:
"foo".split(/(\A)/, -1) # => ["foo"], expected ["", "", "foo"]
"foo".partition(/\A/) # => ["foo", "", ""], expected ["", "", "foo"]
"foo".scan(/.*\z/) # => ["foo", ""], expected ["foo"]
The case described in the issue is the second case above.
Updated by Dan0042 (Daniel DeLorme) about 5 years ago
IIRC this has to do with zero-length matches being ignored in certain conditions, in particular having to do with repeating/multiple matches.
If "foo".split(/\A/) were ["", "foo"], then "foo".split(//) would have to be ["", "f", "o", "o"], and "foo".split(/\G/) could result in an infinite loop matching ["", "", "", "", "", ..., "foo"].
But I don't understand why partition doesn't behave like match.
Ah, probably because it behaves like split(rx, 2).
Note that gsub has different behavior:
"foo".gsub(/\G/,'_') #=> "_f_o_o_"
"foo".gsub(//,'_') #=> "_f_o_o_"
Zero-length matches are explained better than I ever could here:
https://www.regular-expressions.info/zerolength.html
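To make the infinite-loop concern concrete, here is a minimal sketch of a repeated-match loop. It is an assumption-laden illustration, not Ruby's actual implementation of scan, split, or gsub: the method name scan_all is hypothetical, and the only rule it demonstrates is that the search position must be bumped by one character after a zero-length match, otherwise the same empty match would be found forever.

```ruby
# Hypothetical sketch (not Ruby's real implementation): collect successive
# matches, stepping one character past any zero-length match so the loop
# always makes progress.
def scan_all(str, re)
  results = []
  pos = 0
  while pos <= str.length && (m = re.match(str, pos))
    results << m[0]
    if m.end(0) == m.begin(0)
      pos = m.end(0) + 1   # empty match: force progress by one character
    else
      pos = m.end(0)       # non-empty match: continue right after it
    end
  end
  results
end

p scan_all("foo", //)      # => ["", "", "", ""]
p scan_all("foo", /o/)     # => ["o", "o"]
p scan_all("foo", /.*\z/)  # => ["foo", ""]
```

Without the forced one-character step, the first call would keep finding the empty match at position 0 and never terminate.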
Updated by mame (Yusuke Endoh) about 5 years ago
We'd like to focus on String#partition in this ticket.
IMO, String#scan and #split are heavily used, so they should not change just for the sake of consistency. Please create another ticket if you really need to discuss them. A patch suggestion is welcome.
Updated by akr (Akira Tanaka) about 5 years ago
nobu (Nobuyoshi Nakada) wrote:
These methods were taken from Python, and the behavior seems to be the same in Python.
I'm not sure what the rationale for this behavior is.
I couldn't confirm it.
% python3
Python 3.7.3 (default, Apr 3 2019, 05:39:12)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "abc".partition("")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: empty separator
>>>
The empty separator causes an error in Python.
Updated by akr (Akira Tanaka) about 5 years ago
I feel the current behavior is just a bug and "abc".partition(//) should return ["", "", "abc"] instead of ["abc", "", ""].
Updated by nobu (Nobuyoshi Nakada) about 5 years ago
- Status changed from Assigned to Closed
Applied in changeset git|fce54a5404139a77bd0b7d6f82901083fcb16f1e.
Fix String#partition
Split with the matched part when the separator matches the empty
part at the beginning. [Bug #11014]