Bug #19379
Closed
Regex: "end pattern with unmatched parenthesis" with Ruby 3.2 and interpolation
Description
Sample code:
r2 = %r{#c-\w+/comment/[\w-]+}
%r{https?://[^/]+#{r2}}x
This works with Ruby 3.1:
irb(main):001:0> r2 = %r{#c-\w+/comment/[\w-]+}
irb(main):002:0> %r{https?://[^/]+#{r2}}x
=> /https?:\/\/[^\/]+(?-mix:#c-\w+\/comment\/[\w-]+)/x
But fails with Ruby 3.2.0:
irb(main):022:0> r2 = %r{#c-\w+/comment/[\w-]+}
irb(main):023:0> %r{https?://[^/]+#{r2}}x
(irb):23:in `<main>': end pattern with unmatched parenthesis: /https?:\/\/[^\/]+(?-mix:#c-\w+\/comment\/[\w-]+)/x (RegexpError)
But if I don't use interpolation, it works correctly:
irb(main):001:0> %r{https?://[^/]+#c-\w+/comment/[\w-]+}x
=> /https?:\/\/[^\/]+#c-\w+\/comment\/[\w-]+/x
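One caveat when comparing the two forms: in extended mode an unescaped # outside a character class starts a comment, so the non-interpolated literal is not equivalent to the interpolated one. A minimal sketch of the difference follows; the URLs and variable names are made up for illustration.

```ruby
# Non-interpolated literal from the report: everything from '#' on is a comment.
plain = %r{https?://[^/]+#c-\w+/comment/[\w-]+}x

# Same pattern with the '#' escaped, so it is matched literally.
escaped = %r{https?://[^/]+\#c-\w+/comment/[\w-]+}x

url_only      = "https://example.com"
with_fragment = "https://example.com#c-1/comment/abc"

puts plain.match?(url_only)        # true: the fragment part is not required
puts escaped.match?(url_only)      # false: the fragment part is required
puts escaped.match?(with_fragment) # true
```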
Updated by znz (Kazuhiro NISHIYAMA) almost 2 years ago
% docker run --platform linux/amd64 --rm -it ghcr.io/ruby/all-ruby env ALL_RUBY_SINCE=ruby-3.0 ./all-ruby -e 'r=/#/;p /#{r}/x'
ruby-3.0.0 /(?-mix:#)/x
...
ruby-3.2.0-preview1 /(?-mix:#)/x
ruby-3.2.0-preview2 -e:1:in `<main>': end pattern with unmatched parenthesis: /(?-mix:#)/x (RegexpError)
exit 1
...
ruby-3.2.0 -e:1:in `<main>': end pattern with unmatched parenthesis: /(?-mix:#)/x (RegexpError)
exit 1
Updated by znz (Kazuhiro NISHIYAMA) almost 2 years ago
I think minimal case is /(?-x:#)/x.
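The link between this minimal case and the original report is Regexp#to_s, which produces the text that interpolation embeds. A small sketch, using Regexp.new so that any error is raised at run time and can be rescued; on an affected Ruby such as 3.2.0 the rescue branch would be taken:

```ruby
inner = /#/
embedded = inner.to_s   # the text that "#{inner}" inserts into another regexp

begin
  outer = Regexp.new(embedded, Regexp::EXTENDED)
  result = outer.match?("a#b")
rescue RegexpError => e
  result = e.message
end

puts embedded   # (?-mix:#)
puts result
```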
Updated by znz (Kazuhiro NISHIYAMA) almost 2 years ago
- Assignee set to make_now_just (Hiroya Fujinami)
Updated by znz (Kazuhiro NISHIYAMA) almost 2 years ago
- Backport changed from 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED
Updated by mame (Yusuke Endoh) almost 2 years ago
- Assignee deleted (make_now_just (Hiroya Fujinami))
I wonder if this is due to #18294, not #19104. @jeremyevans0 (Jeremy Evans) What do you think?
Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago
mame (Yusuke Endoh) wrote in #note-5:
I wonder if this is due to #18294, not #19104. @jeremyevans0 (Jeremy Evans) What do you think?
I agree. #18294 doesn't handle /(?-x:...)/ inside an extended regular expression as non-extended syntax. I'll see if I can fix it today.
Updated by jeremyevans0 (Jeremy Evans) almost 2 years ago
jeremyevans0 (Jeremy Evans) wrote in #note-6:
mame (Yusuke Endoh) wrote in #note-5:
I wonder if this is due to #18294, not #19104. @jeremyevans0 (Jeremy Evans) What do you think?
I agree. #18294 doesn't handle /(?-x:...)/ inside an extended regular expression as non-extended syntax. I'll see if I can fix it today.
Should be fixed by https://github.com/ruby/ruby/pull/7192
Updated by jeremyevans (Jeremy Evans) almost 2 years ago
- Status changed from Open to Closed
Applied in changeset git|eccfc978fd6f65332eb70c9a46fbb4d5110bbe0a.
Fix parsing of regexps that toggle extended mode on/off inside regexp
This was broken in ec3542229b29ec93062e9d90e877ea29d3c19472. That commit
didn't handle cases where extended mode was turned on/off inside the
regexp. There are two ways to turn extended mode on/off:
/(?-x:#y)#z
/x =~ '#y'
/(?-x)#y(?x)#z
/x =~ '#y'
These can be nested inside the same regexp:
/(?-x:(?x)#x
(?-x)#y)#z
/x =~ '#y'
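As a runnable check of the three forms above (variable names are illustrative), each expression should match '#y' at index 0 on a Ruby that includes this fix, because #z and #x are comments while #y is literal:

```ruby
# (?-x:...) turns extended mode off for that group only.
scoped = /(?-x:#y)#z
/x

# (?-x) and (?x) switch extended mode for the rest of the enclosing section.
toggled = /(?-x)#y(?x)#z
/x

# Both forms nested inside one regexp.
nested = /(?-x:(?x)#x
(?-x)#y)#z
/x

[scoped, toggled, nested].each do |re|
  p re =~ '#y'   # 0
end
```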
As you can probably imagine, this makes handling these regexps
somewhat complex. Due to the nesting inside portions of regexps,
the unassign_nonascii function needs to be recursive. In
recursive mode, it needs to track both opening and closing
parentheses, similar to how it already tracked opening and
closing brackets for character classes.
When scanning the regexp and coming to (? not followed by #, scan for
options, and use x and i to determine whether to turn on or off
extended mode. For :, indicating only the current regexp section
should have the extended mode switched, recurse with the extended
mode set or unset. For ), indicating the remainder of the regexp (or
current regexp portion if already recursing) should turn extended
mode on or off, just change the extended mode flag and keep scanning.
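The scan described above can be modeled in a few lines of Ruby. This is a simplified, hypothetical sketch and not the C code of unassign_nonascii: the names scan_section, scan_group, skip_char_class and extended_comments are invented here, the sketch only collects extended-mode comments, and it assumes extended mode is decided by x and a preceding - in the option list.

```ruby
OPTION_CHARS = "imxadu-"

# Returns the index just past the "]" that closes the "[" at index i.
def skip_char_class(src, i)
  depth = 0
  while i < src.length
    case src[i]
    when "\\" then i += 1          # skip the escaped character
    when "["  then depth += 1
    when "]"  then depth -= 1
    end
    i += 1
    return i if depth.zero?
  end
  i
end

# Handles "(" at index i. Returns [next_index, extended_for_rest_of_section].
def scan_group(src, i, extended, comments)
  unless src[i + 1] == "?"         # plain group: same mode, own scope
    return [scan_section(src, i + 1, extended, true, comments), extended]
  end
  if src[i + 2] == "#"             # (?#...) comment group
    close = src.index(")", i) || src.length - 1
    return [close + 1, extended]
  end
  j = i + 2
  mode = extended
  enable = true
  while j < src.length && OPTION_CHARS.include?(src[j])
    enable = false if src[j] == "-"
    mode = enable if src[j] == "x"
    j += 1
  end
  case src[j]
  when ":"                         # (?opts:...) applies to this group only
    [scan_section(src, j + 1, mode, true, comments), extended]
  when ")"                         # (?opts) applies to the rest of the section
    [j + 1, mode]
  else                             # (?=...), (?<name>...), etc.: mode unchanged
    [scan_section(src, i + 2, extended, true, comments), extended]
  end
end

# Scans one section starting at index i; returns the index just past it.
# nested is true when the section is the inside of a parenthesized group.
def scan_section(src, i, extended, nested, comments)
  while i < src.length
    case src[i]
    when "\\"
      i += 2                       # escaped character
    when "["
      i = skip_char_class(src, i)  # '#' inside a class is literal
    when "#"
      if extended
        eol = src.index("\n", i) || src.length
        comments << src[i...eol]
        i = eol
      else
        i += 1
      end
    when ")"
      return i + 1 if nested       # end of the current group
      i += 1
    when "("
      i, extended = scan_group(src, i, extended, comments)
    else
      i += 1
    end
  end
  i
end

# Collects the comments found in a regexp source string.
def extended_comments(src, extended: true)
  comments = []
  scan_section(src, 0, extended, false, comments)
  comments
end

p extended_comments("(?-x:#y)#z\n")               # ["#z"]
p extended_comments("(?-x)#y(?x)#z\n")            # ["#z"]
p extended_comments("(?-x:(?x)#x\n(?-x)#y)#z\n")  # ["#x", "#z"]
```

Giving every group its own recursive call keeps the mode change from (?-x) local to that group, which is why the caller's flag is restored after the closing parenthesis.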
While testing this, I noticed that a, d, and u are accepted as
options, in addition to i, m, and x, but I can't see where those
options are documented. I'm not sure whether or not handling a, d,
and u as options is a bug.
Fixes [Bug #19379]
Updated by naruse (Yui NARUSE) almost 2 years ago
- Backport changed from 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: REQUIRED to 2.7: DONTNEED, 3.0: DONTNEED, 3.1: DONTNEED, 3.2: DONE
ruby_3_2 ca75332f46c39804e06cd37c2608cbdef0aebf05 merged revision(s) eccfc978fd6f65332eb70c9a46fbb4d5110bbe0a.