Bug #18449
closed
Bug in 3.1 regexp literals with \c
Description
This file passes on 2.7, 3.0, and fails (if you remove the skip
line) on 3.1:
#!/usr/bin/env ruby -w
require "minitest/autorun"

class TestRegexpCreation < Minitest::Test
  R31 = RUBY_VERSION > "3.1"

  def test_literal_equivalence
    if R31 then
      assert_equal(/\x03/, /\cC/) # wrong! (note the assert)
    else
      refute_equal(/\x03/, /\cC/)
    end
  end

  def test_from_literal
    re = /\cC/
    assert_equal(/\cC/, re)
    if R31 then
      assert_equal "\\x03", re.source # wrong?
    else
      assert_equal "\\cC", re.source
    end
  end

  def test_from_source
    re = Regexp.new "\\cC"
    assert_equal "\\cC", re.source
    if R31 then # wrong!
      skip
      assert_equal(/\cC/, re) # can't be written to pass
      assert_equal(/\x03/, re) # can't be written to pass
    else
      assert_equal(/\cC/, re)
    end
  end
end
# on 3.1:
#
# if written as:
#
# assert_equal(/\x03/, re)
#
# it fails with:
#
# 1) Failure:
# TestRegexpCreation#test_source [regexp31.rb:32]:
# Expected: /\x03/
# Actual: /\cC/
#
# but if written as:
#
# assert_equal(/\cC/, re)
#
# it ALSO fails with:
#
# 1) Failure:
# TestRegexpCreation#test_source [regexp31.rb:32]:
# Expected: /\x03/
# Actual: /\cC/
Updated by mame (Yusuke Endoh) about 3 years ago
- Related to Bug #14367: Wrong interpretation of backslash C in regexp literals added
Updated by zenspider (Ryan Davis) about 3 years ago
It looks like tokadd_escape has drastically changed and dropped the \c, \M-, and \C- forms...
This isn't mentioned in the release notes, and seems a backwards incompatibility that should be reserved for 4.0: https://www.ruby-lang.org/en/news/2021/12/25/ruby-3-1-0-released/
Updated by mame (Yusuke Endoh) about 3 years ago
Looks like \c? in a regexp literal was changed for #14367.
p(/\cC/.source) #=> "\\cC" in Ruby 3.0
p(/\cC/.source) #=> "\\x03" in Ruby 3.1
@jeremyevans0 (Jeremy Evans) What do you think?
Updated by zenspider (Ryan Davis) about 3 years ago
I was just coming back to point at:
Jeremy Evans: Fix handling of control/meta escapes in literal regexps [Wed May 12 12:37:55 2021 -0700 (8 months ago)]
found in https://github.com/ruby/ruby/commit/11ae581a4a7f5d5f5ec6378872eab8f25381b1b9
Updated by janosch-x (Janosch Müller) about 3 years ago
Regexps with these escapes can still be constructed with the Regexp::new constructor; they are only pre-processed to hex escapes in Regexp literals.
/\cC/.source == Regexp.new('\cC').source # false iff Ruby >= 3.1
As the matched codepoints are the same, I'd say this only affects maintainers of parsers (I came across this in regexp_parser), and isn't much of a breaking change to end-users?
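A minimal sketch of the point above, assuming only core Regexp methods: it builds the same pattern as a literal and via Regexp.new, checks that both match the Control-C character, and prints both sources so the version-dependent difference is visible. The variable names are illustrative, not from the report.

```ruby
# Build the same pattern two ways: as a literal and from a string at runtime.
literal     = /\cC/
constructed = Regexp.new('\cC')

ctrl_c = "\x03" # Control-C, the character both patterns are meant to match

literal_matches     = literal.match?(ctrl_c)
constructed_matches = constructed.match?(ctrl_c)

puts "literal matches:     #{literal_matches}"
puts "constructed matches: #{constructed_matches}"

# The sources are where the versions differ: per this thread the literal
# reports "\\cC" on 3.0 and "\\x03" on 3.1, while Regexp.new keeps "\\cC".
puts "literal source:      #{literal.source.inspect}"
puts "constructed source:  #{constructed.source.inspect}"
puts "sources equal:       #{literal.source == constructed.source}"
```

Only the last three lines should vary between Ruby versions; the match results should not.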
Updated by jeremyevans0 (Jeremy Evans) about 3 years ago
mame (Yusuke Endoh) wrote in #note-3:
> Looks like \c? in a regexp literal was changed for #14367.
> p(/\cC/.source) #=> "\\cC" in Ruby 3.0
> p(/\cC/.source) #=> "\\x03" in Ruby 3.1
> @jeremyevans0 (Jeremy Evans) What do you think?
As @janosch-x mentioned, the matched codepoints are the same. The fact that #source returns a different result does not seem like a bug/regression to me.
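For tests like the one in the description, one possible workaround is to compare matching behavior instead of relying on Regexp#==, which also takes the pattern source into account. The sketch below is an assumption about how that could look, not something proposed in this thread; same_matches? is a hypothetical helper and the sample strings are arbitrary.

```ruby
# Hypothetical helper: true when both regexps agree on every sample string.
def same_matches?(a, b, samples)
  samples.all? { |s| a.match?(s) == b.match?(s) }
end

from_source  = Regexp.new("\\cC") # source stays "\\cC"
from_literal = /\x03/             # source is "\\x03"

# Arbitrary probe strings: the control character itself plus some near misses.
samples = ["\x03", "C", "c", "\\cC", ""]

behaves_alike = same_matches?(from_source, from_literal, samples)
equal_objects = (from_source == from_literal)

puts "same matching behavior: #{behaves_alike}"
puts "Regexp#== result:       #{equal_objects}"
```

A sample-based check like this only shows agreement on the strings it was given, so it is weaker than a structural comparison, but it does not depend on how #source renders the escape.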
Updated by jeremyevans0 (Jeremy Evans) about 3 years ago
- Status changed from Open to Rejected