Bug #18449
Bug in 3.1 regexp literals with \c (closed)
Description
This file passes on 2.7, 3.0, and fails (if you remove the skip line) on 3.1:
#!/usr/bin/env ruby -w
require "minitest/autorun"
class TestRegexpCreation < Minitest::Test
  R31 = RUBY_VERSION > "3.1"
  def test_literal_equivalence
    if R31 then
      assert_equal(/\x03/, /\cC/)           # wrong! (note the assert)
    else
      refute_equal(/\x03/, /\cC/)
    end
  end
  def test_from_literal
    re = /\cC/
    assert_equal(/\cC/, re)
    if R31 then
      assert_equal "\\x03", re.source        # wrong?
    else
      assert_equal "\\cC",  re.source
    end
  end
  def test_from_source
    re = Regexp.new "\\cC"
    assert_equal "\\cC", re.source
    if R31 then                                 # wrong!
      skip
      assert_equal(/\cC/, re)                 # can't be written to pass
      assert_equal(/\x03/, re)                # can't be written to pass
    else
      assert_equal(/\cC/, re)
    end
  end
end
# on 3.1:
#
# if written as:
#
#   assert_equal(/\x03/, re)
#
# it fails with:
#
#     1) Failure:
#   TestRegexpCreation#test_source [regexp31.rb:32]:
#   Expected: /\x03/
#     Actual: /\cC/
#
# but if written as:
#
#   assert_equal(/\cC/, re)
#
# it ALSO fails with:
#
#     1) Failure:
#   TestRegexpCreation#test_source [regexp31.rb:32]:
#   Expected: /\x03/
#     Actual: /\cC/

Updated by mame (Yusuke Endoh) almost 4 years ago
- Related to Bug #14367: Wrong interpretation of backslash C in regexp literals added

Updated by zenspider (Ryan Davis) almost 4 years ago
It looks like tokadd_escape has drastically changed and dropped the \c, \M-, and \C- forms...
This isn't mentioned in the release notes, and it seems like a backwards incompatibility that should be reserved for 4.0: https://www.ruby-lang.org/en/news/2021/12/25/ruby-3-1-0-released/

Updated by mame (Yusuke Endoh) almost 4 years ago
Looks like \c? in a regexp literal was changed for #14367.
p(/\cC/.source) #=> "\\cC"  in Ruby 3.0
p(/\cC/.source) #=> "\\x03" in Ruby 3.1
@jeremyevans0 (Jeremy Evans) What do you think?

Updated by zenspider (Ryan Davis) almost 4 years ago
I was just coming back to point at:
Jeremy Evans: Fix handling of control/meta escapes in literal regexps [Wed May 12 12:37:55 2021 -0700 (8 months ago)]
found in https://github.com/ruby/ruby/commit/11ae581a4a7f5d5f5ec6378872eab8f25381b1b9

Updated by janosch-x (Janosch Müller) almost 4 years ago
regexps with these escapes can still be constructed with the Regexp::new constructor; they are only pre-processed to hex escapes in Regexp literals.
/\cC/.source == Regexp.new('\cC').source # false iff Ruby >= 3.1
as the matched codepoints are the same, i'd say this only affects maintainers of parsers (i came across this in regexp_parser), and isn't much of a breaking change to end-users?
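
A minimal sketch of the point above, using only core Ruby: the literal and the constructed regexp both match the same character (U+0003), while Regexp.new keeps the escape exactly as written. The literal's source is printed rather than checked, because it depends on the Ruby version.

```ruby
# Compare a regexp literal with one built from a string.
literal     = /\cC/
constructed = Regexp.new('\cC')   # single quotes: the string is backslash, c, C

ctrl_c = "\x03"                   # the character both patterns are meant to match

puts literal.match?(ctrl_c)       # does the literal match Ctrl-C?
puts constructed.match?(ctrl_c)   # does the constructed regexp match Ctrl-C?

# Regexp.new keeps the escape exactly as it was given.
p constructed.source

# The literal's source is version-dependent (see the 3.0 / 3.1 outputs quoted above).
p literal.source
```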

Updated by jeremyevans0 (Jeremy Evans) almost 4 years ago
mame (Yusuke Endoh) wrote in #note-3:
> Looks like \c? in a regexp literal was changed for #14367.
> p(/\cC/.source) #=> "\\cC"  in Ruby 3.0
> p(/\cC/.source) #=> "\\x03" in Ruby 3.1
> @jeremyevans0 (Jeremy Evans) What do you think?
As @janosch-x mentioned, the matched codepoints are the same. The fact that #source returns a different result does not seem like a bug/regression to me.
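
For tests that care about what a regexp matches rather than how it is spelled, one option is to compare behaviour instead of relying on Regexp#==, which compares the source text. The sketch below uses a hypothetical helper, same_matches?, which is not part of Ruby or minitest, and it only checks agreement on the sample strings it is given.

```ruby
# Hypothetical helper (not part of Ruby or minitest): treats two regexps as
# equivalent when they agree on every sample string.
def same_matches?(a, b, samples)
  samples.all? { |s| a.match?(s) == b.match?(s) }
end

samples = (0..127).map(&:chr)      # every ASCII character as a one-character string

hex     = /\x03/
control = Regexp.new('\cC')        # source stays "\\cC" when built from a string

p hex == control                         # Regexp#== compares source text, so this is false
p same_matches?(hex, control, samples)   # agreement across all ASCII samples
```

This is a check over a finite sample, not a proof of equivalence; the sample set would need to cover whatever inputs matter for the patterns under test.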

Updated by jeremyevans0 (Jeremy Evans) almost 4 years ago
- Status changed from Open to Rejected