Misc #20406

Updated by andrykonchin (Andrew Konchin) 8 months ago

I am wondering what the rules are for calculating a Regexp literal's encoding when an encoding modifier is specified.


 

 1. 

 From the documentation: 

 > By default, a regexp with only US-ASCII characters has US-ASCII encoding: 
 > ... 
 > A regular expression containing non-US-ASCII characters is assumed to use the source encoding. This can be overridden with one of the following modifiers. 
 > //n ... 
 > //u ... 
 > //e ... 
 > //s ... 
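
 As a baseline, the documented rules can be checked on ASCII-only literals without interpolation. This is only a minimal sketch of what the quoted documentation describes; the hash name `encodings` is made up for illustration.

 ```ruby
 # Encoding of an ASCII-only Regexp literal, with and without each modifier.
 # `encodings` is a hypothetical name used only for this illustration.
 encodings = {
   none: /a/.encoding,   # no modifier, ASCII-only source
   n:    /a/n.encoding,  # //n with ASCII-only source
   u:    /a/u.encoding,  # //u
   e:    /a/e.encoding,  # //e
   s:    /a/s.encoding,  # //s
 }

 encodings.each { |modifier, enc| puts "#{modifier}: #{enc}" }
 ```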

 Looking at the following examples, I would assume that these rules are followed, except in one case: 

 ```ruby 
  p /\xc2\xa1/e       .encoding # EUC-JP 
  p /#{ }\xc2\xa1/e .encoding # EUC-JP 

  p /a/e              .encoding # EUC-JP 
  p /a #{} a/e        .encoding # EUC-JP 
  p /#{} a/e          .encoding # US-ASCII 
 ``` 

 I would expect the last Regexp, `/#{} a/e`, to have `EUC-JP` encoding, but it has `US-ASCII`. So I am wondering which rule is applied in this case. 

 2. 

 For an interpolated Regexp with an encoding modifier, I suppose there is no encoding negotiation, since a SyntaxError is raised if a fragment's encoding doesn't match the fixed encoding. Is this assumption correct? 

 ```ruby 
 /#{ "фв" } a/e 
 # regexp encoding option 'e' differs from source encoding 'UTF-8' (SyntaxError) 
 ```
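
 For reference, whether a literal ends up with a fixed encoding can be inspected with `Regexp#fixed_encoding?`. The sketch below covers only non-interpolated literals and does not answer the question about interpolation; the variable names are made up for illustration.

 ```ruby
 # Regexp#fixed_encoding? reports whether a Regexp is tied to one encoding.
 # Variable names here are hypothetical, chosen only for this illustration.
 plain_fixed = /a/.fixed_encoding?   # ASCII-only literal, no modifier
 euc_fixed   = /a/e.fixed_encoding?  # //e modifier
 utf8_fixed  = /a/u.fixed_encoding?  # //u modifier

 puts "plain: #{plain_fixed}, e: #{euc_fixed}, u: #{utf8_fixed}"
 ```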
