Misc #20406

Question about Regexp encoding negotiation

Added by andrykonchin (Andrew Konchin) 8 months ago. Updated 8 months ago.

Status: Open
Assignee: -
[ruby-core:117408]

Description

I am wondering what rules determine the encoding of a Regexp literal when an encoding modifier is specified.

From the documentation:

By default, a regexp with only US-ASCII characters has US-ASCII encoding:
...
A regular expression containing non-US-ASCII characters is assumed to use the source encoding. This can be overridden with one of the following modifiers.
//n ...
//u ...
//e ...
//s ...
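
As a quick illustration of the quoted rules, the sketch below checks the encoding of an ASCII-only literal with and without a modifier. The `literals` hash and its labels are illustrative names only, and a UTF-8 source file is assumed.

```ruby
# encoding: utf-8
# Sketch: encoding of an ASCII-only Regexp literal, with and without a modifier.
# `literals` is a hypothetical name used only for this illustration.
literals = {
  "/a/"  => /a/,   # no modifier, ASCII only: US-ASCII
  "/a/u" => /a/u,  # //u fixes the encoding to UTF-8
  "/a/e" => /a/e,  # //e fixes the encoding to EUC-JP
  "/a/s" => /a/s,  # //s fixes the encoding to Windows-31J
}

literals.each do |label, re|
  # Regexp#fixed_encoding? reports whether the Regexp is bound to one encoding.
  puts "#{label} -> #{re.encoding} (fixed_encoding?=#{re.fixed_encoding?})"
end
```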

Judging by the following examples, these rules are followed in every case except one:

 p /\xc2\xa1/e     .encoding # EUC-JP
 p /#{ }\xc2\xa1/e .encoding # EUC-JP

 p /a/e            .encoding # EUC-JP
 p /a #{} a/e      .encoding # EUC-JP
 p /#{} a/e        .encoding # US-ASCII

I would expect the last Regexp, /#{} a/e, to have EUC-JP encoding, but it has US-ASCII. What rule is applied in this case?
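
To compare the five literals on a given Ruby version, a script along the following lines may help. It is only a sketch: `sources` and `report` are hypothetical names, and each literal is evaluated from a string so that one that fails to compile on some version is reported rather than aborting the script. No output is listed because the results may differ between Ruby versions and parsers.

```ruby
# encoding: utf-8
# Sketch: print the encoding of each literal from the report.
# Single-quoted strings keep \x.. and #{} verbatim until eval parses them.
sources = [
  '/\xc2\xa1/e',
  '/#{ }\xc2\xa1/e',
  '/a/e',
  '/a #{} a/e',
  '/#{} a/e',
]

report = sources.map do |src|
  begin
    re = eval(src)
    [src, re.encoding, re.fixed_encoding?]
  rescue SyntaxError, StandardError => e
    # Record the error class instead of an encoding.
    [src, e.class, nil]
  end
end

report.each do |src, enc, fixed|
  puts format("%-18s %-12s fixed_encoding?=%p", src, enc, fixed)
end
```

Printing `fixed_encoding?` next to the encoding shows whether the modifier was applied to the resulting Regexp at all, which is the point in doubt for the interpolated cases.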


Related issues 2 (1 open, 1 closed)

Related to Ruby master - Misc #20407: Question about applying encoding modifier to an interpolated Regexp (Closed)
Related to Ruby master - Misc #20434: Deprecate encoding-related regular expression modifiers (Open)