Bug #14458

closed

RubyVM::InstructionSequence compilation loses Regexp encoding

Added by dannyfallon (Danny Fallon) about 6 years ago. Updated about 6 years ago.

Status:
Rejected
Assignee:
-
Target version:
-
ruby -v:
ruby 2.4.3p205 (2017-12-14 revision 61247) [x86_64-darwin16]
[ruby-core:85482]

Description

We appear to be losing encoding information for a Regexp object when we pass it through the compiler:

irb(main):001:0> "Test".encoding
=> #<Encoding:UTF-8>
irb(main):002:0> RubyVM::InstructionSequence.compile("'Test'.encoding").eval
=> #<Encoding:UTF-8>
irb(main):003:0> /\p{Alnum}/.encoding
=> #<Encoding:UTF-8>
irb(main):004:0> RubyVM::InstructionSequence.compile("/\p{Alnum}/.encoding").eval
=> #<Encoding:US-ASCII>

I think the encoding should be retained, as it is for strings. Adding the /u flag to the Regexp literal
does retain the encoding, but that feels like a burden we shouldn't have to bear.

irb(main):005:0> RubyVM::InstructionSequence.compile("/\p{Alnum}/u.encoding").eval
=> #<Encoding:UTF-8>
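
A possible explanation, offered as a sketch and not taken from the ticket: in a double-quoted Ruby string, "\p" is not a recognised escape sequence, so the backslash is dropped before the source ever reaches the compiler. The compiled code would then contain /p{Alnum}/, an ASCII-only pattern, not /\p{Alnum}/. The variable names below are illustrative only; the snippet compares the two quoting styles.

```ruby
# Double quotes: "\p" collapses to "p", so the backslash never reaches the compiler.
double_quoted = "/\p{Alnum}/.encoding"
# Single quotes: the backslash is kept verbatim.
single_quoted = '/\p{Alnum}/.encoding'

puts double_quoted   # /p{Alnum}/.encoding
puts single_quoted   # /\p{Alnum}/.encoding

# Compile and evaluate both source strings.
enc_double = RubyVM::InstructionSequence.compile(double_quoted).eval
enc_single = RubyVM::InstructionSequence.compile(single_quoted).eval

puts enc_double      # US-ASCII, matching the result in the report
puts enc_single      # expected to match the plain literal /\p{Alnum}/.encoding
```

If this reading is right, passing the source in single quotes (or escaping the backslash as "\\p") should make the compiled Regexp behave like the plain literal, without needing the /u flag.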