Misc #20038

Strings with mixed escapes not detected around interpolation

Added by kddnewton (Kevin Newton) 5 months ago. Updated 5 months ago.

Status: Closed
Assignee: -
[ruby-core:115587]

Description

I'm not 100% sure my understanding is correct, but here's what I think is happening:

When a string is being parsed, the parser starts out by assuming the string has the same encoding as the file. If it finds an \x escape sequence with a value >= 0x80, it locks in that encoding. If it then finds a \u escape sequence with a value >= 0x80, it raises an error saying that there are mixed escape sequences. This works for:

# encoding: ascii
"\xFF \u{80}"
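
To make the check described above concrete, here is a toy model of it in plain Ruby. This is only a sketch of my reading of the behavior, not the parser's actual code: `check_token` is a hypothetical helper, it works on the raw source text of a single string token, and it only models the one direction described here (a high \x followed by a high \u). It also ignores details such as escaped backslashes and multiple code points inside one \u{...}.

```ruby
# Toy model only; `check_token` is a hypothetical name, not a parser API.
# `raw` is the unprocessed source text of ONE string token.
def check_token(raw)
  high_x_seen = false # becomes true once an \x escape >= 0x80 has been seen

  raw.scan(/\\x(\h{1,2})|\\u\{(\h+)\}|\\u(\h{4})/) do |hex, braced, plain|
    if hex
      # An \x escape with a value >= 0x80 "locks in" the encoding.
      high_x_seen ||= hex.to_i(16) >= 0x80
    elsif (braced || plain).to_i(16) >= 0x80
      # A \u escape >= 0x80 after a locking \x is the mixed case.
      return :mixed_escapes if high_x_seen
    end
  end

  :ok
end

check_token('\xFF \u{80}') # the single-token example above
check_token('\xFF \u{7F}') # \u value below 0x80, so no conflict
```

Under these assumptions the first call returns `:mixed_escapes` and the second returns `:ok`. The state lives only for the duration of one call, which mirrors the point that the check is scoped to a single token.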

The locked-in encoding is recalculated on every token, however, so if you do something like:

# encoding: ascii
foo = 1
"\xFF #{foo} \u{80}"

then the first string token (everything up to the interpolation) will be ASCII-8BIT, and everything after the interpolation will be UTF-8. This will not be marked as an error, and will instead be compiled as normal. However, as soon as it is executed, it will raise Encoding::CompatibilityError.

Since this is statically detectable in the parser, it seems incorrect to compile it only for it to blow up later. This could be surprising for people since if it's inside an infrequently-called method, it could end up making its way into production.
