Misc #20038

closed

Strings with mixed escapes not detected around interpolation

Added by kddnewton (Kevin Newton) 5 months ago. Updated 4 months ago.

Status:
Closed
Assignee:
-
[ruby-core:115587]

Description

I'm not 100% sure my understanding is correct, but here's what I think is happening:

When a string is being parsed, it starts out by assuming it's the same encoding as the encoding of the file. If an \x escape sequence is found that has a value >= 0x80, it locks in that encoding. Then if a \u escape sequence is found, it will raise an error if the value is >= 0x80 and say that there are mixed escape sequences. This works for:

# encoding: ascii
"\xFF \u{80}"

The locked in encoding is recalculated on every token, however, so if you do something like:

# encoding: ascii
foo = 1
"\xFF #{foo} \u{80}"

then the first string token (everything up to the interpolation) will be ASCII-8BIT, and everything after the interpolation will be UTF-8. This will not be marked as an error, and will instead be compiled as normal. However, as soon as it is executed, it will raise Encoding::CompatibilityError.
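The runtime failure can be reproduced without the magic comment. This is a minimal sketch, assuming the two literal segments end up as described above: `.b` stands in for the ASCII-8BIT segment the parser would produce under `# encoding: ascii`, and `try_build` is a hypothetical helper used only to capture the outcome.

```ruby
# Stand-ins for the two literal segments of "\xFF #{foo} \u{80}".
# Assumption: .b mimics the ASCII-8BIT segment produced in an ASCII source file.
head = "\xFF ".b   # ASCII-8BIT, contains a byte >= 0x80
tail = " \u{80}"   # UTF-8, contains a code point >= 0x80
foo = 1

# Hypothetical helper: runs the block and reports whether it raised.
def try_build
  yield
  :ok
rescue Encoding::CompatibilityError
  :compatibility_error
end

# Explicit concatenation and interpolation fail in the same way.
concat_result = try_build { head + foo.to_s + tail }
interp_result = try_build { "#{head}#{foo}#{tail}" }

puts concat_result
puts interp_result
```

Both forms raise only when the expression is evaluated, which is why the problem stays hidden until the code path actually runs.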

Since this is statically detectable in the parser, it seems incorrect to compile it only for it to blow up later. This could be surprising for people since if it's inside an infrequently-called method, it could end up making its way into production.

Updated by naruse (Yui NARUSE) 4 months ago

  • Status changed from Open to Closed

String interpolation is syntactic sugar for string concatenation. The behavior is intended.

Updated by kddnewton (Kevin Newton) 4 months ago

Thank you for your response. I understand that interpolation is sugar for concatenation. However, these strings will always cause an error, but they won't cause an error until runtime. This information is statically known, so I am proposing we detect it in the parser for a better developer experience. We know that it is impossible for this code to run.
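To illustrate what "statically known" means here, the following is a rough sketch of the kind of check being proposed, written in plain Ruby rather than in the parser. `statically_incompatible?` is a hypothetical name, not an existing API; it looks only at the literal segments, since the interpolated values are not known ahead of time.

```ruby
# Hypothetical check, not part of Ruby's parser: given the literal segments of
# an interpolated string, report whether any two can never be concatenated.
# Encoding.compatible? returns nil when two strings have no common encoding.
def statically_incompatible?(segments)
  segments.combination(2).any? do |a, b|
    Encoding.compatible?(a, b).nil?
  end
end

binary_head = "\xFF ".b  # ASCII-8BIT with a byte >= 0x80
ascii_head  = "abc ".b   # ASCII-8BIT but ASCII-only
utf8_tail   = " \u{80}"  # UTF-8 with a code point >= 0x80

puts statically_incompatible?([binary_head, utf8_tail])
puts statically_incompatible?([ascii_head, utf8_tail])
```

The ASCII-only segment does not trigger the check, which matches the >= 0x80 condition described in the report.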
