Bug #20504
open
Interpolated string literal in regexp encoding handling
Added by kddnewton (Kevin Newton) 6 months ago.
Updated 6 months ago.
Description
There is some very odd behavior that I'm not sure is intentional, so I'm looking for guidance. In this example:
# encoding: us-ascii
interp = "\x80"
regexp = /#{interp}/
the regexp variable is an ascii-8bit regular expression with the byte interpolated into it. However, if you inline that interpolation:
# encoding: us-ascii
regexp = /#{"\x80"}/
you get a syntax error, saying it's an invalid multi-byte character. I'm not sure what the rule is here, as it seems inconsistent. Is this the correct behavior?
I would prefer if it would create an ascii-8bit regular expression like the first example, which would be consistent.
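For reference, a minimal sketch of the result the variable case produces. It assumes a default UTF-8 source file, so `.b` is used here as a stand-in for the `# encoding: us-ascii` magic comment to obtain a binary string; that substitution is not part of the original report.

```ruby
# Stand-in for the first example: `.b` returns a binary (ASCII-8BIT) copy
# of the string, so the snippet does not depend on the file's source encoding.
interp = "\x80".b
regexp = /#{interp}/

# The interpolated byte makes the regexp itself binary-encoded.
puts regexp.encoding == Encoding::ASCII_8BIT

# It matches a binary string containing that byte.
puts regexp.match?("a\x80b".b)
```

This is the behavior the report asks the inlined form `/#{"\x80"}/` to share, rather than raising at parse time.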
Agreed, the current behavior breaks referential transparency and unexpectedly analyzes string literals inside interpolated parts.
This leads to extra confusion and, I would think, has no value in real-world uses of interpolated regexps (because it causes an error instead of none).
So I think this is a bug and the implementation should not analyze those parts and consequently the behavior should be the same as with the extra local variable.
- Tracker changed from Misc to Bug
- Backport set to 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
I'm fine with it analyzing the string literals, I would just prefer it take the same codepath as the interpolated variable case, in which it would produce an ascii-8bit regular expression as opposed to raising an error.
Discussed at the dev meeting, and @matz (Yukihiro Matsumoto) said /#{"\x80"}/ should not raise a SyntaxError but return a binary encoded regexp object.