Bug #18337

open

Ruby allows zero-width characters in identifiers

Added by duerst (Martin Dürst) about 3 years ago. Updated about 3 years ago.

Status: Assigned
Target version: -
[ruby-core:106056]

Description

Ruby allows zero-width characters in identifiers, which can be shown with the following small test:

irb(main):001:0> script = "ab = 20; a\u200Bb = 30; puts ab;"
=> "ab = 20; a​b = 30; puts ab;"
irb(main):002:0> eval(script)
20
=> nil

The first line creates the script. It contains a zero-width space (ZWSP), but that's not visible in most contexts (see next line). Looking at the script, one expects 30 as an output, but the output is 20 because there are two variables involved, one with a ZWSP and one without. I propose we fix this by disallowing such characters in identifiers. I'll give more details in a followup.
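
One way to make the hidden character visible is to scan the source text for format characters. The following is only a minimal sketch and not part of the proposed fix; it assumes that checking Unicode general category Cf is a good enough heuristic, and the variable names are illustrative.

```ruby
# Same script as in the irb session above; \u200B is ZERO WIDTH SPACE.
script = "ab = 20; a\u200Bb = 30; puts ab;"

# Collect every character in Unicode general category Cf (format characters),
# which includes U+200B, and render each one as a code point label.
hidden_codepoints = script.each_char
                          .select { |ch| ch.match?(/\p{Cf}/) }
                          .map { |ch| format("U+%04X", ch.ord) }

p hidden_codepoints
```

A check like this only reports where such characters occur; it does not decide whether they should be allowed in identifiers.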


Related issues: 1 (0 open, 1 closed)

Related to Ruby master - Feature #18336: How to deal with Trojan Source vulnerability (Feedback)

Updated by duerst (Martin Dürst) about 3 years ago

  • Related to Feature #18336: How to deal with Trojan Source vulnerability added

Updated by duerst (Martin Dürst) about 3 years ago

The "Trojan source" paper (https://www.trojansource.codes/trojan-source.pdf), in section VII.D, says the following:
"That said, our experimental evidence suggests that this theoretical attack already has defenses employed against it by most modern compilers, and thus is unlikely to work in practice."

My suspicion is that this is because most languages that extend identifier syntax to Unicode do so following Unicode® Standard Annex #31, Unicode Identifier and Pattern Syntax (https://www.unicode.org/reports/tr31/). Written in Ruby, that document defines identifiers essentially as anything matching /^\p{id_start}\p{id_continue}*$/. It shouldn't be too difficult to do that in Ruby.
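
As a rough illustration of that definition, the check can be sketched as a predicate. This is an assumption-laden sketch, not the eventual patch: the constant and method names are hypothetical, it uses \A and \z instead of ^ and $ so that the whole string is anchored rather than a single line, and it ignores Ruby-specific forms such as a leading underscore, sigils (@, $), or trailing ? and !.

```ruby
# Hypothetical name: default UAX #31 identifier shape, anchored to the whole string.
UAX31_IDENTIFIER = /\A\p{ID_Start}\p{ID_Continue}*\z/

# Returns true if +name+ has the default UAX #31 identifier shape.
def uax31_identifier?(name)
  UAX31_IDENTIFIER.match?(name)
end

p uax31_identifier?("ab")          # plain ASCII identifier
p uax31_identifier?("a\u200Bb")    # contains ZERO WIDTH SPACE
```

Under this definition the second name is rejected, because U+200B is neither ID_Start nor ID_Continue.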

Updated by duerst (Martin Dürst) about 3 years ago

  • Status changed from Open to Assigned
  • Assignee set to duerst (Martin Dürst)

As far as I remember the discussion at the recent developers' meeting, we discussed the fact that Ruby currently allows the use of unassigned code points in identifiers, and that this was probably too loose. Also, the fact that Ruby, in contrast to other languages, allows multiple encodings for the source code makes implementing this feature somewhat more difficult. I'll try to create a patch to improve the situation. When such a patch is available, we can discuss again.
