I also think this is a bug. I have changed the category accordingly.
I think we should restrict the characters usable in identifiers to some reasonable ranges. I agree that we mainly want to focus on ASCII programs, but we should at least do a sanity check on the rest of Unicode, and that clearly isn't happening now.
As a basis for this, it's best to look at Unicode Standard Annex #31, Unicode Identifier and Pattern Syntax (http://www.unicode.org/reports/tr31/). A regular expression for the identifier syntax defined in UAX #31 is readily available in Ruby: /\p{id_start}\p{id_continue}*/. The character ranges covered by these properties can be checked in enc/unicode/12.1.0/name2ctype.h, starting at lines 15267 and 15881 (the file is too large to be displayed by the svn web interface).
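A minimal sketch of such a check, assuming Ruby 2.4 or later (for `Regexp#match?`). The pattern is written with the canonical property names and anchored with `\A` and `\z`, so it validates a whole string instead of finding an identifier somewhere inside it. The constant `UAX31_IDENTIFIER` and the helper `uax31_identifier?` are hypothetical names used only for illustration.

```ruby
# Whole-string check for the UAX #31 default identifier syntax,
# using Ruby's built-in ID_Start / ID_Continue character properties.
UAX31_IDENTIFIER = /\A\p{ID_Start}\p{ID_Continue}*\z/

def uax31_identifier?(str)
  UAX31_IDENTIFIER.match?(str)
end

p uax31_identifier?("foo")        # => true
p uax31_identifier?("caf\u00e9")  # => true  (U+00E9 is a letter)
p uax31_identifier?("_foo")       # => false (ID_Start excludes '_')
p uax31_identifier?("1abc")       # => false (digits are only ID_Continue)
p uax31_identifier?("a\u2603")    # => false (U+2603 SNOWMAN is a symbol)
```

The `"_foo"` case shows why '_' has to be added separately for Ruby.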
The only additions we seem to need are '_' in initial position, sigils for the different kinds of identifiers ('@' for instance variables, '@@' for class variables, '$' for global variables), and a final '!', '?', or '=' for method names.
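The additions listed above could be layered on top of the UAX #31 properties roughly as follows. This is a hypothetical sketch, not how parse.y currently tokenizes identifiers. All constant names and the `identifier_kind` helper are made up for illustration, and special globals such as `$0` or `$!` are deliberately left out.

```ruby
# Hypothetical sketch: Ruby-specific additions on top of UAX #31.
ID_START    = /[\p{ID_Start}_]/  # UAX #31 start characters plus '_'
ID_CONTINUE = /\p{ID_Continue}/  # ID_Continue already includes '_'
BODY        = /#{ID_START}#{ID_CONTINUE}*/

PLAIN       = /\A#{BODY}\z/       # locals, constants, plain method names
METHOD_SUFF = /\A#{BODY}[!?=]\z/  # method names ending in '!', '?' or '='
IVAR        = /\A@#{BODY}\z/      # instance variables
CVAR        = /\A@@#{BODY}\z/     # class variables
GVAR        = /\A\$#{BODY}\z/     # global variables (special globals ignored)

def identifier_kind(name)
  case name
  when PLAIN       then :plain
  when METHOD_SUFF then :method_with_suffix
  when IVAR        then :instance_variable
  when CVAR        then :class_variable
  when GVAR        then :global_variable
  else                  :invalid
  end
end

p identifier_kind("empty?")   # => :method_with_suffix
p identifier_kind("@@count")  # => :class_variable
p identifier_kind("@x?")      # => :invalid
```

Because every pattern is anchored at both ends, the patterns are mutually exclusive. The order of the `when` clauses therefore does not change the result. Constants and locals are lumped together as `:plain`; Ruby tells them apart by whether the name starts with an uppercase letter.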
I suspect that it may take @nobu (Nobuyoshi Nakada) just a few hours to actually implement this, and that the backwards-compatibility issues (existing Ruby programs that would stop working) are minimal and limited to examples that demonstrate the problem.
I have added this to the list of issues to be discussed at next week's developers' meeting, but I will not be at the meeting itself. If needed, I can join the discussion on the first day of RubyKaigi. I have assigned this issue to Matz because I'd like him to give it a sanity check.