Misc #19767 (open)
[Not really a bug, but rather a less-than-ideal notification] "historical binary regexp match" when using the "n" modifier in a Ruby regex
Description
To bring my knowledge of Ruby regexes up to date, I have been
going through this tutorial/book:
https://learnbyexample.github.io/Ruby_Regexp/unicode.html
One example it provides uses some non-ASCII (Greek) characters:
'fox:αλεπού'.scan(/\w+/n)
This matches the word "fox", but it also prints
the following warning:
warning: historical binary regexp match /.../n against UTF-8 string
This may be obvious to others, but I am not sure what a
"historical" binary regexp match actually is. I assume it means
that this style was more common in the past and may be
discouraged now? Or is something else meant? What does "historical"
mean in this context?
I may not be the only one who does not fully understand the term
"historical". Most of Ruby's warnings are fairly easy to understand,
but this one seems odd. Right now I do not know whether the "n"
modifier may still be used in a regex. Not that I have a good use
case for it (I use UTF-8 these days, so I don't seem to need
ASCII-8BIT anyway), but perhaps the warning could be reworded a little.
I have no good alternative wording to suggest, largely because I do
not know what the warning actually means, e.g. what is "historical"
about it. Even so, I would recommend against the word "historical"
precisely because I don't understand it: "deprecated" is easy to
understand, whereas "historical" does not tell me anything.
Perhaps it could be phrased differently, without the word
"historical"? Either way, it's a tiny issue, and I was not even sure
whether to report it. But compared with other warnings, I believe the
term "historical" does not tell the user enough about what the
problem is.
(irb):1: warning: historical binary regexp match /.../n against UTF-8 string
=> ["fox"]
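The report above can be reproduced as a standalone script. This is a minimal sketch using only core Ruby and the stdlib StringIO; capture_stderr is a hypothetical helper written for this example, not a Ruby API. It assumes the default Warning.warn (which writes to $stderr) and a non-nil $VERBOSE (the default), and the exact warning wording may vary between Ruby versions.

```ruby
require "stringio"

# Hypothetical helper (not a Ruby API): run a block while $stderr points at
# a StringIO, since Ruby's default Warning.warn writes warnings to $stderr.
# Returns [block result, captured stderr text].
def capture_stderr
  original = $stderr
  $stderr = StringIO.new
  [yield, $stderr.string]
ensure
  $stderr = original
end

re = /\w+/n
# The pattern source is ASCII-only, so the regexp itself is US-ASCII...
p re.encoding == Encoding::US_ASCII              # true
# ...but it still carries the /n ("no encoding") flag.
p((re.options & Regexp::NOENCODING) != 0)        # true

result, warning = capture_stderr { 'fox:αλεπού'.scan(re) }
p result       # ["fox"]
puts warning   # the captured "historical binary regexp match" warning
```

The warning fires because the regexp has the /n flag while the string being matched is non-ASCII UTF-8 rather than ASCII-8BIT.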
Updated by jeremyevans0 (Jeremy Evans) over 1 year ago
- Tracker changed from Bug to Misc
- ruby -v deleted (ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux])
- Backport deleted (3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN)
Updated by Dan0042 (Daniel DeLorme) about 1 year ago
The "historical" and "binary" parts were added in 2017:
https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/d8cee4ff0a851037e96fe76d951a1549284c875a/diff/re.c
https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/dbd4c4a7b373061d235857f7f34e15859a7f1051/diff/re.c
The original warning was added in 2008:
https://bugs.ruby-lang.org/projects/ruby-master/repository/git/revisions/880a96c795d30d95497cb216c8bfc7fa1b3b5387/diff/re.c
It means that even though it may look like a binary regexp, it doesn't act like one: "é"[/./n] == "é", not the first byte of "é".
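Dan's point can be checked in a few lines. The sketch below (core Ruby only; variable names are illustrative) applies the same /./n pattern to a UTF-8 string and to its binary copy from String#b. The first lookup is expected to print the same "historical" warning as in the report.

```ruby
utf8 = "\u00E9"            # "é": one character, two bytes (0xC3 0xA9) in UTF-8
p utf8.bytesize            # 2

# Against the UTF-8 string, /./n is matched using the string's encoding,
# so "." consumes the whole character rather than a single byte.
whole = utf8[/./n]
p whole == "\u00E9"                      # true
p whole.encoding == Encoding::UTF_8      # true

# Against a binary copy (String#b), "." matches exactly one byte.
byte = utf8.b[/./n]
p byte.bytes                                 # [195]
p byte.encoding == Encoding::ASCII_8BIT      # true
```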
TBH I don't know why it was done that way. It would be convenient if /.../n =~ str were equivalent to /.../n =~ str.b, but without the intermediary string.
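For comparison, a minimal sketch of the str.b workaround as it behaves today, using offsets to make the difference visible (core Ruby only; the string is chosen for illustration). Against the UTF-8 string, =~ reports a character offset, as if /n were not given; against the binary copy it reports a byte offset, at the cost of allocating the intermediary string.

```ruby
str = "\u00E9!"   # "é!": 2 characters, 3 bytes in UTF-8

# Directly against the UTF-8 string: matched in UTF-8, so =~ returns a
# character offset (Ruby is expected to print the "historical" warning).
p(str =~ /!/n)      # 1

# Against a binary copy: matched byte-wise, so =~ returns a byte offset.
# str.b allocates a new String, which is the cost mentioned above.
p(str.b =~ /!/n)    # 2
```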