Misc #20519: Porting regexp to pure ruby? - Ruby - Ruby Issue Tracking System

closed

Added by brightbits (Michael Baldry) over 1 year ago. Updated over 1 year ago.

Status: Feedback
Assignee:

[ruby-core:118147]

Description

Would there be any benefit in porting Regexp from Onigmo to a pure ruby implementation that could benefit from YJIT?

Compiling a pattern could mean translating it to a Ruby method, which YJIT could then optimize easily.

Has this been explored, or has any work been done around this kind of thing, before I look into it more?

Many thanks

Updated by shyouhei (Shyouhei Urabe) over 1 year ago
#1 [ruby-core:118163]

Status changed from Open to Feedback

Ruby (especially its multilingualized string) is built on top of Onigmo and not vice versa. You must first decouple them, which alone is not an easy task.

Updated by brightbits (Michael Baldry) over 1 year ago
#2 [ruby-core:118165]

shyouhei (Shyouhei Urabe) wrote in #note-1:

Ruby (especially its multilingualized string) is built on top of Onigmo and not vice versa. You must first decouple them, which alone is not an easy task.

Ah yes, I see now that everything in enc has an Oniguruma copyright header.

I think that could all remain, and only the actual regexp matching functions would change. But after some quick benchmarking, implementing in Ruby the logic of a relatively simple regexp for parsing dates, I couldn't get anywhere near the speed of Onigmo, even with YJIT. That doesn't mean it's not possible: I didn't dig too deep or do any kind of profiling to see what was taking the time.
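For readers who want to try something similar, here is a minimal sketch of what hand-translating a date regexp into plain Ruby might look like. It is not the code that was benchmarked in this comment; the pattern `DATE_RE`, the helper `digit?`, and the method `match_date` are all hypothetical names chosen for illustration, and the sketch assumes an ASCII-compatible string encoding.

```ruby
# Hypothetical example pattern: a strict YYYY-MM-DD date.
DATE_RE = /\A(\d{4})-(\d{2})-(\d{2})\z/

# True when the byte is an ASCII digit ("0".."9").
def digit?(byte)
  byte >= 48 && byte <= 57
end

# Hand-written equivalent of DATE_RE (an assumption-laden sketch, not a
# general regexp compiler). Returns [year, month, day] as strings, or nil
# when the input does not match.
def match_date(str)
  return nil unless str.bytesize == 10

  10.times do |i|
    byte = str.getbyte(i)
    if i == 4 || i == 7
      return nil unless byte == 45 # ASCII "-"
    else
      return nil unless digit?(byte)
    end
  end

  [str.byteslice(0, 4), str.byteslice(5, 2), str.byteslice(8, 2)]
end
```

Both `match_date(input)` and `DATE_RE.match(input)&.captures` can then be timed with and without `--yjit` to compare the two approaches on a given machine.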

The thought came about while my team was benchmarking a change. Someone suggested a regexp for matching and replacing a string prefix, and it was tested against calling start_with? and then using a string range accessor to drop the prefix, which seemed to be faster for that case.
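A rough sketch of that kind of comparison, under assumed inputs, might look like the following. The prefix `"user:"`, the sample string, and the method names are hypothetical, not taken from the team's actual change; `String#delete_prefix` is included as a third option because it does the same job in one call.

```ruby
require "benchmark"

PREFIX = "user:"      # hypothetical prefix
INPUT  = "user:12345" # hypothetical sample input

# Regexp-based prefix removal.
def strip_with_regexp(str)
  str.sub(/\Auser:/, "")
end

# start_with? followed by a range accessor, as described above.
def strip_with_start_with(str)
  str.start_with?(PREFIX) ? str[PREFIX.length..] : str
end

# Built-in alternative available since Ruby 2.5.
def strip_with_delete_prefix(str)
  str.delete_prefix(PREFIX)
end

# Only run the timing loop when executed as a script.
if __FILE__ == $PROGRAM_NAME
  n = 200_000
  Benchmark.bm(16) do |x|
    x.report("sub(regexp)")   { n.times { strip_with_regexp(INPUT) } }
    x.report("start_with?")   { n.times { strip_with_start_with(INPUT) } }
    x.report("delete_prefix") { n.times { strip_with_delete_prefix(INPUT) } }
  end
end
```

Timings will vary by Ruby version and by whether YJIT is enabled, so the numbers are best read as relative rather than absolute.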

I agree it sounds like a very big job and based on initial testing, unlikely to be an improvement in most cases.

Updated by kddnewton (Kevin Newton) over 1 year ago
#3 [ruby-core:118239]

Hi @brightbits! I've investigated this one at length, and can give some context.

As you already discovered, Onigmo stretches well beyond regular expressions. It also provides all of the encoding support within CRuby, stretching all of the way into the parser. This has led most other Ruby implementations to have to vendor Onigmo in order to match behavior 1:1. For example TruffleRuby uses it as a fallback (https://github.com/oracle/truffleruby/blob/master/lib/cext/include/ruby/onigmo.h), Artichoke uses it as a fallback (https://github.com/artichoke/artichoke/blob/77434156f30188a6e27f321b9b0f8437acfc0834/spinoso-regexp/Cargo.toml#L27), Natalie uses it as its regexp engine (https://github.com/natalie-lang/natalie/blob/556e8c195423daddf1c5aba49bb67dda22fb36d7/Rakefile#L467-L480), etc. For these reasons replacing Onigmo entirely may be possible, but it would certainly be an extremely long and arduous process because of concerns about backward compatibility.

That being said, there are things that could be done. The various options would be:

- What you already mentioned about handling subsets of regular expressions and splitting them up/enhancing them with additional APIs. You could do this today with ISEQ translation. (Check out https://github.com/k0kubun/ruby-jit-challenge for an intro to how this could work.)
- You could interpret the Onigmo bytecode in Ruby directly and attempt to work with YJIT to get performance up. Check out a couple of links here: https://speakerdeck.com/makenowjust/rubykaigi-2024-make-your-own-regex-engine and https://github.com/Shopify/onigmo.
- You could rewrite it entirely in Ruby (https://github.com/kddnewton/exreg). The only real way this matches up with performance would be having its own JIT. Certainly possible, but difficult.
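To give a feel for the second option, interpreting regexp bytecode in Ruby, here is a toy backtracking interpreter. The instruction set below is hypothetical and far smaller than Onigmo's real bytecode; the names `PROGRAM` and `run` are invented for this sketch, and the program is a hand-compiled version of the pattern `ab*c`, anchored at the start of the input.

```ruby
# Hypothetical instruction set (NOT Onigmo's actual bytecode):
#   [:char, c]      consume one character equal to c
#   [:split, a, b]  try continuing at a; on failure, continue at b
#   [:jmp, a]       jump to instruction a
#   [:match]        report success

# Hand-compiled program for "ab*c" (greedy), anchored at the start.
PROGRAM = [
  [:char, "a"],   # 0
  [:split, 2, 4], # 1: prefer consuming another "b"
  [:char, "b"],   # 2
  [:jmp, 1],      # 3
  [:char, "c"],   # 4
  [:match]        # 5
].freeze

# Runs the program against input, starting at instruction pc and
# character offset pos. Returns the offset just past the match, or nil.
def run(program, input, pc = 0, pos = 0)
  loop do
    op, x, y = program[pc]
    case op
    when :char
      return nil unless input[pos] == x
      pc += 1
      pos += 1
    when :split
      # Backtracking: recurse into the preferred branch first.
      result = run(program, input, x, pos)
      return result if result
      pc = y
    when :jmp
      pc = x
    when :match
      return pos
    end
  end
end
```

Calling `run(PROGRAM, "abbbc")` walks the loop three times before matching the final `c`. A real engine would also need captures, character classes, encodings, and protection against deep recursion, which is where most of the difficulty lies.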

Updated by brightbits (Michael Baldry) over 1 year ago
#4 [ruby-core:118245]

I was at RubyKaigi but unfortunately missed that talk! I didn't realise that a few weeks later I'd be digging into it :) Looks like some interesting work has already gone into this area. I'm going to spend some time looking into this.

Thanks for the detailed response, I really appreciate it!
