Feature #16254

MRI internal: Define built-in classes in Ruby with `__intrinsic__` syntax

Added by ko1 (Koichi Sasada) 28 days ago. Updated 4 days ago.

Status:
Closed
Priority:
Normal
Target version:
-
[ruby-core:95344]

Description

Abstract

MRI defines most of its built-in classes in C with C-APIs like rb_define_method().
However, there are several issues with using C-APIs.

A few methods are defined in Ruby, in prelude.rb.
However, we cannot define all classes this way because we cannot touch deep data structures from Ruby.
Furthermore, there are performance issues if we write all of them in Ruby.

To solve this situation, I want to suggest writing built-in classes in Ruby with C intrinsic functions.
This proposal is the same as my RubyKaigi 2019 talk https://rubykaigi.org/2019/presentations/ko1.html.

Terminology

  • C-methods: methods defined in C (defined with rb_define_method(), etc).
  • Ruby-methods: methods defined in Ruby.
  • ISeq: The body of a RubyVM::InstructionSequence object, which represents bytecode for the VM.

Background / Problem / Idea

Written in C

As you MRI developers know, most methods are written in C with C-APIs.
However, there are several issues.

(1) Annotation issues (compare with Ruby methods)

For example, C-methods defined by C-APIs don't have the parameter information which is returned by Method#parameters, because there is no way to define parameters for C-methods.
There are proposals to add parameter name information for C-methods; however, I think they would introduce new complex C-APIs and additional overhead at boot time.

-> Idea: Writing methods in Ruby will solve this issue.
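
The missing parameter information can be observed directly. This is a minimal sketch, assuming MRI, where String#sub is a C-method registered with a variable-argument arity; the Greeter class is a hypothetical example and not part of MRI.

```ruby
# Ruby-method: Method#parameters reports both kinds and names.
class Greeter # hypothetical class, for illustration only
  def greet(name, punctuation = "!", loud: false)
    text = "Hello, #{name}#{punctuation}"
    loud ? text.upcase : text
  end
end

ruby_params = Greeter.new.method(:greet).parameters

# Assumption: on MRI, String#sub is a C-method, so no parameter names exist.
c_params = "".method(:sub).parameters

puts "Ruby-method: #{ruby_params.inspect}"
puts "C-method:    #{c_params.inspect}"
```

On MRI the second line carries only parameter kinds, which is the annotation gap described above.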

(2) Annotation issues (for further optimization)

It is useful to know a method's attributes, for example, that the method causes no side effects (a pure method).
Labeling all methods, including methods in user programs, doesn't seem like a good idea (not the Ruby way). But I think annotating built-in methods is a good way because we can manage them (and we can remove them when we can make a good analyzer).

There is no way to annotate this kind of attribute.

-> Idea: Writing methods in Ruby will make it easy to introduce new annotations.

(3) Performance issue

There are several features which are slower in C than in Ruby.

  • Exception handling (rb_ensure(), etc.), because C-methods need to capture the context with setjmp. Ruby-methods don't need to capture any context for exception handling.
  • Passing keyword parameters, because Ruby-methods don't need to make a Hash object when the keyword parameters are passed explicitly (foo(k1: v1, k2: v2)).

-> Idea: Writing methods in Ruby makes them faster.

(4) Productivity

It is tough to write some features in C:

For example, it is easy to write rescue syntax in Ruby:

# in Ruby
def dummy_func_rescue
  nil
rescue
  nil
end

But it is difficult to write/read in C:

static VALUE
dummy_body(VALUE self)
{
    return Qnil;
}
/* rb_rescue() passes data2 and the raised exception to the rescue function */
static VALUE
dummy_rescue(VALUE self, VALUE exc)
{
    return Qnil;
}
static VALUE
dummy_func_rescue(VALUE self)
{
    return rb_rescue(dummy_body, self, dummy_rescue, self);
}

(A trained MRI developer may say it is not tough, though :p)

-> Idea: Writing methods in Ruby makes them easy.

(5) API change

To introduce Guild, I want to pass a "context" parameter (as the first parameter) to each C function, like mrb_state in mruby.
This is because getting it from TLS (thread-local storage) is a high-cost operation in a dynamic library (libruby).

Probably nobody will allow me to change the specification of functions used by rb_define_method().

-> Idea: If we introduce a new method definition framework, we can move to it and change the specification, I hope.
Of course, we can keep the current rb_define_method() APIs (with additional cost on a Guild-enabled MRI).

Written in Ruby in prelude.rb

There is a file, prelude.rb, which is loaded at boot time.
This file is used to define several methods, for example to reduce keyword parameter overhead (IO#read_nonblock, TracePoint#enable).

However, writing all methods in Ruby is not possible because:

  • (1) feasibility issue (we can not access internal data structure)
  • (2) performance issue (slow in general, of course)
  • (3) atomicity issue (GVL/GIL)

To solve (1), we can provide low-level C-methods to implement high-level (normal built-in) methods. However, issues (2) and (3) are not solved.
(From a CS researcher's perspective, making a clever compiler would solve them, as in the JVM, but we don't have it yet.)

-> Idea: Writing method body in C is feasible.

Proposal

(1) Introducing intrinsic mechanism to define built-in methods in Ruby.
(2) Load from binary format to reduce startup time.

(1) Intrinsic function

Calling intrinsic function syntax in Ruby

To define built-in methods, introduce the special Ruby syntax __intrinsic__.func(args).
In this case, the registered intrinsic function func() is called with args.

In a normal Ruby program, __intrinsic__ is a local variable or a method.
However, when running in a special mode, it is parsed as an intrinsic function call.

Intrinsic functions can not be called with:

  • block
  • keyword arguments
  • splat arguments

Development step with intrinsic functions

(1) Write a class/module in Ruby with intrinsic function.

# string.rb
class String
  def length
    __intrinsic__.str_length
  end
end

(2) Implement intrinsic functions

They are almost the same as functions used by rb_define_method().
However, they accept a context parameter as the first parameter.

(rb_execution_context_t is too long, so we can rename it, to rb_state for example)

static VALUE
str_length(rb_execution_context_t *ec, VALUE self)
{
  return LONG2NUM(RSTRING_LEN(self));
}

(3) Define an intrinsic function table and load .rb file with the table.

void
Init_String(void)
{
  ...
  static const rb_export_intrinsic_t table[] = {
    RB_EXPORT_INTRINSIC(str_length, 0), // 0 is arity
    ...
  };
  rb_vm_builtin_load("string", table);
}
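
The three steps can be emulated in plain Ruby to see how the pieces relate. This is only a sketch under stated assumptions: IntrinsicTable, IntrinsicProxy, Text and text_length are hypothetical names, and the dispatch happens at run time through method_missing, whereas the proposal resolves intrinsic calls when the ".rb" file is compiled.

```ruby
# Pure-Ruby emulation only; it is not how MRI implements intrinsics.
class IntrinsicTable
  Entry = Struct.new(:func, :arity)

  def initialize
    @entries = {}
  end

  # Stand-in for RB_EXPORT_INTRINSIC(name, arity).
  def export(name, arity, &func)
    @entries[name] = Entry.new(func, arity)
  end

  # ec stands in for rb_execution_context_t *ec; it is unused here.
  def call(name, ec, recv, args)
    entry = @entries.fetch(name) { raise NameError, "unknown intrinsic: #{name}" }
    if args.size != entry.arity
      raise ArgumentError, "#{name}: expected #{entry.arity} arguments, got #{args.size}"
    end
    entry.func.call(ec, recv, *args)
  end
end

# Receiver of `__intrinsic__.func(args)` in this emulation.
class IntrinsicProxy
  def initialize(table, recv)
    @table = table
    @recv = recv
  end

  def method_missing(name, *args, &block)
    raise ArgumentError, "intrinsic calls cannot take a block" if block
    @table.call(name, nil, @recv, args)
  end
end

TABLE = IntrinsicTable.new
TABLE.export(:text_length, 0) { |_ec, recv| recv.raw.size }

class Text
  attr_reader :raw

  def initialize(raw)
    @raw = raw
  end

  def length
    __intrinsic__.text_length
  end

  private

  # In a normal Ruby program, `__intrinsic__` is just a method call.
  def __intrinsic__
    IntrinsicProxy.new(TABLE, self)
  end
end

puts Text.new("ruby").length
```

The arity recorded in the table plays the same role as the second argument of RB_EXPORT_INTRINSIC: a call with the wrong number of arguments is rejected before the function body runs.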

Example

There are two examples:

(1) Comparable module: https://gist.github.com/ko1/7f18e66d1ae25bb30c7e823aa57f0d31
(2) TracePoint class: https://gist.github.com/ko1/969e5690cda6180ed989eb79619ca612

(2) Load from binary file with lazy loading

Loading many ".rb" files slows down startup time.

We have ISeq#to_binary method to generate compiled binary data so that we can eliminate parse/compile time.
Fortunately, [Feature #16163] makes binary data small.
Furthermore, enabling "lazy loading" feature improves startup time because we don't need to generate complete ISeqs. USE_LAZY_LOAD in vm_core.h enables this feature.

We need to combine the binary data into the executable. There are several ways (converting it into a C array, concatenating it with objcopy if available, and so on).
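
The binary round trip itself can be tried from a normal Ruby script on MRI. A minimal sketch, assuming MRI's RubyVM::InstructionSequence API; it does not exercise USE_LAZY_LOAD, which is a build-time option, and the source string is made up for illustration.

```ruby
# Compile a small script to an ISeq and serialize it to binary data.
source = "1 + 41" # made-up script, for illustration only
iseq = RubyVM::InstructionSequence.compile(source)
binary = iseq.to_binary

# Later (for example at boot time), restore the ISeq without parsing
# or compiling the source again, then run it.
loaded = RubyVM::InstructionSequence.load_from_binary(binary)
result = loaded.eval

puts "binary size: #{binary.bytesize} bytes"
puts "result: #{result}"
```

The serialized data is tied to the Ruby version that produced it, which matches the use described here: the data is generated during the build and linked into the same interpreter.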

Evaluation

Evaluations are written in my RubyKaigi 2019 presentation: https://rubykaigi.org/2019/presentations/ko1.html

Points:

  • Calling overhead of Ruby methods with intrinsic functions

    • Normal case, it is almost same as C-methods using optimized VM instructions.
    • With keyword parameters, it is faster than C-methods.
    • With optional parameters, it is x2 slower so it should be solved (*1).
  • Loading overhead

    • Requiring ".rb" files is about x15 slower than defining C methods.
    • Loading binary data with lazy loading technique is about x2 slower than C methods. Not so bad result.
    • At RubyKaigi 2019, the binary data was very huge, but [Feature #16163] reduces the size of binary data.

[*1] Introducing a special "overloading" specifier can solve it because we don't need to assign optional parameters. The first method lookup can be slowed down, but we can cache the method lookup results (with arity).

# example syntax
overload def foo(a)
  __intrinsic__.foo1(a)
end
overload def foo(a, b)
  __intrinsic__.foo2(a, b)
end
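
The overload idea can be approximated in today's Ruby to show the intended semantics. This is a hedged sketch: the Overload module and the Demo class are hypothetical, and the dispatcher selects a body by argument count on every call, so it does not have the performance characteristics of the VM-level lookup cache suggested above.

```ruby
# Emulates `overload def ...` by keeping one method body per arity.
module Overload
  def overload(name)
    impl = instance_method(name) # body that `def` has just defined
    table = (@__overloads ||= Hash.new { |h, k| h[k] = {} })[name]
    table[impl.arity] = impl

    # Replace the method with a dispatcher keyed by argument count.
    define_method(name) do |*args|
      target = table.fetch(args.size) do
        raise ArgumentError, "no overload of #{name} takes #{args.size} arguments"
      end
      target.bind(self).call(*args)
    end
    name
  end
end

class Demo # hypothetical class, for illustration only
  extend Overload

  overload def foo(a)
    "foo1(#{a})"
  end

  overload def foo(a, b)
    "foo2(#{a}, #{b})"
  end
end

puts Demo.new.foo(1)
puts Demo.new.foo(1, 2)
```

This works because `def` returns the method name as a Symbol, so `overload def foo(a)` is already valid Ruby syntax.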

Implementation

Done:

  • Compile calling intrinsic functions (.rb)
  • Exporting intrinsic function table (.c)

Not yet:

  • Loading from binary mechanism
  • Attribute syntax
  • most of built-in class replacement

Now, miniruby and ruby (libruby) load '*.rb' files directly. However, ruby (libruby) should load compiled binary file.

Discussion

Do we rewrite all of built-in classes at once?

No. We can try and migrate them.

Do we support intrinsic mechanism for C-extension libraries?

Maybe in future. Now we can try it on MRI cores.

__intrinsic__ keyword

In my RubyKaigi 2019 talk, I proposed __C__, but I think __intrinsic__ is more descriptive (but a bit long).
Another idea is RubyVM::intrinsic.func(...).

I have no strong opinion. We can change this syntax until we expose it to C-extensions.

Can we support __intrinsic__ in normal Ruby script?

No. This feature is only for built-in features.
As I described, the intrinsic function call syntax has several restrictions compared with normal method calls, so I think it should not be exposed to normal Ruby programs, IMO.

Should we maintain intrinsic function table?

For now, yes. And we need to generate this table automatically because manual operations can introduce mistakes very easily.

The corresponding ".rb" file (trace_point.rb, for example) knows which intrinsic functions are needed.
Parsing the ".rb" file can generate the table automatically.
However, we need the latest version of Ruby to parse the scripts if they use syntax which is supported only by the latest version of Ruby.

For example, we need Ruby 2.7 master to parse a script which uses pattern matching syntax.
However, the system's ruby (BASE_RUBY) may be an older version.
This is a bootstrap, "chicken-and-egg" problem.

There are several ideas.

(1) Parse a ".c" file to generate a table using function attribute.

INTRINSIC_FUNCTION static VALUE
str_length(...)
...

(2) Build another ruby parser with source code, "parse-ruby".

  • 1. generate parse-ruby with C code.
  • 2. run parse-ruby to generate tables by parsing ".rb" files. This process is written in C.
  • 3. build miniruby and ruby with generated table.

We can make it, but it introduces a new, complex build process.

(3) Restrict ".rb" syntax

Restrict syntax which can be used by BASE_RUBY for built-in ".rb" files.
It is easy to list up intrinsic functions using Ripper or AST or ISeq#to_a.

(3) is the easiest but not so cool.
(2) is flexible, but it needs implementation cost and increases build complexity.
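
Listing intrinsic functions with Ripper, as mentioned for idea (3), could look like the following. A minimal sketch, assuming the __intrinsic__.func(args) syntax from this proposal; intrinsic_calls and the sample source are hypothetical, and this is not the tool used in the actual build.

```ruby
require "ripper"

# Sample built-in source; foo2 is a made-up intrinsic name.
INTRINSIC_SOURCE = <<~'RUBY'
  class String
    def length
      __intrinsic__.str_length
    end
  end

  def foo(a, b)
    __intrinsic__.foo2(a, b)
  end
RUBY

# Collects every NAME that appears as `__intrinsic__.NAME`.
def intrinsic_calls(source)
  names = []
  walk = lambda do |node|
    next unless node.is_a?(Array)
    # A call node looks like [:call, receiver, period, [:@ident, name, pos]];
    # a bare `__intrinsic__` receiver is parsed as a :vcall node.
    if node[0] == :call &&
       node[1].is_a?(Array) && node[1][0] == :vcall &&
       node[1][1].is_a?(Array) && node[1][1][1] == "__intrinsic__" &&
       node[3].is_a?(Array) && node[3][0] == :@ident
      names << node[3][1]
    end
    node.each { |child| walk.call(child) }
  end
  walk.call(Ripper.sexp(source))
  names.uniq
end

puts intrinsic_calls(INTRINSIC_SOURCE).inspect
```

Because the scan relies on whatever Ripper ships with the running ruby, it shows the restriction of idea (3): the ".rb" files must be parseable by BASE_RUBY.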

Path of '*.rb' files and install or not

The path of prelude.rb is <internal:prelude>. We have several options.

  • (1) Don't install ".rb" files and make these path <internal:trace_point.rb>, for example.
  • (2) Install ".rb" and make these paths non-existing paths such as <internal>/installdir/lib/builtin/trace_point.rb.
  • (3) Install ".rb" and make these paths real paths.

We will translate ".rb" files into binary data and link them into ruby (libruby).
So modifying the installed ".rb" files does not affect the behavior. This can cause confusion, which is why I listed (1) and (2).

For (3), it is possible to detect modified ".rb" files (maybe by modification date) and load from them. But it will introduce an overhead (disk access overhead).

Compatibility issue?

There are several compatibility issues. For example, TracePoint c-call events are changed to call events.
And there are more incompatibilities.
We need to check them carefully.

Bootstrap issue?

Yes, there is.

Loading .rb files at interpreter boot time can cause problems.
For example, before the String class is initialized, the class of a String literal is 0 (because the String class has not been created yet).

I introduced several workarounds but we need to modify more.

Conclusion

How about introducing this mechanism and trying it in Ruby 2.7?
We can revert these changes if we find any trouble, as long as we don't expose this mechanism and keep it as an internal change only.

Associated revisions

Revision 46acd007
Added by ko1 (Koichi Sasada) 4 days ago

support builtin features with Ruby and C.

Support loading builtin features written in Ruby, which are implemented
with C builtin functions.
[Feature #16254]

Several features:

(1) Load .rb file at boottime with native binary.

Now, prelude.rb is loaded at boottime. However, this file is contained
into the interpreter as a text format and we need to compile it.
This patch contains a feature to load from binary format.

(2) __builtin_func() in Ruby call func() written in C.

In Ruby file, we can write __builtin_func() like method call.
However this is not a method call, but special syntax to call
a function func() written in C. C functions should be defined
in a file (same compile unit) which load this .rb file.

Functions (func in above example) should be defined with
(a) 1st parameter: rb_execution_context_t *ec
(b) rest parameters (0 to 15).
(c) VALUE return type.
This is very similar requirements for functions used by
rb_define_method(), however rb_execution_context_t *ec
is new requirement.

(3) automatic C code generation from .rb files.

tool/mk_builtin_loader.rb creates C code to load .rb files
needed by the miniruby and ruby commands. This script is run by
BASERUBY, so *.rb should be written in BASERUBY-compatible
syntax. This script loads a .rb file, finds all __builtin_
prefixed method calls, and generates part of the C code to export
functions.

tool/mk_builtin_binary.rb creates a C code which contains
binary compiled Ruby files needed by ruby command.

History

Updated by Eregon (Benoit Daloze) 27 days ago

This sounds great.

We have a very similar mechanism in TruffleRuby, inherited from Rubinius, which is called "primitives" (à la Smalltalk).
Compared to Rubinius we changed the syntax and always use the invoke_primitive form (like __intrinsic__ above), not the "try intrinsic, if it fails fallback to the Ruby code below" (Smalltalk-style).

It looks like this:

class WeakRef < Delegator
  def initialize(obj)
    TrufflePrimitive.weakref_set_object(self, obj)
  end
end

(from https://github.com/oracle/truffleruby/blob/44e61173f0661c41dbf9a4c7a229091cf6ab83e3/lib/truffle/weakref.rb
see more examples with https://github.com/oracle/truffleruby/search?q=TrufflePrimitive&unscoped_q=TrufflePrimitive )

We recently changed from Truffle.invoke_primitive(:weakref_set_object, self, obj) to TrufflePrimitive.weakref_set_object(self, obj) because:

  • It's more concise and arguably easier to read.
  • As it's like a normal method call, we can generate stub definitions in IDEs (e.g., in IntelliJ), which allows jumping from Ruby code to the corresponding Java code of TruffleRuby.

Implementation-wise, the code above will generate an AST of the form

MethodNode(name=initialize, args=obj, body=[
  ReadArgumentNode,
  WeakRefSetObjectNode(children=[ReadSelfNode, ReadLocalNode(name=obj)])
])

I.e., the intrinsic node is directly in the AST of the method in TruffleRuby.
Only a fixed number of positional arguments is allowed. That way, there is no code for argument handling; argument values are just passed directly to the WeakRefSetObjectNode, as arguments are simply child nodes.

The main reason I'm detailing this is I think this is an opportunity to standardize the syntax for "primitives"/"intrinsics".
No matter the implementation language, it seems the concept of "primitives"/"intrinsics" is universal.

A common syntax for intrinsics/primitives would allow to share Ruby code for core classes using these intrinsics/primitives.
It might look like it's little Ruby code, but I'm confident it will grow.
For instance, I wouldn't be surprised to see some of the argument processing/validation moved to Ruby, as it might just be easier.
At the very least, method definitions (the argument names and their default values) could be shared and avoid duplication.

What do you think?

Updated by Dan0042 (Daniel DeLorme) 27 days ago

There's something I'm not sure I understood so I'd like to clarify if this proposal can be described as

A) write the boilerplate rb_define_class and rb_define_method using a ruby-like macro language;

B) write core classes and methods with the full power of ruby plus a few "invoke C function" macros, which are then compiled to VM instructions and serialized to become part of the binary.

It sounds to me like the proposal is (B) which is really an amazing idea and implementation, but the examples provided for Comparable and TracePoint could be trivially written as (A) so I'm not sure what is the advantage of the heavyweight (B) approach. It would really help to have an example that does more than just wrap the intrinsic function calls.

Updated by naruse (Yui NARUSE) 25 days ago

Dan0042 (Daniel DeLorme) wrote:

It would really help to have an example that does more than just wrap the intrinsic function calls.

Below is an example which solves Problem: Written in C: (3) Performance issue: keyword parameters.
https://gist.github.com/ko1/969e5690cda6180ed989eb79619ca612#file-trace_point-rb-L195-L197

Updated by ko1 (Koichi Sasada) 4 days ago

Design change:

(1) Table auto generation

Should we maintain intrinsic function table?

I wrote "yes", but I found it is too difficult for a human being to maintain. So I decided to generate this table by parsing .rb files.

As I wrote:

Restrict syntax which can be used by BASE_RUBY for built-in ".rb" files.
It is easy to list up intrinsic functions using Ripper or AST or ISeq#to_a.

There is this kind of restriction. You can not use pattern matches in .rb files :p

(2) __intrinsic__.func(...) to __builtin_func(...)

Reasons:
(a) similar to gcc's intrinsic format.
(b) easy to introduce special inline pragmas with the __builtin_ prefix, like __builtin_attribute(:pure), to give special information to the Ruby interpreter. In this case, __builtin_attribute(:pure) can specify that this method is "pure" (no side effects).
(c) easy to parse (find this format) with external tools. Without the AST module, it is a bit difficult to find __intrinsic__.foo() in compiled VM asm. However, the AST module was introduced in Ruby 2.6 and BASERUBY can be an older version (the oldest version of BASERUBY on rubyci is ruby 2.2). This restriction can be relaxed by building a microruby for analysis from source code (microruby is a small subset of the ruby interpreter used to generate miniruby).
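
To illustrate reason (c): with a fixed prefix, even a textual scan that works on any old BASERUBY can find the function names. This is a rough sketch and not the actual implementation of tool/mk_builtin_loader.rb; builtin_function_names and the function names in the sample source are hypothetical.

```ruby
# Hypothetical built-in source; the function names are made up.
BUILTIN_SOURCE = <<~'RUBY'
  class String
    def length
      __builtin_str_length
    end

    def concat_one(other)
      __builtin_str_concat_one(other)
    end
  end
RUBY

# Returns the C function names referenced through the __builtin_ prefix.
# A plain text scan would also match occurrences inside comments or
# string literals, so a real tool needs to be more careful.
def builtin_function_names(source)
  source.scan(/\b__builtin_(\w+)/).flatten.uniq
end

puts builtin_function_names(BUILTIN_SOURCE).inspect
```

Pragmas such as __builtin_attribute(:pure) share the prefix, so a scan like this would have to filter them out before generating a function table.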

Completed code is https://github.com/ruby/ruby/pull/2655
I'll merge it soon.

Updated by ko1 (Koichi Sasada) 4 days ago

Eregon (Benoit Daloze) wrote:

A common syntax for intrinsics/primitives would allow to share Ruby code for core classes using these intrinsics/primitives.
It might look like it's little Ruby code, but I'm confident it will grow.
For instance, I wouldn't be surprised to see some of the argument processing/validation moved to Ruby, as it might just be easier.
At the very least, method definitions (the argument names and their default values) could be shared and avoid duplication.

What do you think?

I understand your concern. But I'm not sure we can share this code because it is "implementation" and depends on the backend interpreter.

Anyway, this is a very early stage, and if this approach becomes mature, we can discuss it again.


Updated by ko1 (Koichi Sasada) 4 days ago

  • Status changed from Open to Closed

Applied in changeset git|46acd0075d80c2f886498f089fde1e9d795d50c4.


