Feature #18589

Finer-grained constant invalidation

Added by kddnewton (Kevin Newton) over 2 years ago. Updated about 2 years ago.

Status:
Closed
Assignee:
-
Target version:
-
[ruby-core:107603]

Description

This is related to https://github.com/ruby/ruby/pull/5433.

Current behavior

Caches depend on a global counter. All constant mutations cause all caches to be invalidated.

class A
  B = 1
end

def foo
  A::B # inline cache depends on global counter
end

foo # populate inline cache
foo # hit inline cache

C = 1 # global counter increments, all caches are invalidated

foo # misses inline cache due to `C = 1`

Proposed behavior

Caches depend on name components. Only constant mutations with corresponding names will invalidate the cache.

class A
  B = 1
end

def foo
  A::B # inline cache depends on constants named "A" and "B"
end

foo # populate inline cache
foo # hit inline cache

C = 1 # caches that depend on the name "C" are invalidated

foo # hits inline cache because IC only depends on "A" and "B"

Examples of breaking the new cache:

module C
  # Breaks `foo` cache because "A" constant is set and the cache in foo depends
  # on "A" and "B"
  class A; end
end

B = 1

We expect the new cache scheme to be invalidated less often because names aren't frequently reused. With the cache being invalidated less, we can rely on its stability more to keep our constant references fast and reduce the need to throw away generated code in YJIT.

Performance benchmarks

The following benchmark (included in this pull request) performs about 2x faster than master.

CONSTANT1 = 1
CONSTANT2 = 1
CONSTANT3 = 1
CONSTANT4 = 1
CONSTANT5 = 1

def constants
  [CONSTANT1, CONSTANT2, CONSTANT3, CONSTANT4, CONSTANT5]
end

500_000.times do
  constants
  INVALIDATE = true
end

In terms of macro benchmarks, I ran this code on railsbench and there was no statistically significant difference in startup time or overall runtime performance.

@byroot (Jean Boussier) also ran performance benchmarks on our production application. He noticed that there were several cache busts related to Object#extend (from core libraries), ActiveRecord::Relation#extending (from Rails), and autoload (from various gems, both internal and external). After a lot of work, the cache busts went down:

Cache bust changes

but they're still frequent enough that it's a problem. These changes had a measurable performance difference in request speed:

Request speed changes

Memory benchmarks

In terms of memory, this includes an increase in VM size by about 500KiB when running on railsbench. This is because we're now tracking cache associations ({ ID => IC[] }) on the VM to know how to invalidate specific caches when constants change.
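
To make the bookkeeping concrete, here is a minimal sketch of the idea in plain Ruby. It is a toy model, not CRuby's implementation: `InlineCache`, `ToyVM`, `register`, and `constant_set` are hypothetical names chosen for illustration, and the real table lives in C inside the VM.

```ruby
# Toy model of per-name constant cache invalidation (not CRuby's implementation).
class InlineCache
  attr_reader :path

  def initialize(path)
    @path = path    # e.g. [:A, :B] for the reference A::B
    @valid = false
  end

  def valid?
    @valid
  end

  def fill!
    @valid = true
  end

  def invalidate!
    @valid = false
  end
end

class ToyVM
  def initialize
    # Mirrors the { ID => IC[] } association: each name maps to the
    # caches whose constant path mentions that name.
    @caches_by_name = Hash.new { |hash, name| hash[name] = [] }
  end

  # Record the cache under every name in its path, then mark it populated.
  def register(cache)
    cache.path.each { |name| @caches_by_name[name] << cache }
    cache.fill!
    cache
  end

  # Called whenever a constant with this name is set anywhere.
  def constant_set(name)
    @caches_by_name.fetch(name, []).each(&:invalidate!)
  end
end

vm = ToyVM.new
foo_cache = vm.register(InlineCache.new([:A, :B]))

vm.constant_set(:C)
puts foo_cache.valid?   # "C" is not part of the path, so the cache survives

vm.constant_set(:A)
puts foo_cache.valid?   # "A" is part of the path, so the cache is invalidated
```

The extra memory comes from the table itself: every cache is stored once per name in its path, which is consistent with deeper constant paths costing more.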

I booted Shopify's core monolith with this branch as well. Memory usage increased in proportion to the number of constant caches found in the application. For each constant cache 1 level deep (e.g., Foo) the increase is about 33 bytes. For a constant cache 2 levels deep (e.g., Foo::Bar) the increase is about 67 bytes. The overall increase was around 16MB, or about 1% of the total retained memory.

Updated by ko1 (Koichi Sasada) over 2 years ago

The current design assumes that the global counter doesn't change frequently.
Do you have measurements about it on some apps?

Updated by kddnewton (Kevin Newton) over 2 years ago

At the moment on Shopify's core monolith we're seeing around 1 in 30 requests invalidate the global cache. We're still working out the source of the invalidations. But at the moment with the current design if anything changes anywhere everything is invalidated. So unfortunately it happens quite frequently.

Updated by Eregon (Benoit Daloze) over 2 years ago

During startup, global invalidation for constants also causes a lot of extra lookups, and with a JIT it throws away a lot of code during startup (or the JIT can't inline the value of the constant).

Global per-name constant invalidation is also what JRuby does IIRC.

This is what we did in TruffleRuby, it's per class and constant/method name so it's quite precise (same general approach for method & constant lookup):
https://medium.com/graalvm/precise-method-and-constant-invalidation-in-truffleruby-4dd56c6bac1a
It has the advantage of not invalidating needlessly when e.g. two modules have a constant with the same name.

Updated by byroot (Jean Boussier) over 2 years ago

Amusingly enough, this discussion led me to instrument our production environment to see what is bumping the cache, and one of the big offenders is open-uri: https://github.com/ruby/open-uri/blob/174a8eb7de357fc04c0675dd30073c0218f401a5/lib/open-uri.rb#L415-L417

Another big offender is tzinfo: https://github.com/tzinfo/tzinfo/pull/129

(I'm only mentioning sources that aren't specific to our application).

Here's the modified Ruby I used to find the source of the bumps: https://github.com/Shopify/ruby/commit/7aad79590dd62c05ba3b65d1964dc80f147441b6

Updated by Dan0042 (Daniel DeLorme) over 2 years ago

> It increased total retained memory from 1.23Gb to 1.3Gb (about a 0.7% increase).

1.3 / 1.23 is a 5.7% increase, not 0.7%
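
The arithmetic can be checked in a couple of lines of Ruby (`percent_increase` is a hypothetical helper name used only for this check):

```ruby
# Relative increase from `before` to `after`, as a percentage.
def percent_increase(before, after)
  (after / before - 1) * 100
end

puts percent_increase(1.23, 1.3).round(1)  # prints 5.7
```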

Updated by byroot (Jean Boussier) over 2 years ago

So I've been digging more into the constant cache busts happening in our app. Besides what I already mentioned, here is what I found:

A large majority of the busts are caused by autoloaded constants in gems. A few shouldn't autoload at all, but in many cases it is because the gem provides multiple implementations of an interface and doesn't want to load them all. So there isn't really a good fix for this. Even the rack gem is fully autoloaded. This means that any web application is bound to invalidate the constant cache on the first few requests it processes.

Then I found some more APIs using Object#extend like open-uri. These are more of a problem, because they bust the cache every time they're called, not just the first time.
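
As an illustration of the difference, here is a minimal sketch of the two patterns. The module, class, and method names are hypothetical; only `Object#extend`, `Module#include`, and `StringIO` are real APIs. It assumes, as described in this thread, that extending an object with a module that defines constants invalidates constant caches on every call, while including the module into a class happens only once.

```ruby
require "stringio"

# Hypothetical module, loosely modelled on the "decorate an IO" use case.
module Meta
  FIELDS = %i[status content_type].freeze  # the module defines a constant

  def meta
    @meta ||= {}
  end
end

# Pattern 1: extend each object at runtime.
# Per the thread, every call to this method invalidates constant caches.
def wrap_with_extend(string)
  io = StringIO.new(string)
  io.extend(Meta)
  io
end

# Pattern 2: include the module once, when the class is defined,
# and instantiate that class on the hot path instead.
class MetaStringIO < StringIO
  include Meta
end

def wrap_with_subclass(string)
  MetaStringIO.new(string)
end

puts wrap_with_extend("hello").meta.inspect
puts wrap_with_subclass("hello").meta.inspect
```

Both wrappers expose the same `meta` method; the second moves the invalidation from every call to a single point at load time.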

And since currently Ruby doesn't offer much API to diagnose this problem, it's unlikely to get better any time soon. So maybe the extra memory usage is worth it?

Updated by byroot (Jean Boussier) over 2 years ago

> Do you have measurements about it on some apps?

So here are some metrics from the app I investigated. The metric is not directly RubyVM.stat(:global_constant_state); instead we emit an increment if the global_constant_state changed during a request cycle.
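
A rough sketch of that kind of instrumentation might look like the following. This is not the code used in production: `ConstantBustTracker`, `track`, and the `on_bust` callback are hypothetical names, and the list of `RubyVM.stat` keys is an assumption, since the counters exposed differ between Ruby versions.

```ruby
# Sketch: report whether any constant cache invalidation happened while a
# unit of work (e.g. one request) was running.
class ConstantBustTracker
  # Assumed counter names; the reader sums whichever of them the VM reports.
  KEYS = %i[global_constant_state constant_cache_invalidations].freeze

  DEFAULT_READER = lambda do
    stats = defined?(RubyVM) && RubyVM.respond_to?(:stat) ? RubyVM.stat : {}
    KEYS.sum { |key| stats.fetch(key, 0) }
  end

  # `on_bust` is called once per tracked block in which the counter moved.
  # `reader` is injectable so the logic can be exercised without the VM.
  def initialize(on_bust:, reader: DEFAULT_READER)
    @on_bust = on_bust
    @reader = reader
  end

  def track
    before = @reader.call
    result = yield
    @on_bust.call if @reader.call != before
    result
  end
end

busted_requests = 0
tracker = ConstantBustTracker.new(on_bust: -> { busted_requests += 1 })

tracker.track { 1 + 1 }  # stands in for handling one request
puts busted_requests
```

Emitting one increment per affected request, rather than the raw counter, gives the "share of requests that saw an invalidation" figure used in this thread.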

The straight line is the last 24h, and the dotted line is the same days 4 weeks prior:

Constant Caches

The number one cause by far was Object#extend, as in open-uri and aws-sdk, as well as ActiveRecord::Relation#extending.

The second most common cause was gems using a lot of autoload, or having memoized class attributes like tzinfo.

Overall, fixing these had a very significant impact on the service latency:

Latency

However, I'm worried that this situation will degrade over time, as there's very little way to actively prevent this kind of problem from cropping up.

Updated by kddnewton (Kevin Newton) over 2 years ago

  • Description updated (diff)

Updated by kddnewton (Kevin Newton) over 2 years ago

  • Description updated (diff)

Updated by kddnewton (Kevin Newton) over 2 years ago

  • Description updated (diff)

Updated by kddnewton (Kevin Newton) over 2 years ago

@Dan0042 (Daniel DeLorme) yeah sorry, I was looking at different numbers and got wires crossed.

Updated by kddnewton (Kevin Newton) over 2 years ago

  • Description updated (diff)

Updated by Eregon (Benoit Daloze) over 2 years ago

What's the memory overhead of this? (probably the biggest concern from CRuby's side)

A 5.7% increase does sound like a lot for this.
But it seems the description now says 1% for the monolith?
What's the percentage for railsbench?

Updated by byroot (Jean Boussier) over 2 years ago

@kddeisz is away for a few days, so I'll take the liberty to answer even though he may correct me later.

> A 5.7% increase does sound like a lot for this. But it seems the description now says 1% for the monolith?

The initial measurement that showed the 5.7% increase was flawed; the actual increase was much smaller. Sorry about that, it's not that trivial to measure.
Also, thanks to @jhawthorn (John Hawthorn), the patch's memory usage was reduced even further, hence the ~1%.

Also, after chatting a bit yesterday, we believe that in practice this will likely save memory in forking environments, because the finer-grained cache means fewer ISeqs being written to when a single constant changes, and so fewer CoW invalidations. But of course that's heavily dependent on the app.

Updated by jhawthorn (John Hawthorn) over 2 years ago

Tested this patch out on GitHub's largest app and the size of the additional constant cache bookkeeping was only ~3MB (as measured by vm_memsize_constant_cache) for our ~950MB application.

Updated by mame (Yusuke Endoh) over 2 years ago

I'm not against the proposal, but for the record, the change makes Object#extend and Module#include slow.

Before the patch:

$ time ./miniruby -e 'module M; A=B=C=D=E=F=G=H=I=J=K=L=M=N=O=P=Q=R=S=T=U=V=W=X=Y=Z=1; end; 1000000.times { Object.new.extend(M) }'

real    0m1.002s
user    0m0.985s
sys     0m0.016s

After the patch:

$ time ./miniruby -e 'module M; A=B=C=D=E=F=G=H=I=J=K=L=M=N=O=P=Q=R=S=T=U=V=W=X=Y=Z=1; end; 1000000.times { Object.new.extend(M) }'

real    0m1.560s
user    0m1.543s
sys     0m0.016s

After the patch, Object#extend invalidates every constant name defined in the modules passed as arguments, which takes O(n) in the number of constants. I think it's an acceptable trade-off, though.
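
The per-name work can be pictured with a small model in plain Ruby. This is only an illustration of the cost, under the assumption that one invalidation is issued per constant defined directly in the module; `extend_with_invalidation` is a hypothetical helper, and CRuby does the equivalent work in C.

```ruby
# Toy model: walk every constant name defined directly in `mod`
# (one "invalidation" per name), then extend the object.
# Returns the number of names walked.
def extend_with_invalidation(object, mod)
  names = mod.constants(false)
  names.each { |name| yield name if block_given? }
  object.extend(mod)
  names.size
end

module OneConstant
  A = 1
end

module ManyConstants
  # Defines constants A through Z, as in the one-liner above.
  ("A".."Z").each { |name| const_set(name, 1) }
end

puts extend_with_invalidation(Object.new, OneConstant)    # prints 1
puts extend_with_invalidation(Object.new, ManyConstants)  # prints 26
```

The cost of each `extend` call grows with the number of constants in the module, whereas the old scheme bumped a single global counter.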

Updated by matz (Yukihiro Matsumoto) over 2 years ago

I am positive about introducing this proposal.

Matz.

Updated by byroot (Jean Boussier) over 2 years ago

> I'm not against the proposal, but for the record, the change makes Object#extend and Module#include slow.

I think it's acceptable, because the same code previously would have busted the global constant cache and so made the whole application slower. Now we pay the cost upfront, which is positive in my book.

Updated by nobu (Nobuyoshi Nakada) over 2 years ago

  • Status changed from Open to Closed

Applied in changeset git|69967ee64eac9ce65b83533a566d69d12a6046d0.


Revert "Finer-grained inline constant cache invalidation"

This reverts commits for [Feature #18589]:

  • 8008fb7352abc6fba433b99bf20763cf0d4adb38
    "Update formatting per feedback"
  • 8f6eaca2e19828e92ecdb28b0fe693d606a03f96
    "Delete ID from constant cache table if it becomes empty on ISEQ free"
  • 629908586b4bead1103267652f8b96b1083573a8
    "Finer-grained inline constant cache invalidation"

MSWin builds on AppVeyor have been crashing since the merger.
