Bug #16740

Deprecating and removing the broken Process.clock_getres

Added by Eregon (Benoit Daloze) 2 months ago. Updated about 2 months ago.

Status:
Rejected
Priority:
Normal
Target version:
-
[ruby-core:97609]

Description

clock_getres(2) is incorrect (it doesn't match the actual resolution for many clocks) and buggy (e.g. it can return negative values) on almost all platforms and for almost every clock_id.

This observation is based on running the specs for it:
https://github.com/ruby/spec/blob/ec844797a51a017ebc93af833e421362b4b24a17/core/process/clock_getres_spec.rb
https://github.com/ruby/spec/blob/ec844797a51a017ebc93af833e421362b4b24a17/core/process/fixtures/clocks.rb#L19-L59

See how many exceptions there are.
And that's not all, even CLOCK_MONOTONIC on Linux is buggy: https://github.com/ruby/actions/runs/395166997#step:16:155

My conclusion is this API is unusable for any purpose.
If people want to know the resolution of clock_gettime(), they need to call it repeatedly and see how precise it is.
I will remove the "matches the clock in practice" specs for clock_getres() as I'm tired of maintaining them.
But I think we should remove this impossible-to-use correctly API, because it's fundamentally broken on all platforms.
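
A minimal sketch of the "call it repeatedly" approach mentioned above, assuming the smallest positive difference between consecutive readings is an acceptable estimate. `observed_resolution_ns` is a hypothetical helper name, not a Ruby API, and the result is only an upper bound on the real resolution because it also includes the overhead of the call itself.

```ruby
# Estimate the smallest observable step of a clock by sampling it repeatedly.
# `observed_resolution_ns` is a hypothetical helper, not part of Ruby.
def observed_resolution_ns(clock_id = Process::CLOCK_MONOTONIC, samples: 10_000)
  smallest = nil
  previous = Process.clock_gettime(clock_id, :nanosecond)
  samples.times do
    current = Process.clock_gettime(clock_id, :nanosecond)
    delta = current - previous
    # Skip zero deltas (the clock did not tick) and negative ones
    # (possible for clocks that are not monotonic).
    smallest = delta if delta > 0 && (smallest.nil? || delta < smallest)
    previous = current
  end
  smallest # Integer number of nanoseconds, or nil if no tick was observed
end

puts observed_resolution_ns.inspect
```

The value varies from run to run and from machine to machine, so it is better read as an order of magnitude than as an exact figure.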

So I propose to deprecate and remove this API, because I believe it harms more than it helps for Rubyists.

Kernel hackers who want to use that function can always call it via Fiddle/FFI, but it shouldn't be a proper Ruby API since it's so broken at the OS level.


Related issues

Related to Ruby master - Feature #8809: Process.clock_getres (Closed)
Updated by Eregon (Benoit Daloze) 2 months ago

Updated by akr (Akira Tanaka) about 2 months ago

  • Status changed from Open to Rejected

We discussed this issue at a Ruby developer meeting.
https://bugs.ruby-lang.org/issues/16693

We don't want to remove the method.

(1) Removing a method is incompatible.

(2) Even if the OS implementation of clock_getres is buggy, it is standardized in SUS (the Single UNIX Specification), and future OSes may have a better implementation.

Updated by Eregon (Benoit Daloze) about 2 months ago

OK. I'll improve the documentation to mention it's basically unreliable or broken on most operating systems.

Updated by akr (Akira Tanaka) about 2 months ago

Eregon (Benoit Daloze) wrote in #note-3:

OK. I'll improve the documentation to mention it's basically unreliable or broken on most operating systems.

When describing such broken behavior, please describe
(1) the OS version and (2) the actual behavior,
so that the behavior can be confirmed in the future.

Updated by mame (Yusuke Endoh) about 2 months ago

I don't think it is a good idea to say "operating system bugs" in rdoc unless there is solid evidence. You should report it upstream and confirm that it is actually a bug. In this particular case, I suspect it may be the fault not of the OS but of the hardware clock.

Updated by Eregon (Benoit Daloze) about 2 months ago

I'm OK with changing it to "operating system or hardware bugs", although I don't think it's the fault of the hardware.
clock_gettime() works perfectly fine, but clock_getres() on all platforms tested by Ruby is inaccurate for at least some clocks compared to the actual resolution of clock_gettime().
The OS defines most clocks, so I'd think the bug is there.

I think it's hopeless to report upstream: realistically this will never be fixed, nobody seems to care, and I would guess it's been broken for many years.

Sorry, I spent too much time trying to spec this completely broken function.

Updated by akr (Akira Tanaka) about 2 months ago

Eregon (Benoit Daloze) wrote in #note-6:

https://github.com/ruby/spec/blob/ec844797a51a017ebc93af833e421362b4b24a17/core/process/fixtures/clocks.rb#L19-L59 and the hundreds of failures in CI are solid evidence.

It doesn't describe the OS version or the actual problem with the OS behavior.

Updated by mame (Yusuke Endoh) about 2 months ago

Eregon (Benoit Daloze) wrote in #note-6:

https://github.com/ruby/spec/blob/ec844797a51a017ebc93af833e421362b4b24a17/core/process/fixtures/clocks.rb#L19-L59 and the hundreds of failures in CI are solid evidence.

Some of the committers questioned the spec's expectation. The spec assumes that clock_gettime may return a number whose resolution digit is non-zero. But some people said that it might only be guaranteed that the number returned by clock_gettime is a multiple of the resolution. (IMO, this interpretation of "resolution" is too conservative, but some people actually think so.)
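
To make the two readings of "resolution" concrete, here is a rough sketch. It is not the actual spec code; both helper names are hypothetical, and both assume nanosecond integer readings and a positive reported resolution.

```ruby
# Conservative reading: every clock_gettime value is a whole multiple of the
# resolution reported by clock_getres. Assumes resolution_ns > 0.
def multiples_of_resolution?(readings_ns, resolution_ns)
  readings_ns.all? { |t| (t % resolution_ns).zero? }
end

# Stricter reading (roughly the one described for the spec): the digit at
# the reported resolution is actually used, i.e. at least one reading is
# not a multiple of 10 times the resolution. Assumes resolution_ns > 0.
def uses_resolution_digit?(readings_ns, resolution_ns)
  readings_ns.any? { |t| (t / resolution_ns) % 10 != 0 }
end

if Process.respond_to?(:clock_getres)
  res = Process.clock_getres(Process::CLOCK_MONOTONIC, :nanosecond)
  readings = Array.new(100) { Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond) }
  if res > 0
    puts "multiples of reported resolution: #{multiples_of_resolution?(readings, res)}"
    puts "resolution digit in use:          #{uses_resolution_digit?(readings, res)}"
  else
    puts "clock_getres reported a non-positive resolution: #{res}"
  end
end
```

A clock that reports 1 µs but only ever ticks in 10 µs steps passes the first check and fails the second, which is exactly the disagreement described above.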

We receive many "bug" reports, but many of them are not bugs. Before we call it a "bug" in the official documentation, it would be good to ask the upstream developers.

Updated by Eregon (Benoit Daloze) about 2 months ago

mame (Yusuke Endoh) wrote in #note-9:

But some people said, it might be only guaranteed that the number returned by clock_gettime is a multiple of the resolution.

Sometimes it returns a resolution less precise than the clock, but sometimes more precise than the clock, i.e., it's inaccurate both ways.

I don't think it's an issue with the spec; manual testing also confirms it's inaccurate:
https://github.com/ruby/spec/commit/ec48b92efe3f1d234110eefcea7140896219734d#diff-8906d6a3637b6976b884ac506096afc5

And sometimes it's plain buggy and I doubt this can be seen any other way when clock_getres() returns a negative value:
https://rubyci.org/logs/rubyci.s3.amazonaws.com/centos6/ruby-trunk/log/20190428T093004Z.fail.html.gz
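
The three failure modes described here (too coarse, too fine, negative) can be told apart with a small check. This is only an illustrative sketch: `classify_resolution` is a hypothetical helper, and `observed_ns` is assumed to be the smallest positive step seen between consecutive clock_gettime() readings, measured separately.

```ruby
# Compare a value reported by clock_getres with an observed smallest step.
# `classify_resolution` is a hypothetical helper, not part of Ruby.
# Both arguments are in nanoseconds; observed_ns may be nil if no tick was seen.
def classify_resolution(reported_ns, observed_ns)
  return :invalid if reported_ns <= 0 # e.g. a negative value from clock_getres
  return :unknown if observed_ns.nil?

  if observed_ns < reported_ns
    :reported_too_coarse # the clock visibly ticks in smaller steps than claimed
  elsif observed_ns > reported_ns
    :possibly_too_fine   # could also be call overhead, so this is only a hint
  else
    :consistent
  end
end

p classify_resolution(-1, 100)
p classify_resolution(1_000_000, 1_000)
p classify_resolution(1, 1_000)
```

Only the `:invalid` and `:reported_too_coarse` results are conclusive; an observed step larger than the reported one may simply reflect how long each call takes.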

I don't have time to report these bugs to all operating systems.
I've spent hours trying to fix that spec and saw 5+ operating systems fail on 10+ clocks in total; that's all the proof I need to say this is completely broken and unusable.

Feel free to tweak the wording in RDoc if you think "operating system bugs" is too strong but I think it's accurate.
I mean it mostly as "not Ruby's bug, but a bug below; and since OSes are inaccurate in different ways for clock_getres(), it seems most likely a bug in the OS".

Updated by mame (Yusuke Endoh) about 2 months ago

Eregon (Benoit Daloze) wrote in #note-11:

I don't have time to report these bugs to all operating systems.
I've spent hours trying to fix that spec and saw 5+ operating systems fail on 10+ clocks in total; that's all the proof I need to say this is completely broken and unusable.

Feel free to tweak the wording in RDoc if you think "operating system bugs" is too strong but I think it's accurate.
I mean it mostly as "not Ruby's bug, but a bug below; and since OSes are inaccurate in different ways for clock_getres(), it seems most likely a bug in the OS".

Personally, I would prefer to write nothing about the accuracy if we cannot take responsibility for it.

As akr (Akira Tanaka) said, OS names and behaviors should be elaborated in the note so that we will be able to determine whether the underlying API has been fixed and remove the note in the future. We cannot do so if only "inaccurate on most platforms" is written.

If no documentation is written at all, we don't have to do anything. We can consider adding such a note if many people face the accuracy issue in practice and report it to Ruby.

Updated by Eregon (Benoit Daloze) about 2 months ago

I don't think a list of ~15 OS+clock cases would be helpful there; it's basically every OS we test.
And I doubt anyone would ever bother to update that list anyway.
I think clock_getres() will remain broken or inaccurate forever.

Updated by Eregon (Benoit Daloze) about 2 months ago

  • Assignee set to Eregon (Benoit Daloze)

I added an explicit list based on the file linked above in a6f7458ea81e084f6ebe7dc5c8cb5b7cb70fe2be.
On the upside it should be even clearer nobody should use this method.
I consider this issue resolved.
