Backport #8805
closedRuby GC::Profiler returns incorrect info on Solaris (and relatives)
Description
We use SmartOS as our deployment platform, and noticed when attempting to roll out Ruby 2.0.0-p247 to some SmartOS hosts that garbage collection info is broken with integrations such as New Relic. Investigating further, we found that GC::Profiler.total_time always returns 0.0.
GC::Profiler.enable
=> nil
GC.start
=> nil
GC::Profiler.total_time
=> 0.0
It may be related to this issue: https://bugs.ruby-lang.org/issues/7500 where the mechanism by which the profiler gets timestamps from the OS.
Updated by sax (Eric Saxby) over 11 years ago
I grabbed the c test code from the other ticket, and with some changes got these results:
cat ./timing.c
#include <stdlib.h>
#include <stdio.h>
#include <sys/resource.h>
#include <time.h>
#include <math.h>
double getrusage_time() {
struct rusage usage;
struct timeval time;
getrusage(RUSAGE_SELF, &usage);
time = usage.ru_utime;
return time.tv_sec + time.tv_usec * 1e-6;
}
void print_clock_res(clock_id) {
int rc;
struct timespec res;
rc = clock_getres(clock_id, &res);
if (rc == 0)
printf("clock_res() for %d: %ldns\n", clock_id, res.tv_nsec);
}
double clock_time(clock_id) {
struct timespec ts;
if (clock_gettime(clock_id, &ts) == 0) {
return ts.tv_sec + ts.tv_nsec * 1e-9;
}
return 0.0;
}
void print_clock_time(clock_id) {
int n;
printf("clock_gettime() for %d before: %f\n", clock_id, clock_time(clock_id));
for (n=0; n<10000; n++) pow(2, 2048);
printf("clock_gettime() for %d after: %f\n", clock_id, clock_time(clock_id));
}
int main() {
int n;
printf("getrusage() before: %f\n", getrusage_time());
for (n=0; n<10000; n++) pow(2, 2048);
printf("getrusage() after: %f\n", getrusage_time());
printf("\nclock_gettime() CLOCK_HIGHRES\n");
print_clock_res(CLOCK_HIGHRES);
print_clock_time(CLOCK_HIGHRES);
printf("\nclock_gettime() CLOCK_PROCESS_CPUTIME_ID\n");
print_clock_res(CLOCK_PROCESS_CPUTIME_ID);
print_clock_time(CLOCK_PROCESS_CPUTIME_ID);
printf("\nclock_gettime() CLOCK_THREAD_CPUTIME_ID\n");
print_clock_res(CLOCK_THREAD_CPUTIME_ID);
print_clock_time(CLOCK_THREAD_CPUTIME_ID);
printf("\nclock_gettime() CLOCK_MONOTONIC\n");
print_clock_res(CLOCK_MONOTONIC);
print_clock_time(CLOCK_MONOTONIC);
}
gcc -o timing timing.c -lrt
./timing
getrusage() before: 0.000382
getrusage() after: 0.000432
clock_gettime() CLOCK_HIGHRES
clock_res() for 4: 20ns
clock_gettime() for 4 before: 3140535.103844
clock_gettime() for 4 after: 3140535.103877
clock_gettime() CLOCK_PROCESS_CPUTIME_ID
clock_gettime() for 5 before: 0.000000
clock_gettime() for 5 after: 0.000000
clock_gettime() CLOCK_THREAD_CPUTIME_ID
clock_gettime() for 2 before: 0.000000
clock_gettime() for 2 after: 0.000000
clock_gettime() CLOCK_MONOTONIC
clock_res() for 4: 20ns
clock_gettime() for 4 before: 3140535.103960
clock_gettime() for 4 after: 3140535.103990
Updated by sax (Eric Saxby) over 11 years ago
After more investigation, the clock_gettime calls for CLOCK_PROCESS_CPUTIME_ID return -1, with an errno of 22. I think this maps to: EINVAL The clock_id argument does not specify a known clock.
Sooo I think for SmartOS, and probably for Solaris in general, GC::Profiler should use getrusage() instead of clock_gettime(). I'm a little unclear on the best way to approach this, however. Maybe configure should test for a 0 response to clock_gettime, and set a flag specific to CLOCK_PROCESS_CPUTIME_ID that gc.c can pick up?
Updated by akr (Akira Tanaka) over 11 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r42639.
Eric, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- gc.c (getrusage_time): Fallback clock_gettime to getrusage when
clock_gettime fails.
Reported by Eric Saxby. [ruby-core:56762] [Bug #8805]
Updated by akr (Akira Tanaka) over 11 years ago
- Status changed from Closed to Open
- Backport changed from 1.9.3: UNKNOWN, 2.0.0: UNKNOWN to 1.9.3: DONTNEED, 2.0.0: REQUIRED
Updated by sax (Eric Saxby) over 11 years ago
Fantastic! I've rolled out this patch to some SmartOS hosts this morning and it fixes the GC::Profiler total_time reporting.
Thank you so much!
Updated by sax (Eric Saxby) over 11 years ago
Hi Akira,
I have one question. It looks like this has only been applied to trunk, and not to the ruby_2_0_0 branch. I'm not yet totally familiar with the version control workflow of ruby-core, but does that mean that this will be released in 2.1? Will it also be in the next patch release of 2.0?
Thank you!
Updated by akr (Akira Tanaka) over 11 years ago
2013/8/22 sax (Eric Saxby) sax@livinginthepast.org:
I have one question. It looks like this has only been applied to trunk, and not to the ruby_2_0_0 branch. I'm not yet totally familiar with the version control workflow of ruby-core, but does that mean that this will be released in 2.1? Will it also be in the next patch release of 2.0?
I think this should be backported to Ruby 2.0.0.
So I reopened this issue and updated the backport field as:
The Ruby 2.0.0 maintainer (nagachika) may backport the patch.¶
Tanaka Akira
Updated by nagachika (Tomoyuki Chikanaga) about 11 years ago
- Tracker changed from Bug to Backport
- Project changed from Ruby master to Backport200
- Category deleted (
core) - Status changed from Open to Assigned
- Assignee set to nagachika (Tomoyuki Chikanaga)
Updated by nagachika (Tomoyuki Chikanaga) about 11 years ago
- Status changed from Assigned to Closed
This issue was solved with changeset r42653.
Eric, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
merge revision(s) 42639: [Backport #8805]
* gc.c (getrusage_time): Fallback clock_gettime to getrusage when
clock_gettime fails.
Reported by Eric Saxby. [ruby-core:56762] [Bug #8805]
Updated by sax (Eric Saxby) about 11 years ago
Thank you both!