Feature #19790: Optionally write Ruby crash reports into a file rather than STDERR - Ruby - Ruby Issue Tracking System

Actions

Copy link

Feature #19790

closed

Optionally write Ruby crash reports into a file rather than STDERR

Feature #19790: Optionally write Ruby crash reports into a file rather than STDERR

Added by byroot (Jean Boussier) over 2 years ago. Updated over 2 years ago.

Status:

Closed

Assignee:

Target version:

[ruby-core:114319]

Description

Use case¶

On our servers we set /proc/sys/kernel/core_pattern to point to a small utility that report all the crashes happening in production with the associated core dump into BugSnag.

This allowed us to find and fix many Ruby and native extensions bugs in the last few years.

However, these are hard to triage, so we'd like to augment these crash reports with the output of rb_vm_bugreport().

Problem¶

rb_vm_bugreport() is hard coded to print to STDERR, this makes it hard to extract and parse the report in a production environment, as very often STDERR is shared with other processes, so the crash report is intertwined with logs from other processes.

Feature Request¶

It would be very useful if Ruby could write the crash report to an arbitrary path rather than STDERR, akin to kernel/core_pattern.

Especially it would be useful if it supported interpolating the crashing process PID with %p like kernel/core_pattern, as it would make it easier to map that report with the core file.

This could be controller by an environment variable such as RUBY_BUGREPORT_PATH. e.g.

RUBY_BUGREPORT_PATH=/var/log/ruby/ruby-crash-pid-%p.log

Optional Features¶

kernel/core_pattern supports other interpolations, however not all of them would make sense for Ruby to support.

       %%  A single % character.
       %c  Core file size soft resource limit of crashing process
           (since Linux 2.6.24).
       %d  Dump mode—same as value returned by prctl(2)
           PR_GET_DUMPABLE (since Linux 3.7).
       %e  The process or thread's comm value, which typically is
           the same as the executable filename (without path prefix,
           and truncated to a maximum of 15 characters), but may
           have been modified to be something different; see the
           discussion of /proc/pid/comm and /proc/pid/task/tid/comm
           in proc(5).
       %E  Pathname of executable, with slashes ('/') replaced by
           exclamation marks ('!') (since Linux 3.0).
       %g  Numeric real GID of dumped process.
       %h  Hostname (same as nodename returned by uname(2)).
       %i  TID of thread that triggered core dump, as seen in the
           PID namespace in which the thread resides (since Linux
           3.18).
       %I  TID of thread that triggered core dump, as seen in the
           initial PID namespace (since Linux 3.18).
       %p  PID of dumped process, as seen in the PID namespace in
           which the process resides.
       %P  PID of dumped process, as seen in the initial PID
           namespace (since Linux 3.12).
       %s  Number of signal causing dump.
       %t  Time of dump, expressed as seconds since the Epoch,
           1970-01-01 00:00:00 +0000 (UTC).
       %u  Numeric real UID of dumped process.

Additionally, if kernel/core_pattern starts with a pipe (|), then it allows to pipe the core dump to another program, this may also make sense as a feature.

Prior Art¶

Aside from kernel/core_pattern, some other virtual machine have a similar feature, for instance the JVM has a configurable crash report:

-XX:ErrorFile=/var/log/java/hs_err_pid%p.log

Updated by nobu (Nobuyoshi Nakada) over 2 years ago Actions
Copy link
#1 [ruby-core:114347]

https://github.com/nobu/ruby/tree/bugreport_path

Implemented %%, %e, %E, %f, %F, %p, %t and %NNN(octal).
When invoking pipe, the string is split with white spaces first without quoting, %040 would be useful to represent a space character, especially on Windows.

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#2 [ruby-core:114350]

Thank you @nobu (Nobuyoshi Nakada), I wasn't expecting someone to implement it :)

I take it that you are supportive of the feature?

Updated by nobu (Nobuyoshi Nakada) over 2 years ago Actions
Copy link
#3 [ruby-core:114351]

byroot (Jean Boussier) wrote in #note-2:

Thank you @nobu (Nobuyoshi Nakada), I wasn't expecting someone to implement it :)

I take it that you are supportive of the feature?

I'm positive, but also afraid if this can work well in a signal handler.

Updated by nobu (Nobuyoshi Nakada) over 2 years ago Actions
Copy link
#4 [ruby-core:114352]

Note that the branch includes a testing feature, and is not ready to merge as-is.

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#5 [ruby-core:114353]

afraid if this can work well in a signal handler.

The current bugreporter does lots of things which are not really safe to do in a signal handler... even fprintf is not async-signal-safe (at least on Linux, according to signal-safety(7)). It also accesses the Ruby stack which of course may be in an inconsistent state. I guess it all works "well enough" most of the time.

Looking at your patch, it looks about as signal-safe as the current implementation is. Notably the sprintf calls are unsafe (but we're already doing that), and I'm not sure if the RSTRING_PTR etc manipulations are safe (but we do things like that in the Ruby backtrace-printing code anyway). Nothing else jumps out to my eyes as notably more hazardous. The only thing I might suggest is to push back the opening of the bugreport file (and parsing the %'s) until after the if (crashing) check in rb_vm_bugreport, so that these unsafety's can't cause recursive crashes.

(Also - unrelated to the issue at hand, but I think the crashing = true assignment might strictly speaking need a memory fence after it)

If we wanted Ruby crashdumps to be signal-safe, I guess the best option would be to simply ask the operating system to generate coredumps on crashes, and write some external tooling to do nice things like extract Ruby backtraces out of those coredumps. However that code would be quite platform-specific, and also duplicate decent amounts of Ruby's string/backtrace handling code, so I can see why this hasn't been the approach so far. Not to mention that dealing with coredump files is kind of radioactive because they contain whatever potentially sensitive data that the application was processing at the time.

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#6 [ruby-core:114354]

also afraid if this can work well in a signal handler.

Yeah, as @kjtsanaktsidis (KJ Tsanaktsidis) said, in theory the current one already use things that aren't async signal safe, so I don't think it really change much here.

open(2) should be async-signal safe according to https://man7.org/linux/man-pages/man7/signal-safety.7.html, so unless I'm missing something that's the only real addition here.

Updated by ko1 (Koichi Sasada) over 2 years ago Actions
Copy link
#7 [ruby-core:114436]

Does it print to both STDERR and a file? or only to a file?

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#8 [ruby-core:114437]

I would say only a file if the environment variable is set, with perhaps just a message on stderr saying the report was written into that path.

But I don't have a strong opinion about this, if others think it should be written in both, that's workable for me, I just personally don't think it has value.

Updated by matz (Yukihiro Matsumoto) over 2 years ago Actions
Copy link
#9 [ruby-core:114482]

I accept this proposal, at least we can experiment.
I am a bit worried about calling open(2) in other functions/system calls from the signal handler.

Matz.

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#10 [ruby-core:114483]

I accept this proposal

Thank you!

I am a bit worried about calling open(2) in other functions/system calls from the signal handler.

Unless I'm misreading it, POSIX says open(2) should be async signal safe: https://man7.org/linux/man-pages/man7/signal-safety.7.html so we should be good.

Either way, I think the crash reporter is understood as being "best effort", in rare cases, it can already happens that generating it cause a recursive crash, as long as it works in the vast majority of cases, it's not the end of the world. At this stage all bets are off anyway.

@nobu (Nobuyoshi Nakada) do you wish to finish your branch? Let me know if I can help in any way.

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#11 [ruby-core:114593]

A note from the dev meeting logs, the env var that was accepted is RUBY_CRASH_REPORT (not RUBY_BUGREPORT_PATH).

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#12 [ruby-core:114701]

I've backported @nobu's branch on our 3.2 rubies and it's been working very well. Not too sure what is missing.

Updated by nobu (Nobuyoshi Nakada) over 2 years ago Actions
Copy link
#13

Status changed from Open to Closed

Applied in changeset git|bab01d284c86b39952baaddb15901fd4a15e4151.

[Feature #19790] Rename BUGREPORT_PATH as CRASH_REPORT

Actions

Copy link

Also available in: PDF Atom

Project

General

Profile

Ruby

Custom queries

Feature #19790

Optionally write Ruby crash reports into a file rather than STDERR

Use case¶

Problem¶

Feature Request¶

Optional Features¶

Prior Art¶

Updated by nobu (Nobuyoshi Nakada) over 2 years ago Actions
Copy link
#1 [ruby-core:114347]

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#2 [ruby-core:114350]

Updated by nobu (Nobuyoshi Nakada) over 2 years ago Actions
Copy link
#3 [ruby-core:114351]

Updated by nobu (Nobuyoshi Nakada) over 2 years ago Actions
Copy link
#4 [ruby-core:114352]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#5 [ruby-core:114353]

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#6 [ruby-core:114354]

Updated by ko1 (Koichi Sasada) over 2 years ago Actions
Copy link
#7 [ruby-core:114436]

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#8 [ruby-core:114437]

Updated by matz (Yukihiro Matsumoto) over 2 years ago Actions
Copy link
#9 [ruby-core:114482]

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#10 [ruby-core:114483]

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#11 [ruby-core:114593]

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#12 [ruby-core:114701]

Updated by nobu (Nobuyoshi Nakada) over 2 years ago Actions
Copy link
#13

Project

General

Profile

Ruby

Custom queries

Feature #19790

Optionally write Ruby crash reports into a file rather than STDERR

Use case¶

Problem¶

Feature Request¶

Optional Features¶

Prior Art¶

Updated by nobu (Nobuyoshi Nakada) over 2 years ago ActionsCopy link #1 [ruby-core:114347]

Updated by byroot (Jean Boussier) over 2 years ago ActionsCopy link #2 [ruby-core:114350]

Updated by nobu (Nobuyoshi Nakada) over 2 years ago ActionsCopy link #3 [ruby-core:114351]

Updated by nobu (Nobuyoshi Nakada) over 2 years ago ActionsCopy link #4 [ruby-core:114352]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago ActionsCopy link #5 [ruby-core:114353]

Updated by byroot (Jean Boussier) over 2 years ago ActionsCopy link #6 [ruby-core:114354]

Updated by ko1 (Koichi Sasada) over 2 years ago ActionsCopy link #7 [ruby-core:114436]

Updated by byroot (Jean Boussier) over 2 years ago ActionsCopy link #8 [ruby-core:114437]

Updated by matz (Yukihiro Matsumoto) over 2 years ago ActionsCopy link #9 [ruby-core:114482]

Updated by byroot (Jean Boussier) over 2 years ago ActionsCopy link #10 [ruby-core:114483]

Updated by byroot (Jean Boussier) over 2 years ago ActionsCopy link #11 [ruby-core:114593]

Updated by byroot (Jean Boussier) over 2 years ago ActionsCopy link #12 [ruby-core:114701]

Updated by nobu (Nobuyoshi Nakada) over 2 years ago ActionsCopy link #13

Updated by nobu (Nobuyoshi Nakada) over 2 years ago Actions
Copy link
#1 [ruby-core:114347]

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#2 [ruby-core:114350]

Updated by nobu (Nobuyoshi Nakada) over 2 years ago Actions
Copy link
#3 [ruby-core:114351]

Updated by nobu (Nobuyoshi Nakada) over 2 years ago Actions
Copy link
#4 [ruby-core:114352]

Updated by kjtsanaktsidis (KJ Tsanaktsidis) over 2 years ago Actions
Copy link
#5 [ruby-core:114353]

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#6 [ruby-core:114354]

Updated by ko1 (Koichi Sasada) over 2 years ago Actions
Copy link
#7 [ruby-core:114436]

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#8 [ruby-core:114437]

Updated by matz (Yukihiro Matsumoto) over 2 years ago Actions
Copy link
#9 [ruby-core:114482]

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#10 [ruby-core:114483]

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#11 [ruby-core:114593]

Updated by byroot (Jean Boussier) over 2 years ago Actions
Copy link
#12 [ruby-core:114701]

Updated by nobu (Nobuyoshi Nakada) over 2 years ago Actions
Copy link
#13