Feature #15017
Provide extended information about Signal
Status: Open
Description
Hi,
I see that Ruby already uses sigaction for signal handling on Linux. It would be really nice to extend it to also capture the signal details in siginfo_t, and then expose those details in Ruby via the Signal module (as an additional parameter passed to the block besides the signal number). The use case is quite simple. Our Ruby application is killed by SIGINT and we do not know who sent this signal. We are already catching the signal and logging as much as possible, but without siginfo we are very limited. We do not know if it is the OOM killer due to low memory, systemd or something else. The additional info in siginfo would allow us to log much more detail when such a signal arrives and to inspect the process that sent it. This is especially useful on the customer side, where we do not have direct control and only get the logs from the failed run.
If you need more info or help with example usage, do not hesitate to ask.
Thanks
Josef
Updated by shevegen (Robert A. Heiler) over 6 years ago
Some comments (personal opinion here):
- I think the described use case is understandable, but if you can, I think it may
help matz and the ruby core team decide on; for example, what is the specific API
you had in mind? Could you give a more explicit code example here? You mentioned it
here already: "additional param beside signal number for block" but I think it may
help if you could give a specific example in ruby code, how it looks and perhaps
an example sentence for documentation (may help whoever will write the code on the
C level side, I think, and also provide an example to the on-line documentation
for ruby+signal).
- I assume you use Linux, like many folks here do; matz uses debian as far as I
know. I myself use mostly slackware, but compile literally almost everything from
source using ruby scripts. Many other people use Windows, OSX/Mac variant or even
less common operating systems (Solaris, the various BSD flavours, HaikuOS and so
forth). There are examples where features have been rejected because they may
only run on some systems but not others. I do not know whether this is the case
in your example; perhaps there are agnostic implementations possible that would
work on every platform. That would be best.
I mention this in particular because you gave the example of systemd. I
believe that if the functionality were limited to one OS, it might be better
to have it available as a gem. Note that even in that case, I am sure the ruby
core team and others would be able to help here, e.g. by pointing out what may
be required to offer the desired functionality.
I would suggest that you provide a bit more context, if you feel like it; for
example, perhaps you have an idea of how best to implement it too, but
describing your use case is probably the single best thing to do; matz and the
core team have often said that they prioritize real problems that ruby users
have and the corresponding use cases (although you may already have described
it well enough, I just mention it again because I think it is one of the best
ways to get a change into ruby if you have a valid use case - there are various
examples in the past where this has helped a lot).
And consider adding it for discussion at the next developer meeting:
https://bugs.ruby-lang.org/issues/14981
Matz is very busy, coming to Europe soon too \o/ so the developer meetings
are great to help "get the process going". But of course it is your issue,
please feel free to do whatever you want to do. :)
Updated by nobu (Nobuyoshi Nakada) over 6 years ago
shevegen (Robert A. Heiler) wrote:
Some comments (personal opinion here):
- I think the described use case is understandable, but if you can, I think it may
help matz and the ruby core team decide on; for example, what is the specific API
you had in mind? Could you give a more explicit code example here? You mentioned it
here already: "additional param beside signal number for block" but I think it may
help if you could give a specific example in ruby code, how it looks and perhaps
an example sentence for documentation (may help whoever will write the code on the
C level side, I think, and also provide an example to the on-line documentation
for ruby+signal).
Currently, it is:
Signal.trap("INT") {|signo| }
where signo is an integer.
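For reference, here is a small runnable sketch of today's behaviour. It installs a trap handler, sends SIGINT to the current process, and records what the block receives. The variable names are only illustrative.

```ruby
# Current behaviour: the trap block receives only the signal number (an Integer).
received = nil

# Install a handler; Signal.trap returns the previous handler so it can be restored.
previous = Signal.trap("INT") { |signo| received = signo }

Process.kill("INT", Process.pid)  # send SIGINT to this process
sleep 0.01 until received         # handlers run deferred, so wait for it to run

puts received                     # the signal number (2 on Linux)
puts Signal.signame(received)     # prints "INT"

Signal.trap("INT", previous)      # restore the previous handler
```

Nothing about the sender (pid, uid, si_code) is available inside the block. That missing information is what this feature request is about.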
What I can think of is:
Signal.trap("INT") {|siginfo| }
where siginfo is an object which has a #to_int method to return the signal number, and other accessors.
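A minimal sketch of what such an object could look like in plain Ruby. This is hypothetical: `SigInfo` and its field set are assumptions, loosely modelled on the `si_signo`, `si_code`, `si_pid` and `si_uid` members of C's `siginfo_t`, and no such class exists in Ruby today. The point is that `#to_int` lets a handler written for an Integer keep working when handed the richer object.

```ruby
# Hypothetical: not an existing Ruby API. Field names modelled on siginfo_t.
SigInfo = Struct.new(:signo, :code, :pid, :uid, keyword_init: true) do
  # Implicit integer conversion, so code written for an Integer signo keeps working.
  def to_int
    signo
  end
end

info = SigInfo.new(signo: Signal.list.fetch("INT"), code: 0, pid: 4242, uid: 1000)

# A handler written against today's API, treating its argument as a signal number.
legacy_handler = proc { |signo| "caught SIG#{Signal.signame(signo)}" }

puts legacy_handler.call(info)  # prints "caught SIGINT"
puts Integer(info)              # the signal number, obtained via #to_int
puts info.pid                   # extra detail a new-style handler could log
```

Methods that convert implicitly, such as `Signal.signame` and `Integer()`, accept the object. Code that checks `signo.is_a?(Integer)` or compares with `==` would still see a difference, so this approach is not fully transparent.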
Another idea is:
Signal.trap("INT") {|signo, siginfo| }
but this could cause compatibility issues due to block arity.
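One plausible reading of the arity concern: `Signal.trap` also accepts a Proc as the handler, and plain procs and lambdas treat extra arguments differently. The sketch below only demonstrates the proc/lambda difference. It does not model how Ruby actually invokes trap handlers internally.

```ruby
# Signal.trap also accepts a Proc as the handler, and procs differ in how
# strictly they check the number of arguments they are called with.
loose  = proc   { |signo| signo }   # regular proc: extra arguments are dropped
strict = lambda { |signo| signo }   # lambda: argument count is enforced

puts loose.call(2, :siginfo)        # prints 2; the second argument is ignored

begin
  strict.call(2, :siginfo)
rescue ArgumentError => e
  puts "lambda handler would break: #{e.message}"
end
```

Block-style handlers would tolerate a second argument. An existing one-argument lambda handler would start raising `ArgumentError` if Ruby began passing two.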
- I assume you use Linux, like many folks here do; matz uses debian as far as I
know. I myself use mostly slackware, but compile literally almost everything from
source using ruby scripts. Many other people use Windows, OSX/Mac variant or even
less common operating systems (Solaris, the various BSD flavours, HaikuOS and so
forth). There are examples where features have been rejected because they may
only run on some systems but not others. I do not know whether this is the case
in your example; perhaps there are agnostic implementations possible that would
work on every platform. That would be best.
As far as rubyci.org shows, all UNIX-like platforms have sigaction,
including AIX, FreeBSD, Solaris and macOS.
Updated by normalperson (Eric Wong) over 6 years ago
https://bugs.ruby-lang.org/issues/15017#change-73659
Implementation would be tricky, since we defer signals to run
Ruby code. (Deferring allows us to use malloc and other
non-async-safe C stdlib functions).
To implement, we'd have to reject [Misc #15011] (eventfd)[1] and
write "struct siginfo" to the pipe.
Or, write an async-signal-safe bump allocator...
[1] Too many full-sized pipes cause resource problems requiring ugly
workarounds like r64478.
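To illustrate the "write the siginfo to the pipe" idea, here is a Ruby-level sketch of the pattern. The signal side writes only a small fixed-size binary record. Later, deferred code reads and decodes it. Several things are assumptions for illustration only: the record layout (signal number and sender pid as 32-bit little-endian integers), the method names `enqueue_record` and `drain_records`, and running the producer from Ruby. In MRI itself the producer would be a `write(2)` from the C signal handler, and writes no larger than `PIPE_BUF` are atomic on a pipe.

```ruby
# Hypothetical record layout: a tiny subset of siginfo_t (signo, sender pid).
RECORD_FORMAT = "l<l<"
RECORD_SIZE   = [0, 0].pack(RECORD_FORMAT).bytesize # 8 bytes per record

reader, writer = IO.pipe

# Producer side: in C this would be a single write(2) in the signal handler.
def enqueue_record(writer, signo, sender_pid)
  writer.write([signo, sender_pid].pack(RECORD_FORMAT))
end

# Consumer side (deferred): read what is buffered and decode whole records.
# The read size is a multiple of RECORD_SIZE, so records are never split.
def drain_records(reader)
  data = reader.read_nonblock(RECORD_SIZE * 512, exception: false)
  return [] unless data.is_a?(String) # :wait_readable or nil (EOF)
  data.unpack("l<*").each_slice(2).to_a
end

enqueue_record(writer, Signal.list.fetch("TERM"), 123)
enqueue_record(writer, Signal.list.fetch("INT"), 456)

records = drain_records(reader)
p records # [[signo, pid], ...] in arrival order
```

Because each record has a fixed size, the reader can decode records without any framing protocol. A real `siginfo_t` is much larger than this toy record. That is part of why the pipe approach conflicts with replacing the pipe by an eventfd, which only carries a counter.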
Updated by fxn (Xavier Noria) over 1 year ago
I have another use case.
Resque sends SIGTERM to terminate jobs (which by default run in a child process). That raises Resque::TermException in the child, and you can configure Resque to wait a certain number of seconds before it does that. That allows graceful exits.
On the other hand, Heroku sends SIGTERM to all processes in a dyno being shut down.
Therefore, when you run Resque in Heroku, the child process gets SIGTERM from Heroku and raises. The orderly termination logic provided by Resque gets lost.
There is a gem that monkey patches Resque to install a signal handler that ignores the first SIGTERM. However, that is only a partial solution, because Heroku documents that dynos may receive multiple SIGTERMs, and I can confirm that it does happen. So you cannot assume the second SIGTERM comes from the Resque worker.
What would be really robust would be to check if the signal came from the parent process.
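A sketch of the check described above, assuming a siginfo-like object with a `pid` accessor were passed to the handler. Ruby does not provide this today. `SenderInfo`, `sent_by_parent?` and `handle_term` are hypothetical names used only for illustration.

```ruby
# Hypothetical: stands in for a siginfo-like object carrying the sender's pid.
SenderInfo = Struct.new(:signo, :pid)

# True when the signal was sent by this process's parent (e.g. the Resque worker).
def sent_by_parent?(info)
  info.pid == Process.ppid
end

# Decide what to do with a SIGTERM based on who sent it.
def handle_term(info)
  if sent_by_parent?(info)
    :terminate # the parent asked us to stop: follow Resque's orderly path
  else
    :ignore    # someone else (e.g. a platform-wide SIGTERM): let the parent coordinate
  end
end

p handle_term(SenderInfo.new(Signal.list.fetch("TERM"), Process.ppid))
p handle_term(SenderInfo.new(Signal.list.fetch("TERM"), Process.pid))
```

Unlike "ignore the first SIGTERM", this does not depend on how many SIGTERMs arrive or in what order.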
The comment from @normalperson (Eric Wong) says this could be tricky, so I am not suggesting reconsidering it. I'll find an alternative way to address this; I only wanted to add a use case to the discussion that does not depend on the operating system.
Updated by ioquatix (Samuel Williams) over 1 year ago
I have a passing interest, since I've worked in the guts of this code. The situation you are describing sounds like buggy behaviour on Heroku's part.
In any case, I don't think there should be a problem exposing more details if they are available in the signal handler. However, it might be tricky to design an interface that is compatible across all systems, and the implementation might be difficult: not all signals are necessarily delivered through signal handlers (thinking of signalfd etc).