radicle_signals::handler should be async-signal-safe, as
implied by the POSIX [1] [2] and C (C17 7.14.1.1.5)
requirements. (On Windows, where asynchronously-generated
signals don’t exist (IIUC), this isn’t relevant.)
At issue in radicle_signals::handler are its use of:
std::sync::Mutex::try_lock, and itsMutexGuarddropping that does the separate unlocking operation.crossbeam_channel::Sender::try_send.
Rationale
Neither of those are specified as being async-signal-safe. While their current internal implementations probably are on Linux (due to Linux futexes being used internally for both), this should not be assumed, because it should be portable to other OSs where whatever is used internally might not be async-signal-safe.
Rust’s std::sync::Mutex makes no guarantees about what it
uses internally nor about whether it’s async-signal-safe,
and so its implementation for some POSIX OSs might not be
async-signal-safe. E.g. the pthread_mutex_trylock of Mac
OS X is not async-signal-safe, apparently [3], and POSIX
does not consider any of the Pthreads mutex functions to be
async-signal-safe, and I think it would not be unreasonable
for an impl of std::sync::Mutex to be based on Pthreads or
the like (even though it often isn’t).
Similarly for crossbeam_channel (and most other types of
channels from other crates), I don’t think its Sender
guarantees whether its internal implementation for all OSs
will always be async-signal-safe.
Even on Linux, I don’t think it should be assumed that higher-level APIs, that are not included among those that are explicitly listed as async-signal-safe, will always use futexes and only futexes.
The recommended practice is that the only functions a signal handler should use (including transitively) are those that are explicitly listed as async-signal-safe. Unfortunately, the functionalities those offer are extremely limited (due to the nature of signal delivery interrupting arbitrary threads during arbitrary functions at arbitrary instructions).
(I happened to be already refreshed on signal handling, a few weeks ago, before I knew about Radicle, and then I wanted to do more research for this issue.)
Objective
The primary challenge is how a blocked thread, that wants to consume notifications of incoming signals, can be woken from within an extremely-limited signal handler.
It’s not absolutely necessary to be able to send the notifications via a channel to the other consuming thread, because atomic variables can be used instead from within a signal handler to record the receipt of signals, and the other thread can access those to find out which signals were received.
Options to change radicle_signals::handler:
-
Use the
signal_hookcrate. Its iterator over incoming signals looks easy to initialize and use across threads [6], and I think this could work for the “signals” thread ofradicle_node. This crate, and its underlyingsignal_hook_registrycrate, also provide some other abilities that might be of use.I’ll need to become more familiar with
radicle_term, to evaluate whethersignal_hookcould work for its needs, including for its need to uninstall signal handling.signal_hookis entirely async-signal-safe and will remain so. It provides an impressive degree of abilities, for being so limited by async-signal-safety, and it goes to some lengths internally (it has its own unusual “half-lock” and channel types that are carefully designed to be async-signal-safe by only using atomics). It also does the best that can be done to deal with the race conditions that are possible when initializing signal handlers. It seems somewhat mature and has significant users (Tokio being a big one), so it seems reliable into the future. It’s not a heavy dependency, but it’s not the smallest either. Its full suite of abilities might be of use ifradicle_signalsneeds to be able to do additional things in the future. I’ve done an initial reading of some of its source code and it seemed good to me so far. -
Use the classic “self-pipe trick” where
write(), which is async-signal-safe, to a pipe is done from a signal handler, and where blockingread()from the other end of the pipe is done from the other consuming thread to wake it when a signal is received. (This is whatsignal_hook::iteratordoes.)But implementing this by one’s self would require
radicle_signalsto invent its own way of using atomic types, or of encoding in the bytes sent in the pipe, to exfiltrate, from the signal handler, a representation of notification that the other thread would consume when woken. (This is whatsignal_hook::iteratordoes internally for its iterator.) Initializing the “self-pipe trick” is somewhat messier, due to needing to setup the pipes, close-on-exec, and non-blocking writes. (signal_hook::iteratorsets this up properly for you.) -
Use the
sem_postfunction of POSIX, which is guaranteed to be async-signal-safe and is specifically pointed-out by POSIX as being of use from within signal handlers. The other consuming thread would use thesem_waitfunction to block on and be woken from. This is simpler and cleaner than the “self-pipe trick”. But, like that, this would requireradicle_signalsto invent its own way of exfiltrating signal notifications in a purely async-signal-safe way.I recently made a Rust-ified and safe interface to
sem_init,sem_post, &sem_wait(et al) [7] that is simple enough that I think it could be trustworthy soon. I also started a brainstorm implementation, using mysem_safecrate, of signals-receipt exfiltration, where signal notifications are simple atomic counters, one for each signal type (signal number) of interest, and where the signal handler just atomically increments a signal’s count, based on the received signal number, and doessem_post()to wake the other consuming thread, and where the other thread atomically checks the counters to see which need to be dealt with however desired. I had some ideas for trying to make this somewhat ergonomic. I worked on this for my own fun, since I thought I might want such for my own projects, and I’m totally fine if you’re not interested in this. -
Unfortunately, the other classic approach that is often recommended - using
sigwait(orsigwaitinfoorsigtimedwait) from the consuming thread to synchronously receive signals and to block that thread until they’re delivered, while altogether avoiding signal handlers and so avoiding async-signal-safety hassles - would probably be too undesirable.This approach requires “blocking” (masking) all signals (except for those like
SIGSEGVthat correspond to a “computational exception”), i.e. usingpthread_sigmaskto change the signal mask at the start ofmainto be inherited by all threads. The issue is that the signal mask would also be inherited by all child processes by default, and, with Radicle using various subprocesses (the amount of which might increase over time), that signal mask would have to be undone during the setup of each and every subprocess, after thefork()and before theexec()(which can be achieved viastd::os::unix::process::CommandExt::pre_exec).But that seems too error-prone, because, while a helper for always doing this to each
Commandbuilder could be made, using it could be forgotten and that could break the programs being run as subprocesses, because they’d have all signals blocked but they sometimes need to receive some signals and almost no programs are prepared to be started with all signals blocked, and such breakage might only be triggered rarely and would seem quite mysterious and would be hard to see what to debug.(If signal masks weren’t inherited across
exec(), then thesigwaitapproach might be ideal.)
If you like, I’m happy to work on any approach you want.
Related
I’ve opened two other issues related to radicle_signals:
b91c6c3“signals: Mutex can cause signals to be missed”c8e0300“signals:libc::signalshould not be used”
Depending on the fix chosen for the first one, the fix might
affect the async-signal-safety of radicle_signals::handler
as detailed here. But the general aspects of
async-signal-safety, as conveyed here, always apply.
This issue is fixed by patch 8e05702.