Radish alpha
h
rad:z3gqcJUoA1n9HaHKufZs5FCSGazv5
Radicle Heartwood Protocol & Stack
Radicle
Git
signals: `handler` is not async-signal-safe
Closed { reason: Solved } did:key:z6Mko5mi...bgMH opened 1 year ago

radicle_signals::handler should be async-signal-safe, as implied by the POSIX [1] [2] and C (C17 7.14.1.1.5) requirements. (On Windows, where asynchronously-generated signals don’t exist (IIUC), this isn’t relevant.)

At issue in radicle_signals::handler are its use of:

  • std::sync::Mutex::try_lock, and its MutexGuard dropping that does the separate unlocking operation.
  • crossbeam_channel::Sender::try_send.

Rationale

Neither of those are specified as being async-signal-safe. While their current internal implementations probably are on Linux (due to Linux futexes being used internally for both), this should not be assumed, because it should be portable to other OSs where whatever is used internally might not be async-signal-safe.

Rust’s std::sync::Mutex makes no guarantees about what it uses internally nor about whether it’s async-signal-safe, and so its implementation for some POSIX OSs might not be async-signal-safe. E.g. the pthread_mutex_trylock of Mac OS X is not async-signal-safe, apparently [3], and POSIX does not consider any of the Pthreads mutex functions to be async-signal-safe, and I think it would not be unreasonable for an impl of std::sync::Mutex to be based on Pthreads or the like (even though it often isn’t).

Similarly for crossbeam_channel (and most other types of channels from other crates), I don’t think its Sender guarantees whether its internal implementation for all OSs will always be async-signal-safe.

Even on Linux, I don’t think it should be assumed that higher-level APIs, that are not included among those that are explicitly listed as async-signal-safe, will always use futexes and only futexes.

The recommended practice is that the only functions a signal handler should use (including transitively) are those that are explicitly listed as async-signal-safe. Unfortunately, the functionalities those offer are extremely limited (due to the nature of signal delivery interrupting arbitrary threads during arbitrary functions at arbitrary instructions).

(I happened to be already refreshed on signal handling, a few weeks ago, before I knew about Radicle, and then I wanted to do more research for this issue.)

Objective

The primary challenge is how a blocked thread, that wants to consume notifications of incoming signals, can be woken from within an extremely-limited signal handler.

It’s not absolutely necessary to be able to send the notifications via a channel to the other consuming thread, because atomic variables can be used instead from within a signal handler to record the receipt of signals, and the other thread can access those to find out which signals were received.

Options to change radicle_signals::handler:

  1. Use the signal_hook crate. Its iterator over incoming signals looks easy to initialize and use across threads [6], and I think this could work for the “signals” thread of radicle_node. This crate, and its underlying signal_hook_registry crate, also provide some other abilities that might be of use.

    I’ll need to become more familiar with radicle_term, to evaluate whether signal_hook could work for its needs, including for its need to uninstall signal handling.

    signal_hook is entirely async-signal-safe and will remain so. It provides an impressive degree of abilities, for being so limited by async-signal-safety, and it goes to some lengths internally (it has its own unusual “half-lock” and channel types that are carefully designed to be async-signal-safe by only using atomics). It also does the best that can be done to deal with the race conditions that are possible when initializing signal handlers. It seems somewhat mature and has significant users (Tokio being a big one), so it seems reliable into the future. It’s not a heavy dependency, but it’s not the smallest either. Its full suite of abilities might be of use if radicle_signals needs to be able to do additional things in the future. I’ve done an initial reading of some of its source code and it seemed good to me so far.

  2. Use the classic “self-pipe trick” where write(), which is async-signal-safe, to a pipe is done from a signal handler, and where blocking read() from the other end of the pipe is done from the other consuming thread to wake it when a signal is received. (This is what signal_hook::iterator does.)

    But implementing this by one’s self would require radicle_signals to invent its own way of using atomic types, or of encoding in the bytes sent in the pipe, to exfiltrate, from the signal handler, a representation of notification that the other thread would consume when woken. (This is what signal_hook::iterator does internally for its iterator.) Initializing the “self-pipe trick” is somewhat messier, due to needing to setup the pipes, close-on-exec, and non-blocking writes. (signal_hook::iterator sets this up properly for you.)

  3. Use the sem_post function of POSIX, which is guaranteed to be async-signal-safe and is specifically pointed-out by POSIX as being of use from within signal handlers. The other consuming thread would use the sem_wait function to block on and be woken from. This is simpler and cleaner than the “self-pipe trick”. But, like that, this would require radicle_signals to invent its own way of exfiltrating signal notifications in a purely async-signal-safe way.

    I recently made a Rust-ified and safe interface to sem_init, sem_post, & sem_wait (et al) [7] that is simple enough that I think it could be trustworthy soon. I also started a brainstorm implementation, using my sem_safe crate, of signals-receipt exfiltration, where signal notifications are simple atomic counters, one for each signal type (signal number) of interest, and where the signal handler just atomically increments a signal’s count, based on the received signal number, and does sem_post() to wake the other consuming thread, and where the other thread atomically checks the counters to see which need to be dealt with however desired. I had some ideas for trying to make this somewhat ergonomic. I worked on this for my own fun, since I thought I might want such for my own projects, and I’m totally fine if you’re not interested in this.

  4. Unfortunately, the other classic approach that is often recommended - using sigwait (or sigwaitinfo or sigtimedwait) from the consuming thread to synchronously receive signals and to block that thread until they’re delivered, while altogether avoiding signal handlers and so avoiding async-signal-safety hassles - would probably be too undesirable.

    This approach requires “blocking” (masking) all signals (except for those like SIGSEGV that correspond to a “computational exception”), i.e. using pthread_sigmask to change the signal mask at the start of main to be inherited by all threads. The issue is that the signal mask would also be inherited by all child processes by default, and, with Radicle using various subprocesses (the amount of which might increase over time), that signal mask would have to be undone during the setup of each and every subprocess, after the fork() and before the exec() (which can be achieved via std::os::unix::process::CommandExt::pre_exec).

    But that seems too error-prone, because, while a helper for always doing this to each Command builder could be made, using it could be forgotten and that could break the programs being run as subprocesses, because they’d have all signals blocked but they sometimes need to receive some signals and almost no programs are prepared to be started with all signals blocked, and such breakage might only be triggered rarely and would seem quite mysterious and would be hard to see what to debug.

    (If signal masks weren’t inherited across exec(), then the sigwait approach might be ideal.)

If you like, I’m happy to work on any approach you want.

Related

I’ve opened two other issues related to radicle_signals:

  • b91c6c3 “signals: Mutex can cause signals to be missed”
  • c8e0300 “signals: libc::signal should not be used”

Depending on the fix chosen for the first one, the fix might affect the async-signal-safety of radicle_signals::handler as detailed here. But the general aspects of async-signal-safety, as conveyed here, always apply.

z6Mko5mi...bgMH commented 1 year ago

This issue is fixed by patch 8e05702.