---
title: Radicle native CI
subtitle: Requirements and architecture
...

# Introduction

This document explains the purpose of the Radicle native CI component,
the requirements put on it, and its software architecture.

# Overview

CI support in Radicle consists of several components. For native CI
they are:

* the Radicle node
* the CI broker
* the native CI executable

These all have to run on the same host: the node and broker
communicate via a Unix domain socket, and the broker spawns the native
CI executable.

See the CI broker architecture documentation for a more in-depth
description of CI in Radicle.

The native CI executable, which the broker spawns as a child process,
is called "the CI adapter" in this document.

Native CI works like this:

* reads a request message from its standard input
* clones the git repository in the request
* switches to the commit in the request
* reads the `.radicle/native.yaml` file in the repository
* writes a response message to its standard output, saying that a run
  is starting
* executes the shell snippet in the `.radicle/native.yaml` file
* writes a response message with the result of the run
* writes a log file based on what it did
* updates the `index.html` page that lists all CI runs and their
  results
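
The step that executes the shell snippet could be sketched in Rust
roughly as follows. This is a minimal illustration using only the
standard library, not the actual native CI code: the names
`CommandResult` and `run_snippet` are hypothetical, and the shell is a
parameter only so that the sketch can be tried on a host without
`bash`.

```rust
use std::io;
use std::process::{Command, Output};

/// Outcome of one command, as it might be recorded in the run log.
/// (Hypothetical type, for illustration only.)
#[derive(Debug)]
pub struct CommandResult {
    pub stdout: String,
    pub stderr: String,
    /// `None` if the process was killed by a signal.
    pub exit_code: Option<i32>,
}

/// Run `snippet` with `<shell> -c <snippet>`, capturing both output
/// streams and the exit code. An `Err` means the shell itself could
/// not be started.
pub fn run_snippet(shell: &str, snippet: &str) -> io::Result<CommandResult> {
    let Output { status, stdout, stderr } =
        Command::new(shell).arg("-c").arg(snippet).output()?;
    Ok(CommandResult {
        // Output that is not valid UTF-8 is kept, with replacement characters.
        stdout: String::from_utf8_lossy(&stdout).into_owned(),
        stderr: String::from_utf8_lossy(&stderr).into_owned(),
        exit_code: status.code(),
    })
}
```

For native CI the call would be along the lines of
`run_snippet("bash", snippet)`, where `snippet` is the shell snippet
read from `.radicle/native.yaml`. The sketch does not enforce a time
limit on the snippet.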

## Native CI

![](data:image/svg+xml;base64,PD94...ggg==)

The diagram above shows the happy path. Various things can go wrong
after the native CI executable has started. (In this document we don't
need to consider other possible failures.) The test suite for native
CI verifies that they're all handled correctly, either by explicitly
testing each case, or by relying on an analysis showing that generic
error handling copes with the case. See the `test-suite` program in
the source tree.

The things that can go wrong are:

* the environment variable specifying the configuration file is not
  set
* the configuration file can't be read or parsed
* the configuration file does not specify all mandatory fields
* the configuration file specifies values that are wrong in some way
* stdin is empty
* stdin does not contain a newline
* the first line of stdin can't be parsed as a message serialized as
  JSON
* the message is not a trigger message
* the repository named in the trigger does not exist
* the repository can't be cloned
* the repository does not have the requested commit
* the repository does not contain `.radicle/native.yaml`
* `native.yaml` can't be read or parsed as YAML
* `native.yaml` does not contain a text field `shell`
* writing the first response to stdout fails
* there is any problem executing the contents of the `shell` field
  using `bash`
* executing the shell snippet takes too long
* generating or writing a "run metadata" file fails
* writing the second response to stdout fails
* finding or parsing all run metadata files fails
* generating or writing the static web pages listing all runs fails
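
As an illustration of treating each case as a distinct, reportable
error, the two stdin cases in the list could be told apart as in the
following Rust sketch. It uses only the standard library and leaves
JSON parsing out. The names `RequestError` and `read_request_line` are
hypothetical, not taken from the native CI source.

```rust
use std::io::BufRead;

/// Ways in which reading the request line can fail (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum RequestError {
    /// Nothing at all could be read from the input.
    EmptyStdin,
    /// Some bytes were read, but the input ended before a newline.
    NoNewline,
    /// The underlying read failed, for example on invalid UTF-8.
    Io(String),
}

/// Read the first line of `input` and return it without the trailing
/// newline. Anything after the first line is left unread.
pub fn read_request_line<R: BufRead>(mut input: R) -> Result<String, RequestError> {
    let mut line = String::new();
    let n = input
        .read_line(&mut line)
        .map_err(|e| RequestError::Io(e.to_string()))?;
    if n == 0 {
        return Err(RequestError::EmptyStdin);
    }
    if !line.ends_with('\n') {
        return Err(RequestError::NoNewline);
    }
    line.pop(); // drop the '\n'
    Ok(line)
}
```

In the adapter this would be called with `std::io::stdin().lock()`.
Taking any `BufRead` lets a test pass a byte slice instead.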

# Requirements

Overall, the native CI engine, or adapter, is very simple. However, it
must be robust, which makes things more difficult. Here, robust means
that whatever happens, the node owner finds out what it was. If a run
fails for whatever reason, the node owner can figure out why. Ideally,
anyone watching CI on the node can see it as well.

In the descriptions of the requirements we use the following roles:

* "developer" makes changes to the repository on which CI is run
* "node owner" runs the node itself

The native CI engine has several ways to report what it does:

* its standard error output
  - in a systemd setup this is captured to the system log or journal
* a per-node log file for native CI
  - this is for things that interest only the node owner, not the
    developer
  - e.g., configuration errors that the developer can't fix, such as
    a missing configuration file
* a per-run log file
  - this is of interest to both the node owner and the developer
  - this is the primary tool for the developer to figure out what went
    wrong in their CI run, so that they can change their repository to
    fix it

## Developer can see the status of each CI run on a node

_Requirement:_ The developer can see what CI runs a node has
triggered, and what the current status of each is.

_Justification:_ This lets them be reassured that CI is working.

_Implementation:_ Native CI maintains one or more web pages that list
every run. For each run, the following is recorded:

## Developer gets a useful run log

_Requirement:_ The developer can fetch a useful log of a run that
helps them find problems in their code.

_Justification:_ This is crucial for the developer to have any hope of
fixing a problem found in CI.

_Implementation:_ The run log is a static file that can be fetched via
HTTP from the node, or viewed in a web browser. The run log contains
at least the following information:

* the repository ID
* the repository alias, if one is known to the local node
* the commit ID that triggered the run
* the commit diff (`git show`)
* when the run was triggered
* when the run finished
* the environment variables of the native CI process
* every command or other action that was taken during the run
* the standard output and standard error output, and the exit code, of
  every command
* whether the run was considered successful or not
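
The list above maps naturally onto a data structure. The following
Rust sketch shows one possible shape, using only the standard library.
All type and field names are hypothetical, timestamps are kept as
plain text to avoid extra crates, and the plain-text rendering is an
assumption: the actual log format may well differ.

```rust
use std::fmt::Write;

/// One command run during a CI run, with everything it produced.
pub struct CommandRecord {
    pub command: String,
    pub stdout: String,
    pub stderr: String,
    /// `None` if the command was killed by a signal.
    pub exit_code: Option<i32>,
}

/// The minimum information a run log must contain (hypothetical type).
pub struct RunLog {
    pub repo_id: String,
    pub repo_alias: Option<String>,
    pub commit_id: String,
    pub commit_diff: String,
    pub triggered: String,
    pub finished: String,
    pub env: Vec<(String, String)>,
    pub commands: Vec<CommandRecord>,
    pub success: bool,
}

impl RunLog {
    /// Render the log as plain text. Writing to a `String` cannot fail,
    /// so the `fmt::Result` values are deliberately ignored.
    pub fn render(&self) -> String {
        let mut out = String::new();
        let _ = writeln!(out, "repository: {}", self.repo_id);
        let alias = self.repo_alias.as_deref().unwrap_or("(unknown)");
        let _ = writeln!(out, "alias: {alias}");
        let _ = writeln!(out, "commit: {}", self.commit_id);
        let _ = writeln!(out, "triggered: {}", self.triggered);
        let _ = writeln!(out, "finished: {}", self.finished);
        let _ = writeln!(out, "diff:\n{}", self.commit_diff);
        for (name, value) in &self.env {
            let _ = writeln!(out, "env: {name}={value}");
        }
        for c in &self.commands {
            let _ = writeln!(out, "command: {}", c.command);
            let _ = writeln!(out, "exit code: {:?}", c.exit_code);
            let _ = writeln!(out, "stdout:\n{}", c.stdout);
            let _ = writeln!(out, "stderr:\n{}", c.stderr);
        }
        let result = if self.success { "success" } else { "failure" };
        let _ = writeln!(out, "result: {result}");
        out
    }
}
```

Keeping the alias as an `Option` mirrors the "if one is known"
wording: the log is still complete when the local node has no alias
for the repository.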

## Node owner is informed via system log if CI fails early

_Requirement:_ If a native CI run fails early, it writes a message to
its standard error output.

_Justification:_ The standard error output is captured by systemd, and
written to the system log or journal, from where the node owner can be
expected to find it. This gives them a chance to find out what's wrong
and hopefully fix it.

"Early" here means any time before all of the following have happened:
the broker has been given a "result" response message, a per-run log
file has been created, and the web page of all CI runs has been
updated.

_Implementation:_ Use a suitable Rust logging library, with the
default log level allowing only error messages, and log an error only
if something goes wrong early.
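
The definition of "early" can be expressed as a small piece of state.
The sketch below is only an illustration of that definition, with a
hypothetical `RunProgress` type. It does not show how native CI
actually tracks its progress.

```rust
/// Which of the three milestones a run has reached so far.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct RunProgress {
    /// The broker has been given a "result" response message.
    pub result_sent: bool,
    /// The per-run log file has been created.
    pub run_log_created: bool,
    /// The web page of all CI runs has been updated.
    pub index_updated: bool,
}

impl RunProgress {
    /// A failure is "early" until all three milestones are reached.
    pub fn is_early(&self) -> bool {
        !(self.result_sent && self.run_log_created && self.index_updated)
    }
}
```

An error handler could consult `is_early()` to decide whether a
failure belongs on the standard error output.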

## Only early failures are logged to the system log

_Requirement:_ The native CI engine only writes to its standard error
output when it fails early. Otherwise it only updates its per-run and
per-node log files.

_Justification:_ It's easy to spam the system log with many useless
messages, which make it harder to find important information in the
log.

## The per-node log is updated when an early error occurs

_Requirement:_ If native CI writes an error message to the standard
error output, it is also written to the per-node log, with more
detail.

_Justification:_ The system log is a bad place to report detailed
information, as it's quite constrained. A per-node log provides more
flexibility.

_Implementation:_ Append to a per-node log, and if that fails, report
that, too, to the standard error output.
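
A minimal Rust sketch of this implementation idea follows, using only
the standard library rather than a logging library. The function name,
its parameters, and the message format are all hypothetical. The
standard error output is passed in as a generic writer so that the
behaviour can be checked in a test.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Report an early failure: a short summary goes to `stderr`, and the
/// summary plus detail is appended to the per-node log. If the append
/// fails, that failure is reported on `stderr` as well.
pub fn report_early_error<W: Write>(
    stderr: &mut W,
    node_log: &Path,
    summary: &str,
    detail: &str,
) {
    // Nothing useful can be done if stderr itself fails, so ignore that.
    let _ = writeln!(stderr, "ERROR: {summary}");

    let appended = OpenOptions::new()
        .create(true)
        .append(true)
        .open(node_log)
        .and_then(|mut file| writeln!(file, "ERROR: {summary}\n{detail}"));

    if let Err(e) = appended {
        let _ = writeln!(
            stderr,
            "ERROR: cannot append to node log {}: {e}",
            node_log.display()
        );
    }
}
```

In the adapter, `stderr` would be `std::io::stderr()`. Opening the
file in append mode for each message keeps earlier entries intact.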

# Test architecture

In order to test the native CI engine, we invoke it in various ways,
and examine its outputs.

![](data:image/svg+xml;base64,PD94...gZz4=)

In order for the native CI engine to work, it needs to clone from a
node. This is awkward for testing. Using a real node is possible, but
introduces more moving parts that can fail during tests. Using a test
double, or mock, as the node would be possible, but more work, and
it'd be somewhat tricky logic, which is likely to introduce bugs.

We implement the test suite to use a specially set up local node, with
a repository with contents for tests. We will create the node as part
of the test suite so that it has exactly the content we need for the
tests.
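
The "invoke it and examine its outputs" part could look roughly like
the following Rust helper. This is a sketch under stated assumptions,
not the actual `test-suite` program: the name `run_with_stdin` is
hypothetical, and it only suits small inputs, since a large input
would need a separate writer thread to avoid blocking on a full pipe.

```rust
use std::io::{self, Write};
use std::process::{Command, Output, Stdio};

/// Run `program` with `args`, feed it `input` on stdin, and collect
/// its stdout, stderr, and exit status for the test to examine.
pub fn run_with_stdin(program: &str, args: &[&str], input: &[u8]) -> io::Result<Output> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    {
        // Dropping `stdin` at the end of this block closes the pipe,
        // so the child sees end of file after the input.
        let mut stdin = child.stdin.take().expect("stdin was set to piped");
        if let Err(e) = stdin.write_all(input) {
            // A child that fails before reading stdin closes the pipe
            // early. That is a case worth testing, not a harness error.
            if e.kind() != io::ErrorKind::BrokenPipe {
                return Err(e);
            }
        }
    }

    child.wait_with_output()
}
```

A test would call this with the path of the native CI executable and
a request message as `input`, then assert on `status`, `stdout`, and
`stderr` of the returned `Output`.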