---
title: Radicle native CI
subtitle: Requirements and architecture
---

# Introduction

This document explains the purpose of the Radicle native CI component,
the requirements put on it, and its software architecture.

# Overview

CI support in Radicle consists of several components. For native CI
they are:

* the Radicle node
* the CI broker
* the native CI executable

These all have to run on the same host: the node and broker
communicate via a Unix domain socket, and the broker spawns the native
CI executable.

See the CI broker architecture documentation for a more in-depth
description of CI in Radicle.

The native CI executable, which the broker runs as a child process, is
called "the CI adapter" in this document.

Native CI works like this:

* reads a request message from its standard input
* writes a response message to its standard output, saying it has
  started a run
* clones the git repository named in the request
* switches to the commit named in the request
* reads the `.radicle/native.yaml` file in the repository
* executes the shell snippet in the `native.yaml` file
* writes a response message with the result of the run
* writes a per-run log file describing what it did
* updates the `index.html` page that lists all CI runs and their
  results
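The first step, reading the request, can be sketched as follows. This is a minimal, dependency-free illustration under our own assumptions, not the actual implementation: `read_request_line` and `RequestLineError` are hypothetical names, and parsing the JSON itself (for example with a crate such as `serde_json`) is left out. It shows how two early failures, empty stdin and stdin without a newline, can be told apart from a complete first line.

```rust
use std::io::BufRead;

/// Hypothetical error type for the first step: reading the request line.
#[derive(Debug, PartialEq)]
pub enum RequestLineError {
    /// Standard input was empty.
    Empty,
    /// Input ended without a newline, so no complete line was received.
    NoNewline,
    /// Reading failed for some other reason (e.g. invalid UTF-8).
    Io(String),
}

/// Read the first line of `input`, which is expected to hold one
/// JSON-serialized request message. The trailing newline is stripped;
/// anything after the first line is left unread.
pub fn read_request_line<R: BufRead>(mut input: R) -> Result<String, RequestLineError> {
    let mut line = String::new();
    let n = input
        .read_line(&mut line)
        .map_err(|e| RequestLineError::Io(e.to_string()))?;
    if n == 0 {
        return Err(RequestLineError::Empty);
    }
    if !line.ends_with('\n') {
        return Err(RequestLineError::NoNewline);
    }
    line.pop(); // remove the '\n'
    Ok(line)
}
```

In the adapter, this would be called as `read_request_line(std::io::stdin().lock())`, since a locked stdin handle implements `BufRead`.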

## Native CI

![Sequence diagram for native CI](architecture.svg)

The diagram above shows the happy path. After the native CI executable
has started, various things can go wrong. (In this document we don't
need to consider failures that happen before that.) The test suite for
native CI verifies that each of these failures is handled correctly,
either by testing the case explicitly, or by relying on analysis
showing that generic error handling copes with it. See the
`test-suite` program in the source tree.

The things that can go wrong are:

* the environment variable specifying the configuration file is not
  set
* can't read or parse the configuration file
* the configuration file does not specify all mandatory fields
* the configuration file specifies values that are wrong in some way
* stdin is empty
* stdin does not contain a newline
* the first line of stdin can't be parsed as a message serialized as
  JSON
* the message is not a trigger message
* the repository triggered does not exist
* the repository can't be cloned
* the repository does not have the requested commit
* the repository does not contain `.radicle/native.yaml`
* `native.yaml` can't be read or parsed as YAML
* `native.yaml` does not contain a text field `shell`
* writing the first response to stdout fails
* there is any problem executing the contents of the `shell` field
  using `bash`
* executing the shell snippet takes too long
* generating or writing a "run metadata" file fails
* writing the second response to stdout fails
* finding or parsing the run metadata files of all runs fails
* generating or writing the static web pages listing all runs fails
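Two of the failures above concern the `shell` snippet: `bash` may fail to run it, or it may take too long. A minimal sketch of handling both with only the Rust standard library follows. It assumes the caller supplies the time limit (this document does not say where the limit comes from); `run_snippet` and `SnippetOutcome` are hypothetical names, and the sketch simply polls `Child::try_wait` until the child exits or the deadline passes.

```rust
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical outcome of running the `shell` snippet from `native.yaml`.
#[derive(Debug, PartialEq)]
pub enum SnippetOutcome {
    /// bash exited with this code (0 means success).
    Exited(i32),
    /// bash was killed by a signal, so there is no exit code.
    Signalled,
    /// The snippet ran longer than allowed and was killed.
    TimedOut,
}

/// Run `snippet` with `bash -c` in `workdir`, killing it after `timeout`.
/// Output is discarded to keep the sketch short; a real adapter would
/// capture it for the run log (e.g. into files, so that a full pipe
/// cannot block the child while we poll).
pub fn run_snippet(
    snippet: &str,
    workdir: &std::path::Path,
    timeout: Duration,
) -> std::io::Result<SnippetOutcome> {
    let mut child = Command::new("bash")
        .arg("-c")
        .arg(snippet)
        .current_dir(workdir)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()?; // failing to start bash is reported to the caller
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(match status.code() {
                Some(code) => SnippetOutcome::Exited(code),
                None => SnippetOutcome::Signalled,
            });
        }
        if Instant::now() >= deadline {
            // The child may exit just before kill; ignore that race and reap it.
            let _ = child.kill();
            child.wait()?;
            return Ok(SnippetOutcome::TimedOut);
        }
        thread::sleep(Duration::from_millis(20));
    }
}
```

A failure to start `bash` at all comes back as an `Err`, so the caller can record it in the per-run log alongside the other outcomes.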


# Requirements

Overall, the native CI engine, or adapter, is very simple. However, it
must be robust, which makes things more difficult. Here, robust means
that whatever happens, the node owner can find out what happened. If a
run fails for whatever reason, the node owner can figure out why.
Ideally, anyone watching CI on the node can find this out as well.

In the descriptions of the requirements we use the following roles:

* "developer" makes changes to the repository on which CI is run
* "node owner" runs the node itself

The native CI engine has several ways to report what it does:

* its standard error output
  - in a systemd setup this is captured to the system log or journal
* a per-node log file for native CI
  - this is for things that interest only the node owner, not the
    developer
  - e.g., configuration errors that the developer can't fix, such as a
    missing configuration file
* a per-run log file
  - this is of interest to both the node owner and the developer
  - this is the primary tool for the developer to figure out what went
    wrong in their CI run, so that they can change their repository to
    fix it

## Developer can see the status of each CI run on a node

_Requirement:_ The developer can see what CI runs a node has
triggered, and what the current status of each is.

_Justification:_ This lets them be reassured that CI is working.

_Implementation:_ Native CI maintains one or more web pages that list
every run. For each run, the following is recorded:

## Developer gets a useful run log

_Requirement:_ The developer can fetch a useful log of a run that
helps them find problems in their code.

_Justification:_ This is crucial for the developer to have any hope of
fixing a problem found in CI.

_Implementation:_ The run log is a static file that can be fetched via
HTTP from the node, or viewed in a web browser. The run log contains
at least the following information:

* the repository ID
* the repository alias, if one is known to the local node
* the commit ID that triggered the run
* the commit diff (`git show`)
* when the run was triggered
* when the run finished
* the environment variables of the native CI process
* every command or other action that was taken during the run
* the standard output and standard error output, and the exit code, of
  every command
* whether the run was considered successful or not
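Capturing the standard output, standard error output, and exit code of every command, as the list above requires, could look like the following sketch. `CommandRecord` and `run_and_record` are hypothetical names, and turning the records into the static run log file is not shown.

```rust
use std::process::Command;

/// Hypothetical record of one command, holding what the run log must show:
/// the command itself, its stdout and stderr, and its exit code.
#[derive(Debug)]
pub struct CommandRecord {
    pub argv: Vec<String>,
    pub stdout: String,
    pub stderr: String,
    pub exit_code: Option<i32>, // None if killed by a signal
}

/// Run `argv` to completion, append a record of it to `log`, and return
/// whether it succeeded. If the command cannot be started at all, the
/// error is returned to the caller, which should log that action too.
pub fn run_and_record(log: &mut Vec<CommandRecord>, argv: &[&str]) -> std::io::Result<bool> {
    let (program, args) = argv.split_first().expect("argv must not be empty");
    let output = Command::new(*program).args(args).output()?;
    log.push(CommandRecord {
        argv: argv.iter().map(|s| s.to_string()).collect(),
        stdout: String::from_utf8_lossy(&output.stdout).into_owned(),
        stderr: String::from_utf8_lossy(&output.stderr).into_owned(),
        exit_code: output.status.code(),
    });
    Ok(output.status.success())
}
```

Keeping one record per command, in order, makes the run log a faithful transcript of the run, which is what lets the developer see exactly which command failed and why.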

## Node owner is informed via system log if CI fails early

_Requirement:_ If a native CI run fails early, it writes a message to
its standard error output.

_Justification:_ The standard error is captured by systemd, and
written to the system log or journal, from where the node owner can be
expected to find it. This gives them a chance to find out what's wrong
and hopefully fix it.

"Early" here means any time before all of the following have
happened: the broker has been given a "result" response message, a
per-run log file has been created, and the web page of all CI runs has
been updated.

_Implementation:_ Use a suitable Rust logging library, with the
default log level allowing only error messages, and log an error only
if something goes wrong early.
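The definition of "early" can be stated precisely as a predicate over three milestones. This is a hypothetical sketch: `Progress`, `is_early`, and `report_failure` are illustrative names, and `eprintln!` stands in for the Rust logging library mentioned above, to keep the example free of dependencies.

```rust
/// Hypothetical progress tracker for one run. A failure counts as "early"
/// unless all three milestones below have been reached.
#[derive(Debug, Default, Clone, Copy)]
pub struct Progress {
    /// The broker has been given the "result" response message.
    pub result_sent: bool,
    /// The per-run log file has been created.
    pub run_log_created: bool,
    /// The web page listing all CI runs has been updated.
    pub index_updated: bool,
}

impl Progress {
    pub fn is_early(&self) -> bool {
        !(self.result_sent && self.run_log_created && self.index_updated)
    }
}

/// Report a failure. Only early failures go to stderr (and so, under
/// systemd, to the journal); returns whether stderr was used. Later
/// failures would be recorded in the per-run log instead (not shown).
pub fn report_failure(progress: Progress, message: &str) -> bool {
    if progress.is_early() {
        eprintln!("ERROR: {message}");
        true
    } else {
        false
    }
}
```

Because the milestones are independent flags, a failure after, say, the result message was sent but before the index page was updated is still treated as early, matching the definition above.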

## Only early failures are logged to the system log

_Requirement:_ The native CI engine only writes to its standard error
output when it fails early. Otherwise it only updates its per-run and
per-node log files.

_Justification:_ It's easy to spam the system log with many useless
messages, which make it harder to find important information in the
log.

## The per-node log is updated when an early error occurs

_Requirement:_ If native CI writes an error message to the standard
error output, it is also written to the per-node log, with more
detail.

_Justification:_ The system log is a bad place to report detailed
information, as it's quite constrained. A per-node log provides more
flexibility.

_Implementation:_ Append to a per-node log, and if that fails, report
that, too, to the standard error output.
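A sketch of this behaviour follows, assuming the per-node log is a plain text file whose path the caller knows (this document does not say where it lives or what format it uses). `report_early_error` is a hypothetical name: it writes a short summary to stderr, appends the summary plus more detail to the per-node log, and reports to stderr if that append fails.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Report an early error: a short message to stderr for the system log,
/// and a more detailed entry appended to the per-node log. If appending
/// fails, that failure is reported to stderr too. Returns whether the
/// per-node log was updated.
pub fn report_early_error(per_node_log: &Path, summary: &str, detail: &str) -> bool {
    eprintln!("native CI error: {summary}");
    let appended = OpenOptions::new()
        .create(true)
        .append(true)
        .open(per_node_log)
        .and_then(|mut file| writeln!(file, "ERROR: {summary}\n{detail}"));
    match appended {
        Ok(()) => true,
        Err(e) => {
            eprintln!(
                "native CI error: could not append to per-node log {}: {e}",
                per_node_log.display()
            );
            false
        }
    }
}
```

Opening in append mode means each early error adds an entry without disturbing earlier ones, so the per-node log accumulates a history the node owner can read later.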


# Test architecture

To test the native CI engine, we invoke it in various ways and examine
its outputs.
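Invoking the native CI executable from the test suite amounts to running a child process with controlled standard input and environment, then examining what it produced. The helper below is a hypothetical sketch using only the Rust standard library; it does not cover setting up the local node, and the environment variable names a test would pass (such as the one naming the configuration file) are not given in this document.

```rust
use std::io::Write;
use std::process::{Command, Output, Stdio};

/// Run `program` as the test suite might run the native CI executable:
/// feed `stdin` to it, set the given environment variables, and collect
/// its stdout, stderr, and exit status for the test to examine.
pub fn invoke(program: &str, env: &[(&str, &str)], stdin: &[u8]) -> std::io::Result<Output> {
    let mut child = Command::new(program)
        .envs(env.iter().copied())
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    // Write all input, then drop the handle so the child sees end-of-file.
    // (For inputs larger than a pipe buffer, write from a separate thread
    // to avoid blocking while the child's stdout fills up.)
    {
        let mut child_stdin = child.stdin.take().expect("stdin was piped");
        child_stdin.write_all(stdin)?;
    }
    child.wait_with_output()
}
```

In a test, `program` would be the path to the built native CI executable. For checking the helper itself, `cat` is a convenient stand-in: `invoke("cat", &[], b"{}\n")` returns an `Output` whose `stdout` equals the input.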

![Test setup](test.svg)

For the native CI engine to work, it needs to clone a repository from
a node. This is awkward for testing. Using a real node is possible, but
introduces more moving parts that can fail during tests. Using a test
double, or mock, in place of the node would also be possible, but it
would be more work, and its logic would be somewhat tricky, which makes
bugs likely.

We implement the test suite to use a specially set up local node, with
a repository whose contents are designed for the tests. The test suite
creates the node itself, so that the node has exactly the content the
tests need.