rad:z3qg5TKmN83afz2fj9z3fQjU8vaYE
Radicle CI adapter for native CI
doc: first draft of architecture document
Lars Wirzenius committed 2 years ago
commit ccf794d2a00a837f078d65d0c982a814ad4acbd9
parent e3007d4
5 files changed +253 -0
added doc/.gitignore
@@ -0,0 +1,2 @@
+
*.html
+
*.svg
added doc/Makefile
@@ -0,0 +1,15 @@
+
.SUFFIXES: .uml .svg .pik .md .html
+

+
.md.html:
+
	pandoc --toc --standalone --self-contained $< -o $@
+

+
.uml.svg:
+
	plantuml -tsvg --output=. $<
+

+
.pik.svg:
+
	pikchr-cli $< > $@.tmp
+
	mv $@.tmp $@
+

+
all: architecture.html
+

+
architecture.html: architecture.svg test.svg
added doc/architecture.md
@@ -0,0 +1,200 @@
+
---
+
title: Radicle native CI
+
subtitle: Requirements and architecture
+
...
+

+
# Introduction
+

+
This document explains the purpose of the Radicle native CI component,
+
the requirements put on it, and its software architecture.
+

+
# Overview
+

+
CI support in Radicle consists of several components. For native CI
+
they are:
+

+
* the Radicle node
+
* the CI broker
+
* the native CI executable
+

+
These all have to run on the same host: the node and broker
+
communicate via a Unix domain socket, and the broker spawns the native
+
CI executable.
+

+
See the CI broker architecture documentation for a more in-depth
+
description of CI in Radicle.
+

+
The native CI executable, run as the broker's child process, is called "the CI adapter" in this document.
+

+
Native CI works like this:
+

+
* reads a request message from its standard input
+
* clones the git repository in the request
+
* switches to the commit in the request
+
* reads the `.radicle/native.yaml` file in the repository
+
* writes a response message saying it starts a run, to its standard
+
  output
+
* executes the shell snippet in the `.radicle/native.yaml` file
+
* writes a response message with the result of the run
+
* writes a log file based on what it did
+
* updates the `index.html` page that lists all CI runs and their
+
  results
+

+
## Native CI
+

+
![Sequence diagram for native CI](architecture.svg)
+

+
The diagram above shows the happy path. Various things can go wrong
+
after the native CI executable has started. (In this document we don't
+
need to consider other possible failures.) The test suite for native
+
CI verifies that they're all handled correctly, either by explicitly
+
testing each case, or relying on analysis that generic error handling
+
copes with the case. See the `test-suite` program in the source tree.
+

+
* the environment variable specifying the configuration file is not
+
  set
+
* can't read or parse the configuration file
+
* the configuration file does not specify all mandatory fields
+
* the configuration file specifies values that are wrong in some way
+
* stdin is empty
+
* stdin does not contain a newline
+
* the first line of stdin can't be parsed as a message serialized as
+
  JSON
+
* the message is not a trigger message
+
* the repository triggered does not exist
+
* the repository can't be cloned
+
* the repository does not have the requested commit
+
* the repository does not contain `.radicle/native.yaml`
+
* `native.yaml` can't be read or parsed as YAML
+
* `native.yaml` does not contain a text field `shell`
+
* writing first response to stdout fails
+
* there is any problem executing the contents of the `shell` field
+
  using `bash`
+
* executing the shell snippet takes too long
+
* generating or writing a "run metadata" file fails
+
* writing second response to stdout fails
+
* finding or parsing all run metadata files fails
+
* generating or writing the static web pages listing all runs fails
+

+

+
# Requirements
+

+
Overall, the native CI engine, or adapter, is very simple. However, it
+
must be robust, which makes things more difficult. Here, robust means
+
that whatever happens, the node owner finds out what it was. If a run
+
fails for whatever reason, the node owner can figure out why. Ideally,
+
anyone watching CI on the node can see it as well.
+

+
In the descriptions of the requirements we use the following roles:
+

+
* "developer" makes changes to the repository on which CI is run
+
* "node owner" runs the node itself
+

+
The native CI engine has several ways to report what it does:
+

+
* its standard error output
+
  - in a systemd setup this is captured to the system log or journal
+
* a per-node log file for native CI
+
  - this is for things that interest only the node owner, not the
+
    developer
+
  - e.g., finding configuration errors that the developer can't fix,
+
    such as a missing configuration file
+
* a per-run log file
+
  - this is of interest to both the node owner and the developer
+
  - this is the primary tool for the developer to figure out what went
+
    wrong in their CI run, so that they can change their repository to
+
    fix it
+

+
## Developer can see the status of each CI run on a node
+

+
_Requirement:_ The developer can see what CI runs a node has
+
triggered, and what the current status of each is.
+

+
_Justification:_ This lets them be reassured that CI is working.
+

+
_Implementation:_ Native CI maintains one or more web pages that list
+
every run. For each run, the following is recorded:
+

+
## Developer gets a useful run log
+

+
_Requirement:_ The developer can fetch a useful log of a run that
+
helps them find problems in their code.
+

+
_Justification:_ This is crucial for the developer to have any hope of
+
fixing a problem found in CI.
+

+
_Implementation:_ The run log is a static file that can be fetched via
+
HTTP from the node, or viewed in a web browser. The run log contains
+
at least the following information:
+

+
* the repository ID
+
* the repository alias, if one is known to the local node
+
* the commit id that triggered the run
+
* the commit diff (`git show`)
+
* when the run was triggered
+
* when the run finished
+
* the environment variables of the native CI process
+
* every command or other action that was taken during the run
+
* the standard output and standard error output, and the exit code, of
+
  every command
+
* whether the run was considered successful or not
+

+
## Node owner is informed via system log if CI fails early
+

+
_Requirement:_ If a native CI run fails early, it writes a message to
+
its standard error output.
+

+
_Justification:_ The standard error is captured by systemd, and
+
written to the system log or journal, from where the node owner can be
+
expected to find it. This gives them a chance to find out what's wrong
+
and hopefully fix it.
+

+
"Early" here means any time before the broker has been given a
+
"result" response message, and a per-run log file has been created,
+
and the web page of all CI runs has been updated.
+

+
_Implementation:_ Use a suitable Rust logging library, with the
+
default log level allowing only error messages, and only logging an
+
error if something goes wrong early.
+

+
## Only early failures are logged to the system log
+

+
_Requirement:_ The native CI engine only writes to its standard error
+
output when it fails early. Otherwise it only updates its per-run and
+
per-node log files.
+

+
_Justification:_ It's easy to spam the system log with many useless
+
messages, which make it harder to find important information in the
+
log.
+

+
## The per-node log is updated when an early error occurs
+

+
_Requirement:_ If native CI writes an error message to the standard
+
error output, it is also written to the per-node log, with more
+
detail.
+

+
_Justification:_ The system log is a bad place to report detailed
+
information, as it's quite constrained. A per-node log provides more
+
flexibility.
+

+
_Implementation:_ Append to a per-node log, and if that fails, report
+
that, too, to the standard error output.
+

+

+
# Test architecture
+

+
In order to test the native CI engine, we invoke it in various ways,
+
and examine its outputs.
+

+
![Test setup](test.svg)
+

+
In order for the native CI engine to work, it needs to clone from a
+
node. This is awkward for testing. Using a real node is possible, but
+
introduces more moving parts that can fail during tests. Using a test
+
double, or mock, as the node would be possible, but more work, and
+
it'd be somewhat tricky logic, which is likely to introduce bugs.
+

+
We implement the test suite to use a specially set up local node, with
+
a repository with contents for tests. We will create the node as part
+
of the test suite so that it has exactly the content we need for the
+
tests.
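The early-failure reporting rules in the Requirements section of `doc/architecture.md` can be illustrated with a small sketch. It is written in Python for brevity and is not part of this commit; the document itself says the adapter would use a Rust logging library. The function name `report_early_failure`, its parameters, and the message wording are all assumptions made for illustration.

```python
import sys


def report_early_failure(summary, detail, node_log, stderr=sys.stderr):
    """Report an early failure as the requirements describe.

    A short message goes to standard error (captured by systemd in a
    typical setup), and the same message plus more detail is appended
    to the per-node log. All names here are hypothetical.
    """
    # Short message for the system log or journal.
    print(f"native CI failed early: {summary}", file=stderr)
    try:
        # More detail goes to the per-node log, opened in append mode.
        with open(node_log, "a", encoding="utf-8") as log:
            log.write(f"{summary}\n{detail}\n")
    except OSError as err:
        # If the per-node log can't be updated, report that to stderr too.
        print(f"cannot append to per-node log {node_log}: {err}", file=stderr)
```

Under this sketch, a run that does not fail early would never call the function, so nothing would be written to standard error, which matches the requirement that only early failures reach the system log.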
added doc/architecture.uml
@@ -0,0 +1,20 @@
+
@startuml
+

+
participant "CI broker" as broker
+
participant "Native CI" as ci
+
participant "Radicle \n node" as node
+
participant "Repository \n (local clone)" as repo
+
participant "/bin/bash" as shell
+
participant "index.html" as index
+

+
broker -> ci : request message
+
ci -> node   : git clone
+
node -> repo
+
ci -> repo   : read native.yaml
+
repo -> ci
+
ci -> broker : response: triggered
+
ci -> shell  : execute desired commands
+
shell -> ci  : stdout, stderr, exit
+
ci -> broker : response: result
+
ci -> index  : generate run index web page
+
@enduml
added doc/test.uml
@@ -0,0 +1,16 @@
+
@startuml
+

+
participant "Test harness" as harness
+
participant "Native CI" as ci
+
participant "Local node" as node
+
participant "/bin/bash" as shell
+

+
harness -> ci : invoke
+
harness -> ci : request via stdin
+
ci -> node    : git clone
+
ci <- node
+
ci -> shell   : run build
+
ci <- shell   : stdout, stderr, exit
+
harness <- ci : response via stdout
+

+
@enduml
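The stdin-related failure cases listed in `doc/architecture.md` (empty stdin, no newline, first line not JSON, message not a trigger) can be sketched as a single parsing step. This is a minimal Python illustration, not part of this commit and not the adapter's actual code. The commit does not specify the message format, so the `request` key and the `trigger` value below are hypothetical, as are the names `parse_request` and `EarlyFailure`.

```python
import json


class EarlyFailure(Exception):
    """Raised when stdin can't be turned into a trigger message."""


def parse_request(stdin_text):
    """Return the trigger message from the first line of stdin."""
    # Failure case: stdin is empty.
    if not stdin_text:
        raise EarlyFailure("stdin is empty")
    # Failure case: stdin does not contain a newline.
    if "\n" not in stdin_text:
        raise EarlyFailure("stdin does not contain a newline")
    first_line = stdin_text.split("\n", 1)[0]
    # Failure case: the first line can't be parsed as JSON.
    try:
        message = json.loads(first_line)
    except json.JSONDecodeError as err:
        raise EarlyFailure(f"first line is not valid JSON: {err}") from err
    # Failure case: the message is not a trigger message.
    # The "request" key and "trigger" value are assumptions.
    if not isinstance(message, dict) or message.get("request") != "trigger":
        raise EarlyFailure("message is not a trigger message")
    return message
```

A test harness along the lines of `doc/test.uml` could feed each bad input to the adapter via stdin and check that the corresponding failure is reported.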