Radish alpha
r
rad:z3trNYnLWS11cJWC6BbxDs5niGo82
Radicle Improvement Proposals (RIPs)
Radicle
Git
Add RIP #2 draft
Alexis Sellier committed 3 years ago
commit 1c402116983be19e754fb14aa7ce38145f0a4b09
parent a2107ee
1 file changed +441 -0
added 0002-identity.md
@@ -0,0 +1,441 @@
+
---
+
RIP: 2
+
Title: Identity
+
Author: '@cloudhead <cloudhead@radicle.xyz>'
+
Status: Draft
+
Created: 2022-12-06
+
License: CC0-1.0
+
---
+

+
RIP #2: Identity
+
================
+
In RIP #1, we discussed *repository identity*, and the *identity document*.
+
We said that to make it possible for repositories to be hosted in a peer-to-peer
+
network, Git repositories on their own are not enough: we need a secure way to
+
identify repositories that goes beyond source code. We need a stable identifier
+
and a mechanism for self-certifying repositories against this identifier,
+
so that changes to source code can be verified locally, by users.
+

+
In this RIP, we discuss the method through which we can achieve the above in
+
a secure, decentralized way.
+

+
Table of Contents
+
-----------------
+
* [Overview](#overview)
+
* [Peer Identity](#peer-identity)
+
* [Repository Identity](#repository-identity)
+
    * [Validation](#validation)
+
* [The Repository Identifier](#the-repository-identifier)
+
* [Identity Storage](#identity-storage)
+
    * [Verification](#verification)
+
* [Security](#security)
+
* [Closing Thoughts](#closing-thoughts)
+
* [Credits](#credits)
+
* [Copyright](#copyright)
+

+
Overview
+
--------
+
To introduce the topic of identity, we point the reader to the opening
+
paragraphs of the original specification of identities on the Radicle network,
+
which is still very much applicable to the Heartwood protocol:
+

+
> In order to collaborate on repositories within a consensus-free network, we
+
> must be able to refer to them using a stable identifier. Note that this
+
> identifier is a statement of intent: a repository can be described as a
+
> collection of ever-moving leaves of a DAG whose root element is the empty
+
> object. Therefore, the content of a repository is not enough to describe it –
+
> while two views on the repository may share objects, they may diverge
+
> substantially otherwise. Both views may however state their intent to
+
> eventually converge to the same state.
+
>
+
> While in principle a random identifier with sufficient entropy would suffice
+
> for the purpose, this would put the burden of deciding which repository views
+
> are legitimate entirely on the user. Instead, our approach is to establish an
+
> ownership proof, tied to the network identity of a peer, or set of peers,
+
> such that repository views can be replicated according to the trust
+
> relationships between peers (“tracking”).
+
>
+
> Our model is loosely based on The Update Framework (TUF)[^0], conceived as a
+
> means of securely distributing software packages.
+

+
With this in mind, there are three core components to the Radicle identity
+
system, for any given repository:
+

+
1. A set of peers on the network, each holding a signing key.
+
2. A document which establishes the identity of this repository, using these
+
   signing keys to self-certify.
+
3. A stable identifier that can be used to refer to the repository, derived
+
   from this document.
+

+
Peer Identity
+
-------------
+
Since Radicle repositories on the network are created by peers, we must first
+
establish the concept of a *peer identity*. In Heartwood, peers are simply
+
identified by their public key. This key is an Ed25519[^1] key that is encoded
+
as a DID using the `did:key` method[^2]. DIDs are used for interoperability
+
with other systems as well as allowing for other types of identifiers in the
+
future.
+

+
    did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi
+

+
*Example of a peer identifier in DID format.*
+

+
We'll also note that peers on the network -- also called *nodes* are
+
indistinguishable from *users* at the protocol level. The terms "Node ID",
+
"Peer ID", "Public Key" are thus all used interchangeably.
+

+
Repository Identity
+
-------------------
+
With the establishment of peer identities, we can now move on to repository
+
identities. A repository identity consists of an identity document and an
+
associated unique identifier.
+

+
The identity document is a JSON document associated with a repository on Radicle.
+
The *hypothetical* minimal identity document looks like this:
+

+
    { "delegates": ["did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi"],
+
      "threshold": 1 }
+

+
It describes a repository with a single *delegate*. Delegates are trusted
+
entities that can cryptographically sign data within the scope of a given
+
repository. In the identity document, they are represented by a DID. As of this
+
RIP, only the `did:key` method is supported.
+

+
Using the `threshold` property, the document specifies that only *one* delegate
+
is required to sign updates to the repository. In this case, since we only
+
have one delegate, this is the only possible value for `threshold`.
+

+
> Repository delegates are responsible for signing all updates to a repository,
+
> whether it be source code commits or updates to the identity document itself.
+
> They can be thought of as repository "maintainers", though the applicability
+
> is broader. We will see how delegates sign repository updates in one of
+
> the following sections.
+

+
Though the above document could constitute a valid identity, it does not contain
+
any identifiable data that may be used to describe a particular repository.
+
This is what the `payload` section is for. Heartwood defines a single payload
+
type, `xyz.radicle.project`, which can be used to describe a project stored
+
in a repository:
+

+
    { "delegates": ["did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi"],
+
      "threshold": 1,
+
      "payload": {
+
        "xyz.radicle.project": {
+
          "name": "heartwood",
+
          "description": "Radicle Heartwood Protocol & Stack",
+
          "defaultBranch": "master"
+
        }
+
      }
+
    }
+

+
The string `xyz.radicle.project` is called a *payload ID*, and the `project`
+
payload is the default payload type for Radicle repositories. Using this payload,
+
type, a repository may be given a name, a description, and a default branch.
+

+
Identity documents are designed to be extensible, and developers may create
+
their own payload types and applications can choose which payload types to
+
support.
+

+
    { ...
+
      "payload": {
+
        "xyz.radicle.project": { ... },
+
        "xyz.radicle.funding": { ... },
+
        "com.atproto.account": {
+
          "email": "eve@atproto.com",
+
          "handle": "eve"
+
        }
+
      }
+
    }
+

+
<small>Figure 1. Fictional example of an identity with multiple payloads</small>
+

+
Payload IDs use reverse domain-name notation[^3] and are comprised of two
+
parts: an *authority*, eg. `radicle.xyz`, and a *name*, eg. `project`. To keep
+
payload types globally unique, developers wishing to create new payload types
+
must control the authority (domain) under which these live.
+

+
As of this RIP, there is only one recognized payload type: `xyz.radicle.project`.
+
Repositories which include this type of payload are sometimes called *projects*.
+

+
> When specifying new payload types, it's worth thinking about how the payload
+
> schema might evolve over time. For example, it might be worth versioning the
+
> payload types, either via the identifier (eg. `com.atproto.account.v1`) or
+
> via a field inside the payload (eg. `{"version": 1}`). This will ensure that
+
> changes to the payload schema are able to be made in a backwards compatible
+
> way.
+

+
### Validation
+

+
An identity document is valid if the following conditions are met:
+

+
* There is at least *one* `delegate`, but no more than `255`.
+
* Strings are not empty, and at most `255` characters long.
+
* The `threshold` is not zero and not greater than the number of `delegates`.
+
* The items in `delegates` are valid DIDs.
+
* There is a `payload` property with at least one payload object and a valid
+
  payload ID.
+
* Each payload under `payload` is valid according to the rules of that payload.
+

+
These rules can be partly described in the following JSON Schema[^4] document:
+

+
    {
+
      "$schema": "https://json-schema.org/draft/2020-12/schema",
+
      "type": "object",
+
      "properties": {
+
        "delegates": {
+
          "type": "array",
+
          "items": [{ "type": "string" }],
+
          "minItems": 1,
+
          "maxItems": 255,
+
          "uniqueItems": true
+
        },
+
        "threshold": {
+
          "type": "integer",
+
          "minimum": 1,
+
          "maximum": 255
+
        },
+
        "payload": {
+
          "type": "object",
+
          "additionalProperties": { "type": "object" },
+
          "minProperties": 1
+
        }
+
      },
+
      "required": ["delegates", "threshold", "payload"]
+
    }
+

+
Finally, the schema and validation rules for the `xyz.radicle.project` payload
+
are described as:
+

+
    {
+
      "$schema": "https://json-schema.org/draft/2020-12/schema",
+
      "type": "object",
+
      "properties": {
+
        "name": {
+
          "type": "string",
+
          "minLength": 1,
+
          "maxLength": 255
+
        },
+
        "description": {
+
          "type": "string",
+
          "maxLength": 255
+
        },
+
        "defaultBranch": {
+
          "type": "string",
+
          "minLength": 1,
+
          "maxLength": 255
+
        }
+
      },
+
      "required": ["name", "description", "defaultBranch"]
+
    }
+

+
The Repository Identifier
+
-------------------------
+
Now that we have a document describing our repository, we can derive from it
+
a unique identifier that can be used to refer to the repository on the
+
peer-to-peer network. This identifier must meet certain criteria:
+

+
1. It must be *stable*, in other words it must not change throughout the
+
   lifetime of the repository.
+
2. It must be deterministically derivable from the identity document alone, for
+
   the purpose of verification.
+
3. It must contain enough entropy to be globally unique.
+
4. It must allow for easy retrieval of the document from storage.
+

+
To fulfill the above, and given that Radicle uses Git for storage of repository
+
data, we choose to use the *Git Object ID*[^5] of the identity document, as
+
identifier. Git object IDs, or *OIDs* are "hardened" SHA-1 checksums of their
+
content, prefixed with a short header. We can compute this OID using the `git
+
hash-object` command. But before doing so, we must take care of one last thing:
+
to make the process of hashing our identity document fully deterministic, we
+
must first ensure our document is in canonical JSON form[^6]. This prevents
+
things like whitespace or key ordering from influencing the document hash and
+
therefore the identifier. In turn, this makes the identifier easier to compute
+
correctly.
+

+
The above document in canonical form looks like this:
+

+
    {"delegates":["did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi"],
+
    "payload":{"xyz.radicle.project":{"defaultBranch":"master","description":
+
    "Radicle Heartwood Protocol & Stack","name":"heartwood"}},"threshold":1}
+

+
We can now compute the Git object ID by placing the above JSON in a file, eg.
+
`radicle.json`, taking care to strip all newline characters from it, and
+
running the following command:
+

+
    $ git hash-object -t blob radicle.json
+

+
The output should be:
+

+
    d96f425412c9f8ad5d9a9a05c9831d0728e2338d
+

+
This SHA-1 hash is the document's OID. To turn it into a Radicle repository
+
identifer, we encode the underlying 20-byte hash value using `multibase`
+
encoding[^7] with the `base-58-btc` alphabet; the same encoding used for the
+
`did:key` method, and prefix `rad:` to it, making it a valid URN:
+

+
    "rad" ":" multibase(base58-btc, raw-oid-bytes)
+

+
This results in the repository identifier, or RID:
+

+
    rad:z42hL2jL4XNk6K8oHQaSWfMgCL7ji
+

+
This RID is theoretically unique thanks to the entropy provided by the delegate
+
key and payload.
+

+
Identity Storage
+
----------------
+
A storage system suitable for storing identity documents must have two
+
properties:
+

+
1. It must guarantee data integrity.
+
2. It must preserve the history of changes to the documents.
+

+
Radicle repositories are stored in Git, and criteria (1) is guaranteed by Git
+
natively, so long as we store our identity documents in the Git object database.
+
This is because Git hashes all objects under it, and structures its data such
+
that a change in hash of one object means a change in hash of all dependent
+
objects.
+

+
Criteria (2) can be guaranteed by encoding changes to the documents as a Git
+
commit history. Not only does a commit history allow us to preserve all changes,
+
it also proves, via hash-linking that no change was omitted from the history.
+

+
    Commit             Commit             Commit
+
    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
+
    │ fb8e40a     │◄─┐ │ c581f25     │◄─┐ │ 43eb12d     │
+
    │             │  └─┤ fb8e40a     │  └─┤ c581f25     │
+
    │ ┌──────────┐│    │ ┌──────────┐│    │ ┌──────────┐│
+
    │ │ Document ││    │ │ Document ││    │ │ Document ││
+
    │ ├──────────┤│    │ ├──────────┤│    │ ├──────────┤│
+
    └─┴──────────┴┘    └─┴──────────┴┘    └─┴──────────┴┘
+

+
In the above diagram, we see three commits, with the left-most being the *root*
+
commit of the document, ie. the initial state, and the right-most being the
+
*head*, ie. the latest state.
+

+
Each commit includes within its Git tree, a blob named `radicle.json` which
+
contains a version of the identity document.
+

+
    ┌───────┬─────────┐                                   ┌─────┬───────────┐
+
    │commit │ fb8e40a │   ┌─────┬────────────────────┐  ┌►│blob │ d96f425   │
+
    ├───────┴─────────┤ ┌►│tree │ 82bc09a            │  │ ├─────┴───────────┤
+
    │tree   82bc09a   ├─┘ ├─────┴────────────────────┤  │ │{"delegates":...,│
+
    │author ...       │   │blob d96f425 radicle.json ├──┘ │ "payload":...,  │
+
    │                 │   └──────────────────────────┘    │ "threshold":...}│
+
    └─────────────────┘                                   └─────────────────┘
+

+
Using this representation, we can hence track all changes to a given identity.
+
To keep track of the head of this history, we use a special Git reference
+
named `rad/id` which points to the latest version of the identity document:
+

+
    fb8e40a ◄─ c581f25 ◄─ 43eb12d ◄─ refs/rad/id
+

+
Just like regular Git branches, when the identity is updated, `refs/rad/id`
+
is reset to point to the latest commit. Note that this commit history is
+
completely separate from the source code history pointed to by eg.
+
`refs/heads/master` and other branches. The identity history is kept in
+
the repository's *stored copy*, which is a bare repository, and is not included
+
in working copies.
+

+
### Verification
+

+
Verifying the latest state, ie. commit `43eb12d` is a matter of verifying all
+
preceding states, starting from the root (`fb8e40a`).
+

+
The root commit is verifiable intrinsically, since it contains as part of its
+
Git tree, the document from which we derived the RID. We can see above that the
+
original blob from which we computed the repository identifier is contained in
+
the tree pointed to by the root commit of the document history. The root commit
+
is hence valid for a given RID if and only if it contains a blob under the name
+
`radicle.json` containing a valid identity document which hashes to the given
+
RID, *and* the commit is signed by all delegates in the initial `delegates` list.
+

+
Once the root commit is verified, we can proceed to the next commit. Since
+
the document may have changed in this commit, the RID is no longer useful for
+
verifying this commit. Instead, we make sure that two conditions are fulfilled:
+

+
1. The commit containing the updated document is signed by a number of keys
+
   greater than or equal to the `threshold` property of the *previous*, valid
+
   version of the document.
+
2. Each of the aforementioned signatures belongs to a key that is part of the
+
   `delegates` set of the previous document version.
+

+
> Git commit signatures can be verified with the `git verify-commit` tool.
+

+
These delegate signatures are expected to be included in the commit header
+
under the `gpgsig` key, and be encoded in the SSH signature format.
+

+
    tree c66cc435f83ed0fba90ed4500e9b4b96e9bd001b
+
    parent af06ad645133f580a87895353508053c5de60716
+
    author Buck Mulligan <buck@mulligan.xyz> 1664467633 +0200
+
    committer Buck Mulligan <buck@mulligan.xyz> 1664786099 -0200
+
    gpgsig -----BEGIN SSH SIGNATURE-----
+
     U1NIU0lHAAAAAQAAADMAAAALc3NoLWVkMjU1MTkAAAAgvjrQogRxxLjzzWns8+mKJAGzEX
+
     4fm2ALoN7pyvD2ttQAAAADZ2l0AAAAAAAAAAZzaGE1MTIAAABTAAAAC3NzaC1lZDI1NTE5
+
     AAAAQIQvhIewOgGfnXLgR5Qe1ZEr2vjekYXTdOfNWICi6ZiosgfZnIqV0enCPC4arVqQg+
+
     GPp0HqxaB911OnSAr6bwU=
+
     -----END SSH SIGNATURE-----
+
    gpgsig -----BEGIN SSH SIGNATURE-----
+
     U1NIU0lHAAAAAQAAADMAAAALc3NoLWVkMjU1MTkAAAAgDb3ulFKnHALG8AnuuFPY9prvVZ
+
     kyLc73tcQ+HG3sCzQAAAADZ2l0AAAAAAAAAAZzaGE1MTIAAABTAAAAC3NzaC1lZDI1NTE5
+
     AAAAQM9rxErTt7AtcLypSyVM/jmd9/syO4D5hjMjL/9lbGzIkXXDL6+QlUsLipeLuYHV92
+
     F/6nm/lEaPUTeiZQ5o9AI=
+
     -----END SSH SIGNATURE-----
+

+
<small>A Git commit header with two SSH signatures.</small>
+

+
We proceed in this manner until the last commit in the history. If all commits
+
pass this verification process, we consider the identity valid. Note that every
+
version of the document must be validated according to the rules stated under
+
the [Validation](#validation) section. This includes the document payload,
+
and implies that application developers supporting payload extensions will
+
have to provide their own validation for these payloads, that will have to run
+
for each commit in the document history.
+

+
It's important to restate that for any commit `C`, other than the root commit,
+
verification is done by using the `delegates` and `threshold` values of the
+
*parent* commit to `C`, which has already been verified.
+

+
Security
+
--------
+
The combination of Git storage and cryptographic verification provides very
+
strong security and integrity guarantees around Radicle repositories and
+
identities:
+

+
* Omitted data up to the latest commit is detected by Git itself
+
* Tampering with the identity root will result in a different RID
+
* Adding a delegate key without the sign-off of the existing delegate set will
+
  fail verification
+

+
There is one possible attack that can be carried out by network participants:
+
serving old data. Since it isn't possible to know whether a document history
+
has a more recent update than the latest known update, a dishonest peer may
+
choose to hide the last *N* identity updates from its peers. This means it
+
will serve a stale document to its peers.
+

+
However, this attack is only effective if *all* of a victim's connected peers
+
perform this censorship. It takes only one honest peer to serve the full
+
document history for the censorship to fail.
+

+
Closing Thoughts
+
----------------
+
In this RIP we described an identity system for Git repositories that can be
+
used to securely distribute code on a peer-to-peer network. The system is
+
self-certifying and requires only basic Git primitives to implement.
+

+
Credits
+
-------
+
* Kim Altintop, for the original design this system is based on
+

+
Copyright
+
---------
+
This document is licensed under the Creative Commons CC0 1.0 Universal license.
+

+
[^0]: https://theupdateframework.github.io/specification/latest/
+
[^1]: https://ed25519.cr.yp.to/
+
[^2]: https://w3c-ccg.github.io/did-method-key/
+
[^3]: https://en.wikipedia.org/wiki/Reverse_domain_name_notation
+
[^4]: https://json-schema.org/
+
[^5]: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
+
[^6]: https://datatracker.ietf.org/doc/html/rfc8785
+
[^7]: https://w3c-ccg.github.io/multibase/