rad:z371PVmDHdjJucejRoRYJcDEvD5pp
Radicle website including documentation and guides
RIPs on the Website
Draft lorenz opened 8 months ago

This is an attempt to make the RIPs available on radicle.xyz

Some thoughts:

  • Importing the RIPs should be done with git submodule
  • None or minimal manual modifications should be necessary to update.
  • RIPs might be modified slightly to support this.
  • Standardize how RIPs refer to each other.
7 files changed +1645 -0 4da3bf57 394cd343
modified README.md
@@ -19,6 +19,20 @@ This is the Radicle homepage and documentation repository.

[ruby]: https://www.ruby-lang.org/en/documentation/installation/

+

+
## RIPs
+

+
They live in `_rips` as a squashed subtree, using `git subtree`.
+

+
If you plan to work on the integration with RIPs, it is *very*
+
helpful to add the repository as a remote:
+

+
	git remote add rips rad://z3trNYnLWS11cJWC6BbxDs5niGo82
+

+
Then, to update the subtree:
+

+
	git subtree --prefix _rips pull rips master
+

## License

Licensed under [CC BY-NC-SA 4.0][license]. See [LICENSE](LICENSE).
modified _config.yml
@@ -11,6 +11,9 @@ collections:
  pages:
    output: true
    permalink: /:path/
+
  rips:
+
    output: true
+
    permalink: /rip/:path/


highlighter: rouge
@@ -51,3 +54,7 @@ defaults:
      path: _posts
    values:
      layout: post
+
  - scope:
+
      path: _rips
+
    values:
+
      layout: rip
added _layouts/rip.html
@@ -0,0 +1,30 @@
+
<!DOCTYPE html>
+
<html lang="{{ site.lang | default: "en-US" }}">
+
  <head>
+
    {% include meta.html %}
+
    <link rel="stylesheet" type="text/css" href="{{ "/assets/css/guide.css" | relative_url }}"/>
+
  </head>
+
  <body>
+
    <header>
+
      <div class="title">
+
        <a href="/guides">
+
          <img src="{{ "/assets/images/radicle.svg" | relative_url }}" alt="" />
+
          <span>Radicle Improvement Proposals</span>
+
        </a>
+
      </div>
+
      <nav>
+
      </nav>
+
      <button class="toggle" id="toggle-theme"><!-- Set by script --></button>
+
    </header>
+
    <main>
+
      <h1>RIP {{ page.RIP }}: {{ page.Title }}</h1>
+
      {% include toc.html html=content class="toc" item_class="toc-h%level%" %}
+
      <hr />
+
      {{ content }}
+
    </main>
+
    <footer>
+
      <p>&copy; The Radicle Team</p>
+
    </footer>
+
    <script src="{{ "/assets/js/toggle-theme.js" | relative_url }}"></script>
+
  </body>
+
</html>
added _rips/0001-heartwood.md
@@ -0,0 +1,620 @@
+
---
+
RIP: 1
+
Title: Heartwood
+
Author: '@cloudhead <cloudhead@radicle.xyz>'
+
Status: Draft
+
Created: 2022-09-06
+
License: CC0-1.0
+
---
+

+
In this RIP, we define a major iteration of the Radicle network protocol and
+
the various related sub-systems. We call it "Heartwood".
+

+
The intent of this proposal is not to define a complete specification of the
+
protocol, but to be a foundation for subsequent RIPs. Various aspects
+
of the protocol, in particular around the issues of privacy, censorship
+
resistance, peer misbehavior and DoS are left for future RIPs to expand on,
+
and won't be tackled here. Additionally, details and specifics on the wire
+
protocol, message formats, hash functions and encodings may be left out,
+
to focus this proposal on the big ideas.
+

+
Overview
+
--------
+
The Radicle network protocol can be defined through the intended use-case for a
+
peer-to-peer code hosting network:
+

+
*Alice publishes a repository on the network under a unique identifier, and
+
Bob, using that identifier, is able to retrieve it from the network whether
+
Alice is online or offline, and verify the authenticity of the data.*
+

+
The above must hold true independent of the network topology and number of
+
nodes hosting the project, as long as at least one node is hosting it. We can
+
therefore say that the primary function of the protocol is to locate repositories
+
on the network, and serve them to users, all in a timely, and resource-efficient
+
manner.
+

+
This functionality can be broken down into three components:
+

+
1. Locating repositories, ie. finding which nodes host a given repository
+
2. Replicating a given repository between two nodes, such that they both
+
   hold a copy
+
3. Verifying the authenticity of all the data retrieved from the network, such
+
   that any node can serve any data without the need for trust
+

+
To achieve (1), nodes need to exchange information about which repositories
+
they host, so that they can point users to the right locations, as well as
+
notify each other when there are updates to the repositories. This in turn
+
requires *peer* discovery: nodes need a way to find each other on the network.
+

+
To achieve (2), git is used for its excellence as a replication protocol.
+

+
To achieve (3), given the nature of peer-to-peer networks, ie. that any node
+
can join the network, git alone is not enough. Replication through git needs to
+
be paired with a way of verifying the authenticity of the data being
+
replicated. While git checks for data *integrity*, our protocol will have to
+
make sure that the data Bob downloads is the data Alice published on the
+
network. Without such verification, an intermediary node on the network could
+
easily tamper with the data before serving it to Bob. We also can't require
+
that Alice serve her data directly to Bob, as that would require them to be
+
online at the same time, and would introduce a single point of failure.
+

+
Table of Contents
+
-----------------
+
* [Repository Identity](#repository-identity)
+
    * [The Identity Document](#the-identity-document)
+
* [Repository Discovery](#repository-discovery)
+
    * [Topology](#topology)
+
    * [Routing](#routing)
+
* [Node Identity](#node-identity)
+
* [Gossip](#gossip)
+
    * [Inventory Announcements](#inventory-announcements)
+
        * [Pruning](#pruning)
+
    * [Reference Announcements](#reference-announcements)
+
    * [Node Announcements](#node-announcements)
+
        * [Bootstrap Nodes](#bootstrap-nodes)
+
* [Replication](#replication)
+
    * [Project Tracking and Branches](#project-tracking-and-branches)
+
    * [Unintentional Forks and Conflicts](#unintentional-forks-and-conflicts)
+
* [Storage](#storage)
+
    * [Layout](#layout)
+
        * [Special References](#special-references)
+
* [Canonicity](#canonicity)
+
* [Closing Thoughts](#closing-thoughts)
+
* [Credits](#credits)
+
* [Copyright](#copyright)
+

+
Repository Identity
+
-------------------
+
To locate, or even "talk" about repositories on a peer-to-peer network, we
+
require a stable, unique identifier that can be verifiably associated with a
+
repository. Without this, there is no way for a user to request a specific
+
repository and verify its authenticity. Unlike centralized forges such
+
as GitHub, where repositories are deemed authentic based on their *location*,
+
eg. `https://github.com/bitcoin/bitcoin`, in an *untrusted* network, location
+
is not enough and we need a way to automatically verify the data we get from any
+
given location. Therefore, before we talk about networking, we must make a
+
little detour into repository identity.
+

+
It's important to understand that although git repositories use content
+
addressing for their objects, repositories are *mutable* data-structures.
+
Therefore, the identity of a repository *cannot* be derived solely from its
+
contents. Instead, the identity must be determined by some other authority.
+
In Radicle, this is none other than the *maintainers* of the repository, since it
+
is their mandate to decide what gets merged into a codebase.
+

+
We can then define a repository's identity as the set of all branches and tags
+
that the maintainers of the repository agreed upon at a given point in time,
+
along with a unique identifier.
+

+
For anyone to be able to verify an identity, we require maintainers to provide a
+
cryptographic signature over the repository's heads, tags, and other relevant
+
git references, along with repository metadata such as name and description.
+
We call this the *signed refs*. Signed refs can be updated whenever there are
+
changes to a repository that are accepted by maintainers. They represent a
+
repository's *canonical state*.
+

+
As for the identifier, it must be provably associated with the above state.
+
In other words, it must be possible, given an identifier and a set of signed
+
refs, to prove association.
+

+
### The Identity Document
+

+
Before a repository can be published to the network, it needs to be initialized
+
into a Radicle *project*. A project is simply a repository with an associated
+
*identity document*. In this document, the public keys of the repository's
+
current maintainers are stored. When a project is initialized from an existing
+
git repository for the first time, the user initializing becomes the de-facto
+
initial maintainer of the project, and their key is included in the new identity
+
document's key ring. We call this set of trusted keys the project's *delegation*,
+
and each key is called a *delegate*. Though these will often map one to one with
+
maintainers, this is not a requirement. The only requirement is that they be
+
trusted to represent a given project, as they will be used to determine
+
the canonical state of the project repository.
+

+
From this initial identity document we can then derive a unique, stable
+
identifier for the project, by hashing the document's contents. In addition to
+
the *delegation*, we include in the document a user-chosen *alias* for the
+
project, as well as a *description*. The process for hashing the document shall
+
be discussed in a subsequent RIP.
+

+
It is by including the identity document in the *signed refs* that we establish
+
a relationship between the source code and the identity, and thus associate
+
the project identifier with the project source code. Note that this permits
+
identical source codes to have more than one identity. This is useful when
+
a user wishes to *fork* a repository. In that case, a new project would
+
be initialized with a brand new identifier, but a mostly identical source code
+
history.
+

+
The storage, update and verification mechanism for the identity document
+
will be discussed in more detail in a future RIP. For the purposes of this
+
document, we can assume a verification mechanism that takes as input the project
+
identifier, signed refs, and identity document, and outputs whether the project
+
is valid or not.
+

+
At the networking level, all we need is a way to derive stable identifiers for
+
repositories, and a verification process that asserts that a given repository
+
corresponds to some project identifier.
+

+
Repository Discovery
+
--------------------
+
The Radicle network protocol has to serve one core purpose: given a project
+
identifier, a user should be able to retrieve the full project source code
+
associated with that identifier, and verify its authenticity. This function
+
should be independent of where the project is located, and how many replicas
+
exist, provided at least one replica exists.
+

+
### Topology
+

+
Given that there is no natural incentive for nodes to host *arbitrary* projects,
+
nodes on the network should be given the choice of which projects to host.
+

+
For example,
+

+
* A company that uses or provides open source software may want to host it on
+
  their node, to ensure its continued availability on the network.
+
* A business that hosts projects for a fee would need to be able to choose
+
  which projects it hosts.
+
* A developer contributing to a project may want to self-host it on their node.
+

+
For this reason, we cannot deterministically compute on which node(s)
+
a given project should be hosted, as is the case with DHTs. Nodes are able
+
to choose what they host, and therefore the network is fundamentally
+
*unstructured*. Some nodes may host thousands of projects, while others may
+
host only one or two. Though there is a benefit to arranging the peer
+
topology in a certain fashion (eg. to reduce communication overhead), this
+
cannot be relied on in an untrusted network, and therefore we don't make
+
these assumptions in the basic protocol either.
+

+
### Routing
+

+
The general problem of reaching a specific node on the network is usually
+
known as "routing". Where IP routing tries to route traffic to a certain
+
IP address, in the Radicle network, we attempt to route requests to
+
one or more nodes that host a given project; these are called *seeds*
+
in the context of that project. A seed is a node that hosts and serves
+
one or more projects on the network.
+

+
Routing information is usually stored in a *routing table* that is keyed
+
by the "target", in our case this is the project identifier:
+

+
    RoutingTable = BTreeMap<RepositoryId, Vec<NodeId>>
+

+
For each project, we keep track of the nodes that are known to host this
+
project. Using hashes for project identifiers and IP addresses as node
+
identifiers, the table might look something like:
+

+
    80a2970…        54.122.99.1, 89.2.23.67
+
    c5e079e…        66.12.193.8, 89.2.23.67, 12.43.212.9
+
+

+
To build the table, nodes gossip information about other nodes, namely *what*
+
projects are hosted *where*.
+

+
Assuming `32 byte` project and node identifiers, an average of `3` nodes
+
hosting each project, and a million projects, all stored in a binary tree,
+
we would need only about `244 MB` to store the entire routing table in memory
+
with no compression:
+

+
    project count = 1'000'000
+
    leaf size = 32 + 32 * 3 = 128 B
+
    leaves size = leaf size * project count = 128'000'000 B = ~122 MB
+
    index size = leaf size * (project count - 1) = ~122 MB
+
    total = 122 MB + 122 MB = ~244 MB
+

+
Since this amount of memory is available on commodity hardware, we see
+
no need to partition the routing table for the time being, and propose
+
that each node store the entire routing table on disk or in memory.
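The routing table and the size estimate above can be sketched in Rust as follows. This is a minimal illustration, not the protocol's actual implementation: the `RoutingTable` wrapper, its `insert` and `seeds` methods, and the `estimated_bytes` helper are hypothetical names, and both identifiers are assumed to be 32-byte arrays, as in the estimate.

```rust
use std::collections::BTreeMap;

/// Hypothetical 32-byte identifiers, matching the sizes assumed in the estimate.
pub type RepositoryId = [u8; 32];
pub type NodeId = [u8; 32];

/// Routing table keyed by repository, following the type given in the text.
#[derive(Default)]
pub struct RoutingTable {
    entries: BTreeMap<RepositoryId, Vec<NodeId>>,
}

impl RoutingTable {
    /// Record that `node` hosts `repo`. Returns `true` if the entry is new.
    pub fn insert(&mut self, repo: RepositoryId, node: NodeId) -> bool {
        let seeds = self.entries.entry(repo).or_default();
        if seeds.contains(&node) {
            false
        } else {
            seeds.push(node);
            true
        }
    }

    /// Nodes known to host `repo`, or an empty slice if none are known.
    pub fn seeds(&self, repo: &RepositoryId) -> &[NodeId] {
        self.entries.get(repo).map(Vec::as_slice).unwrap_or(&[])
    }
}

/// Reproduces the back-of-the-envelope estimate: one leaf per project, plus
/// an index of `projects - 1` entries of the same size.
pub fn estimated_bytes(projects: u64, seeds_per_project: u64) -> u64 {
    let leaf = 32 + 32 * seeds_per_project;
    leaf * projects + leaf * (projects - 1)
}
```

With a million projects and three seeds each, `estimated_bytes` returns 255,999,872 bytes, which is the roughly 244 MB quoted above when a megabyte is counted as 1024 × 1024 bytes.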
+

+
Node Identity
+
-------------
+
The identity of a node on the network is simply the identity of the user
+
operating the node. To be able to securely verify data authorship, we use
+
public key cryptography, with the public key being used as the node identifier.
+
In the case of nodes run by end-users, which is likely most nodes, the
+
node's secret key is used to create the *signed refs* and optionally to
+
sign git commits.
+

+
The use of the same identity for both network communications and code signing
+
makes the network more transparent, while allowing nodes to trust each other
+
based on the code they publish.
+

+
For nodes that are run as always-on "servers", the node identity may not be
+
used for signing code. These *seed* nodes only use their secret keys to
+
sign gossip messages and establish secure connections.
+

+
Gossip
+
------
+
We design the Radicle networking layer as a *gossip* protocol. In this proposal,
+
we go over some of the fundamental types of messages that are sent between
+
peers over the network. We contend that the core functionality can be achieved
+
with three message types: *inventory* announcements, *reference* announcements
+
and *node* announcements, each fulfilling a distinct role. The exact wire
+
protocol will be described in a future proposal; this section should serve
+
as a short introduction to the topic.
+

+
### Inventory Announcements
+

+
To build their routing table, nodes connecting to the network announce to their
+
peers what inventory they have, ie. what projects they are seeding.
+
These announcements are relayed to other connected peers, and so
+
on until they reach the majority of nodes on the network. Messages that have
+
already been seen are dropped, to prevent messages from propagating forever.
+
Gossip messages may be retained by nodes for a certain amount of time, so that
+
they can be served to new nodes joining the network. The *inventory
+
announcement* message has the following shape:
+

+
    InventoryAnnouncement = (
+
        NodeId,
+
        Vec<RepositoryId>,
+
        Timestamp,
+
        Signature,
+
    )
+

+
It contains the identifier of the node making the announcement, the inventory
+
of projects, a timestamp, and the signature of the node over the projects and
+
timestamp. By using a public key as the `NodeId`, we can then both identify
+
and verify the provenance of the message, using the signature.
+

+
In this manner, every node in the network will eventually converge
+
towards a single routing table, provided the network is well connected.
+

+
For larger networks, where nodes cannot be fully meshed, it's desirable for
+
seeds that have projects in common to be connected to each other. Hence,
+
nodes should prioritize connecting to peers that seed the same projects as
+
them. This is simply because relevant messages can reach interested nodes
+
more quickly and efficiently if nodes with shared interests are directly
+
connected to each other; but also because nodes can use already-established
+
connections to fetch data of interest.
+

+
As is often the case with large, unstructured networks, gossip messages can be
+
received out of order. For this reason, the inventory message includes a
+
*timestamp*, which is used for ordering messages. Since the inventory message
+
is meant to communicate a node's complete inventory, nodes can simply ignore
+
inventory messages with timestamps lower than the latest received, and not
+
relay them. Since an inflated timestamp would cause later messages to be
+
ignored, we reject messages with timestamps too far in the future.
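The ordering rule for inventory messages might be sketched as below. Several things here are assumptions rather than part of the proposal: the `Inventories` type and `Outcome` enum are hypothetical, timestamps are taken to be seconds, the one-hour `MAX_FUTURE_DRIFT` tolerance is an arbitrary placeholder, and a message with a timestamp equal to the latest one is treated as already seen. Signature verification is assumed to happen before `receive` is called.

```rust
use std::collections::HashMap;

pub type NodeId = [u8; 32];
pub type RepositoryId = [u8; 32];
/// Seconds since the Unix epoch (an assumption; the RIP does not fix a unit).
pub type Timestamp = u64;

/// Hypothetical tolerance for clock skew; the RIP does not specify a value.
pub const MAX_FUTURE_DRIFT: Timestamp = 60 * 60;

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Newer than anything seen from this node: store it and relay it.
    Accepted,
    /// Not newer than the latest inventory seen: ignore and do not relay.
    Stale,
    /// Timestamp too far ahead of the local clock: reject.
    TooFarInFuture,
}

#[derive(Default)]
pub struct Inventories {
    /// Latest accepted inventory per announcing node.
    latest: HashMap<NodeId, (Timestamp, Vec<RepositoryId>)>,
}

impl Inventories {
    /// Apply the ordering rule to an already-verified inventory message.
    pub fn receive(
        &mut self,
        node: NodeId,
        inventory: Vec<RepositoryId>,
        timestamp: Timestamp,
        now: Timestamp,
    ) -> Outcome {
        if timestamp > now.saturating_add(MAX_FUTURE_DRIFT) {
            return Outcome::TooFarInFuture;
        }
        if let Some((seen, _)) = self.latest.get(&node) {
            // Equal timestamps are treated as duplicates (an assumption).
            if timestamp <= *seen {
                return Outcome::Stale;
            }
        }
        // The message carries the node's complete inventory, so it replaces
        // whatever was stored before.
        self.latest.insert(node, (timestamp, inventory));
        Outcome::Accepted
    }
}
```

Because each message carries the complete inventory, keeping only the newest one per node is enough; no merging of older messages is needed.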
+

+
#### Pruning
+

+
One worry with routing tables on permissionless networks is that nodes come and
+
go all the time. While a project may be available on a node one day, the node
+
may go offline the next day and never come back online. Additionally, nodes
+
may choose to stop hosting a certain project, making it no longer available.
+

+
Hence, the routing table needs to be constantly pruned, with out-of-date
+
entries evicted. To achieve this, we set an expiry on routing table entries,
+
and require live nodes to "refresh" their entries on other nodes by sending
+
`inventory` messages periodically.
+

+
Entries that have been in the table for more than a day without updates or
+
refreshes can then be automatically evicted.
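A sketch of this expiry scheme, under stated assumptions: the table records, per repository, the time each node last refreshed its entry, timestamps are in seconds, and the `refresh`, `prune` and `seeds` methods are hypothetical names. The one-day lifetime comes from the text above.

```rust
use std::collections::BTreeMap;

pub type RepositoryId = [u8; 32];
pub type NodeId = [u8; 32];
/// Seconds since the Unix epoch (an assumption).
pub type Timestamp = u64;

/// Entries older than one day without a refresh are evicted.
pub const ENTRY_TTL_SECS: u64 = 24 * 60 * 60;

#[derive(Default)]
pub struct RoutingTable {
    /// For each repository, the nodes hosting it and when they last refreshed.
    entries: BTreeMap<RepositoryId, BTreeMap<NodeId, Timestamp>>,
}

impl RoutingTable {
    /// Called when an inventory message from `node` lists `repo`.
    pub fn refresh(&mut self, repo: RepositoryId, node: NodeId, now: Timestamp) {
        self.entries.entry(repo).or_default().insert(node, now);
    }

    /// Evict entries not refreshed within the TTL. Returns how many were removed.
    pub fn prune(&mut self, now: Timestamp) -> usize {
        let mut evicted = 0;
        self.entries.retain(|_, seeds| {
            let before = seeds.len();
            seeds.retain(|_, refreshed| now.saturating_sub(*refreshed) <= ENTRY_TTL_SECS);
            evicted += before - seeds.len();
            // Drop repositories that no longer have any known seed.
            !seeds.is_empty()
        });
        evicted
    }

    /// Nodes currently known to host `repo`.
    pub fn seeds(&self, repo: &RepositoryId) -> Vec<NodeId> {
        self.entries
            .get(repo)
            .map(|seeds| seeds.keys().copied().collect())
            .unwrap_or_default()
    }
}
```

A node would call `prune` periodically, for instance on a timer, while `refresh` is driven by incoming `inventory` messages.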
+

+
### Reference Announcements
+

+
When an update to a project is made by a user, the user's node sends a message
+
to the network, announcing the update. Nodes that are tracking this project are
+
then able to fetch the updates via the *git* protocol, either directly from the
+
user's node, or from an intermediary node. This *refs announcement* message
+
looks like this:
+

+
    RefsAnnouncement = (
+
        NodeId,
+
        RepositoryId,
+
        Map<RefName, CommitId>,
+
        Signature,
+
    )
+

+
It contains the identifier of the node announcing the updated references, the
+
repository under which these refs reside, the map of reference names (`RefName`)
+
with their new commit hashes (`CommitId`) and a signature from the publishing
+
node (`NodeId`), over the refs and project identifier.
+

+
This allows any receiving node tracking the project to verify the legitimacy
+
of the message using `NodeId` and `Signature`. For new projects, published
+
on the network for the first time, the same type of message can be used.
+

+
Reference announcements, unlike inventory announcements, should only be
+
relayed to interested nodes, ie. nodes that are hosting the given project,
+
as they will usually be followed by a `git-fetch`.
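The selective relay rule could look like the following sketch. It assumes the node keeps, for each connected peer, the set of repositories that peer announced in its inventory; the `relay_targets` function and the shape of the `peers` map are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

pub type NodeId = [u8; 32];
pub type RepositoryId = [u8; 32];

/// Pick the connected peers that a refs announcement for `repo` should be
/// relayed to: peers hosting `repo`, excluding the peer it was received from.
pub fn relay_targets(
    peers: &BTreeMap<NodeId, BTreeSet<RepositoryId>>,
    repo: &RepositoryId,
    from: &NodeId,
) -> Vec<NodeId> {
    peers
        .iter()
        .filter(|(id, inventory)| *id != from && inventory.contains(repo))
        .map(|(id, _)| *id)
        .collect()
}
```

Inventory announcements, by contrast, would be relayed to every connected peer, since all nodes need them to build their routing tables.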
+

+
We should also note that the `NodeId` in this case is not only the announcer
+
of these updated references, but may be the *author*. When Alice pushes changes
+
to a project, she announces these changes over the network using a reference
+
announcement, via her own node.
+

+
### Node Announcements
+

+
We've touched upon inventory gossip, and how project metadata and data is
+
exchanged, but not how node metadata is exchanged; or in other words, how *peer
+
discovery* is carried out. For this purpose, we devise a *node announcement*
+
message:
+

+
    NodeAnnouncement = (
+
        NodeId,
+
        NodeFeatures,
+
        Vec<Address>,
+
        Timestamp,
+
        Signature,
+
    )
+

+
This message is designed to be authored by a node announcing *itself* on the
+
network, and therefore contains a signature and timestamp, and is meant to be
+
relayed by other nodes on the network. The key "payload" of this message is
+
the vector of addresses sent by the node. This should contain all addresses
+
on which the node is publicly reachable. At minimum, this should contain
+
one IP address, but in the future could contain `.onion` addresses or DNS
+
names.
+

+
As with the inventory message, nodes should buffer these announcements to serve
+
them to new nodes connecting to the network. In addition to the list of
+
addresses, we propose to also include a list of features supported by the
+
announcing node, to allow for future protocol upgrades.
+

+
#### Bootstrap Nodes
+

+
A node joining the network for the first time will not know of any peers.
+
Hence, it's advised that network client software be pre-configured with
+
DNS "seeds". These are registered DNS names, eg. `seeds.radicle.xyz` that
+
resolve to node addresses on the network. In the bootstrapping process,
+
nodes can resolve these names to have a set of addresses to initially
+
connect to, and once they find a peer, use the regular peer discovery
+
process to find more nodes.
+

+
Replication
+
-----------
+
While gossip is used to exchange *metadata*, the actual repository *data*, ie.
+
Git objects, is transferred via the process of *replication*.
+

+
Nodes are configured with a list of projects that they are meant to host. These
+
are called *tracked* projects, and this configuration is called the *tracking
+
policy*.
+

+
When a new node joins the network, the first thing it will attempt to do is to
+
retrieve these tracked projects from the network. This is called *bootstrapping*.
+
To do this, the node consults its routing table, locates the project's *seeds*,
+
and initiates a `git-fetch` via the *git* protocol, with one or more seeds. This
+
fetch operation downloads the relevant git objects into the node's *storage*,
+
making them available to other interested nodes.
+

+
To notify its peers that its inventory has changed, it sends an *inventory*
+
message to each of its peers. Replication is only possible because of the
+
exchange of information on the gossip layer. Without it, nodes wouldn't
+
know where to replicate projects from, and would quickly fall behind.
+

+
### Project Tracking and Branches
+

+
While it's possible to always replicate and track at the *repository* level,
+
it is highly impractical: such an *open* tracking policy is easily abused by
+
malicious nodes. Given limited disk space and bandwidth, nodes need a way to
+
replicate only a *subset* of repository data, authored by users they can
+
trust.
+

+
If we want to allow contributors to publish code that is intended to be merged
+
into another project, we need to think about a node's tracking policy with
+
respect to individual contributors and their git reference *trees*
+
(the set of all branches published under a project by a given contributor).
+
The safest tracking policy is to only track trees published by the delegates of
+
a tracked project. But this means that contributors (non-delegates) won't have
+
their changes replicated on the network. This becomes a problem when a
+
contributor wants to propose a new patch to a project: unless the contributor
+
is online to *seed* that branch, there is no way for the maintainer to retrieve
+
it.
+

+
Lacking a good answer to this problem at the protocol level, we defer to node
+
operators to implement policies at the individual seed level. For example, a
+
seed node may require users to link their social profiles with their Radicle
+
key before enabling tracking for their branches.
+

+
In the past, we've explored the idea of a "social" tracking graph. We leave
+
this door open for potential future iterations of the protocol.
+

+
### Unintentional Forks and Conflicts
+

+
It is possible, through a software bug or user error, to unintentionally fork
+
one's history. For example, a user publishes changes to a branch that land
+
on a node in the network, but later re-writes that branch's history and
+
re-publishes it. Nodes that received the initial changes will not be able
+
to merge these new changes via a simple history *fast-forward*, while nodes
+
that never saw the initial changes will have no problem fetching the new ones.
+

+
    User       Seed #1       Seed #2
+

+
    B                          B
+
    |  A            A          |
+
    | /            /           |
+
    |/            /            |
+
    |            |             |
+

+
In the above diagram, a user publishes history `A` under branch `master`,
+
which lands on `Seed #1` and then publishes history `B` under the same branch.
+
`Seed #2`, which didn't have the prior history, is able to fetch that branch
+
without problems. However, `Seed #1` isn't, since the histories are divergent.
+

+
Now, depending on which seed is used to fetch, a user would get a completely
+
different history for that branch.
+

+
To solve this problem, we have to realize one simple thing: every time a user
+
publishes new code to the network, a commit in the *signed refs* history is
+
created. This means that after publishing `B`, the user's signed refs history
+
will look like this:
+

+
    refs/…/heads/master    B
+
    refs/…/heads/master    A
+
+

+
Since `Seed #1` will have this history if it fetches `B`, it will be able to
+
simply set the head of the branch to the latest signed ref for that branch,
+
which is `B`.
+

+
Forks in the signed refs history itself, though much more problematic, could
+
be recovered by picking the history with the latest timestamp.
+

+
Storage
+
-------
+
Since storage and replication are tightly coupled, and replication makes use of
+
Git, so does storage. Unlike previous versions of Radicle, each project is
+
stored in its own *bare* Git repository, under a common base directory.
+

+
Storage is accessed directly by the node to report inventory to other nodes,
+
and accessed by the end user through either specialized tooling or `git`.
+

+
It's important to know that a user working on a project will typically have
+
*two* copies of the repository: one in storage, called the *stored* copy,
+
and one *working copy*. The working copy will be set up in such a way that it is
+
linked to storage via a *remote* named `rad`. Publishing code is then a matter
+
of running, eg. `git push rad`. Code in storage can be considered *public*,
+
since it is shared with connected peers, while code in the working copy is
+
considered *private* until pushed.
+

+
This allows for code to be staged for publishing even while offline, provided
+
the user's storage is accessible. In most cases, it makes sense to keep
+
storage and working copies on the same machine.
+

+
    ┌╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┐          ┌╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┐
+
    ┆ ┌───────────────────┐ ┌──────┐ ┆          ┆ ┌──────┐ ┌─────────────────┐ ┆
+
    ┆ │ Storage           │ │      │ ┆   Git    ┆ │      │ │ Storage         │ ┆
+
    ┆ │                   ├╸┆╸╸╸╸╸╸┆╸╸╸╸╸╸╸╸╸╸╸╸╸╸┆╸╸╸╸╸╸┆╸┤                 │ ┆
+
    ┆ │ ┌──────┐ ┌─────┐ ┌│ │      │ ┆ protocol ┆ │      │ │ ┌─────┐ ┌─────┐ │ ┆
+
    ┆ │ │repo  │ │repo │ ││ │      │ ┆          ┆ │      │ │ │repo │ │repo │ │ ┆
+
    ┆ │ ├──────┤ ├─────┤ ├│ │      │ ┆          ┆ │      │ │ ├─────┤ ├─────┤ │ ┆
+
    ┆ └─┴───╿──┴─┴───┬─┴─┴┘ │      │ ┆          ┆ │      │ └─┴───┬─┴─┴───╿─┴─┘ ┆
+
    ┆       │        │      │      │ ┆  gossip  ┆ │      │       │       │     ┆
+
    ┆       │        │      │ Node ├╸╸╸╸╸╸╸╸╸╸╸╸╸╸┤ Node │       │       │     ┆
+
    ┆       │        │      │      │ ┆ protocol ┆ │      │       │       │     ┆
+
    ┆      push     pull    │      │ ┆          ┆ │      │      pull    push   ┆
+
    ┆       │        │      │      │ ┆          ┆ │      │       │       │     ┆
+
    ┆       │        │      │      │ ┆          ┆ │      │       │       │     ┆
+
    ┆       │        │      │      │ ┆          ┆ │      │       │       │     ┆
+
    ┆  ┌────┴───┐ ┌──╽─────┐│      │ ┆          ┆ │      │ ┌─────╽──┐ ┌──┴────┐┆
+
    ┆  │working │ │working ││      │ ┆          ┆ │      │ │working │ │working│┆
+
    ┆  │copy    │ │copy    ││      │ ┆          ┆ │      │ │copy    │ │copy   │┆
+
    ┆  └────────┘ └────────┘└──────┘ ┆          ┆ └──────┘ └────────┘ └───────┘┆
+
    └╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┘          └╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┘
+

+
Splitting the storage into per-project repositories has numerous advantages
+
over previous designs that used a "monorepo":
+

+
* Private repositories become possible to offer
+
* Concurrency is simpler, since we can have locks at the project level
+
* The Git object database is used more efficiently, since there is more sharing
+
* Repository settings can be tuned on a per-project level
+

+
### Layout
+

+
Each project under Radicle is stored in a bare Git repository containing
+
the local user's refs, as well as the refs of all of the peers that the
+
user's node is configured to track. A simple "namespacing" scheme can be used,
+
similar to the one used by git for remote tracking branches.
+

+
Since nodes replicate Git data from other nodes, a partitioning scheme is
+
needed to separate references belonging to each node within each project.
+
This can be achieved by using a node's unique identifier (its public key)
+
to namespace Git references, since references are by their nature hierarchical.
+

+
#### Special References
+

+
To store project metadata, special Git references are used. These are references
+
that are written to a known location that doesn't vary between projects, and in
+
some cases is meant to be hidden from the user, and accessed only through
+
purpose-built tooling.
+

+
The first one is the head of the project identity branch:
+

+
    refs/rad/id
+

+
This reference points to the latest version of the identity document. Users
+
who want to make changes to their project's identity document can check out
+
this branch.
+

+
The second one is the *signed refs*:
+

+
    refs/rad/sigrefs
+

+
You'll notice these references are not under the `heads/` hierarchy and therefore
+
aren't regular branches. These references point to commit histories, though
+
these histories are disjoint from the source code's history. Since they
+
determine the outcome of project verification, they need to be handled with
+
care and are not intended to be accessed directly by an end user.
+

+
Storage layout will be specified in more detail in future work.
+

+
Canonicity
+
----------
+
In the above examples, we limited ourselves to projects with a single delegate.
+
For the majority of open source projects, this is fitting: a single maintainer
+
is in charge of the code. However, larger projects often have more than one
+
user with push access or commit rights. In the Radicle model, there are no
+
shared branches: each branch or set of branches under a tree is owned by
+
*one* key. This is why they are partitioned by a public key in the repository
+
hierarchy.
+

+
In a project with multiple delegates, for example, Alice, Bob and Eve, each
+
would have their *own* `master` branch which only they could write to, eg.:
+

+
    <alice>/refs/heads/master
+
    <bob>/refs/heads/master
+
    <eve>/refs/heads/master
+

+
So how does a contributor know which of those branches is the canonical one?
+
The protocol itself does not have a notion of canonicity. It is left up to
+
social consensus: perhaps the three delegates have agreed that Eve's `master`
+
branch will be the canonical one that everyone should pull from. This agreement
+
can be encoded in the project's identity document, and leveraged by tooling,
+
but it is entirely optional, as it may not be desirable for all projects to
+
"elect" a delegate that way.
+

+
For projects that do not have a canonical `master`, there is another option
+
to establish consensus on the state of a repository: *quorum*. Simply put,
+
we can examine the commit histories of all three `master` branches, and
+
pick the *latest* commit included in a *majority* of histories as our canonical
+
state. In the example below, the commit referred to by `B`, which is Bob's
+
latest commit, would be used.
+

+
    Alice       Bob       Eve
+

+
    C o                    D o
+
      |                     /
+
    B o        B o       B o <- quorum
+
      |          |         |
+
    A o        A o       A o
+
      |          |         |
+

+
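The quorum rule can be sketched in a few lines of Python. This is only an illustration, not part of the protocol: it assumes each delegate's history is linear and given oldest-first, that "latest" means deepest in a history, and the function name `quorum` is made up for this example.

```python
from collections import Counter

def quorum(histories):
    """Return the latest commit present in a strict majority of histories.

    `histories` maps a delegate name to a list of commit IDs, oldest first.
    Returns None when no commit is shared by a majority.
    """
    majority = len(histories) // 2 + 1
    # Count in how many histories each commit appears.
    counts = Counter(c for commits in histories.values() for c in set(commits))
    best = None  # (depth, commit) of the deepest majority commit seen so far
    for commits in histories.values():
        for depth, commit in enumerate(commits):
            if counts[commit] >= majority and (best is None or depth > best[0]):
                best = (depth, commit)
    return best[1] if best else None

# The three histories from the diagram above.
histories = {
    "alice": ["A", "B", "C"],
    "bob": ["A", "B"],
    "eve": ["A", "B", "D"],
}
print(quorum(histories))  # prints "B"
```

Histories with merges would need a proper graph traversal, which this sketch leaves out.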

+
Closing Thoughts
+
----------------
+
The protocols described above are part of a larger system. It's our hope that
+
subsequent RIPs will answer some of the questions that arise from this proposal,
+
as well as flesh out the functionality on top of the base protocol. How to
+
represent rich social interactions and collaboration such as code review,
+
comments, issues and patches on top of this protocol, is left for future
+
discussion.
+

+
Credits
+
-------
+
* Kim Altintop, for coming up with the original design this protocol is based on
+
* Alex Good, for helping iterate on some of the ideas found in this proposal
+
* Fintan Halpenny, for helping iterate on some of the ideas found in this proposal
+

+
Copyright
+
---------
+
This document is licensed under the Creative Commons CC0 1.0 Universal license.
added _rips/0002-identity.md
@@ -0,0 +1,439 @@
+
---
+
RIP: 2
+
Title: Identity
+
Author: '@cloudhead <cloudhead@radicle.xyz>'
+
Status: Draft
+
Created: 2022-12-06
+
License: CC0-1.0
+
---
+

+
In RIP #1, we discussed *repository identity*, and the *identity document*.
+
We said that to make it possible for repositories to be hosted in a peer-to-peer
+
network, Git repositories on their own are not enough: we need a secure way to
+
identify repositories that goes beyond source code. We need a stable identifier
+
and a mechanism for self-certifying repositories against this identifier,
+
so that changes to source code can be verified locally, by users.
+

+
In this RIP, we discuss the method through which we can achieve the above in
+
a secure, decentralized way.
+

+
Table of Contents
+
-----------------
+
* [Overview](#overview)
+
* [Peer Identity](#peer-identity)
+
* [Repository Identity](#repository-identity)
+
    * [Validation](#validation)
+
* [The Repository Identifier](#the-repository-identifier)
+
* [Identity Storage](#identity-storage)
+
    * [Verification](#verification)
+
* [Security](#security)
+
* [Closing Thoughts](#closing-thoughts)
+
* [Credits](#credits)
+
* [Copyright](#copyright)
+

+
Overview
+
--------
+
To introduce the topic of identity, we point the reader to the opening
+
paragraphs of the original specification of identities on the Radicle network,
+
which is still very much applicable to the Heartwood protocol:
+

+
> In order to collaborate on repositories within a consensus-free network, we
+
> must be able to refer to them using a stable identifier. Note that this
+
> identifier is a statement of intent: a repository can be described as a
+
> collection of ever-moving leaves of a DAG whose root element is the empty
+
> object. Therefore, the content of a repository is not enough to describe it –
+
> while two views on the repository may share objects, they may diverge
+
> substantially otherwise. Both views may however state their intent to
+
> eventually converge to the same state.
+
>
+
> While in principle a random identifier with sufficient entropy would suffice
+
> for the purpose, this would put the burden of deciding which repository views
+
> are legitimate entirely on the user. Instead, our approach is to establish an
+
> ownership proof, tied to the network identity of a peer, or set of peers,
+
> such that repository views can be replicated according to the trust
+
> relationships between peers (“tracking”).
+
>
+
> Our model is loosely based on The Update Framework (TUF)[^0], conceived as a
+
> means of securely distributing software packages.
+

+
With this in mind, there are three core components to the Radicle identity
+
system, for any given repository:
+

+
1. A set of peers on the network, each holding a signing key.
+
2. A document which establishes the identity of this repository, using these
+
   signing keys to self-certify.
+
3. A stable identifier that can be used to refer to the repository, derived
+
   from this document.
+

+
Peer Identity
+
-------------
+
Since Radicle repositories on the network are created by peers, we must first
+
establish the concept of a *peer identity*. In Heartwood, peers are simply
+
identified by their public key. This key is an Ed25519[^1] key that is encoded
+
as a DID using the `did:key` method[^2]. DIDs are used for interoperability
+
with other systems as well as allowing for other types of identifiers in the
+
future.
+

+
    did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi
+

+
*Example of a peer identifier in DID format.*
+
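As a rough illustration of the encoding, the sketch below builds and parses a `did:key` identifier for an Ed25519 public key. It assumes the `did:key` layout of a two-byte multicodec prefix (`0xed 0x01`) followed by the 32-byte key, encoded as base58-btc with a leading `z`. The helper names are hypothetical, and the sample key is arbitrary bytes rather than a real key.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
ED25519_PUB = bytes([0xED, 0x01])  # multicodec prefix for an Ed25519 public key

def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading "1".
    zeros = len(data) - len(data.lstrip(b"\0"))
    return "1" * zeros + out

def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)  # raises ValueError on an invalid character
    zeros = len(text) - len(text.lstrip("1"))
    return b"\0" * zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")

def encode_did_key(public_key: bytes) -> str:
    if len(public_key) != 32:
        raise ValueError("an Ed25519 public key is 32 bytes long")
    return "did:key:z" + b58encode(ED25519_PUB + public_key)

def decode_did_key(did: str) -> bytes:
    prefix = "did:key:z"
    if not did.startswith(prefix):
        raise ValueError("not a base58-btc did:key identifier")
    raw = b58decode(did[len(prefix):])
    if len(raw) != 34 or raw[:2] != ED25519_PUB:
        raise ValueError("not an Ed25519 did:key identifier")
    return raw[2:]

sample_key = bytes(range(32))  # placeholder bytes, not a real public key
did = encode_did_key(sample_key)
assert decode_did_key(did) == sample_key
```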

+
We'll also note that peers on the network -- also called *nodes* -- are
+
indistinguishable from *users* at the protocol level. The terms "Node ID",
+
"Peer ID", "Public Key" are thus all used interchangeably.
+

+
Repository Identity
+
-------------------
+
With the establishment of peer identities, we can now move on to repository
+
identities. A repository identity consists of an identity document and an
+
associated unique identifier.
+

+
The identity document is a JSON document associated with a repository on Radicle.
+
The *hypothetical* minimal identity document looks like this:
+

+
    { "delegates": ["did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi"],
+
      "threshold": 1 }
+

+
It describes a repository with a single *delegate*. Delegates are trusted
+
entities that can cryptographically sign data within the scope of a given
+
repository. In the identity document, they are represented by a DID. As of this
+
RIP, only the `did:key` method is supported.
+

+
Using the `threshold` property, the document specifies that only *one* delegate
+
is required to sign updates to the repository. In this case, since we only
+
have one delegate, this is the only possible value for `threshold`.
+

+
> Repository delegates are responsible for signing all updates to a repository,
+
> whether it be source code commits or updates to the identity document itself.
+
> They can be thought of as repository "maintainers", though the applicability
+
> is broader. We will see how delegates sign repository updates in one of
+
> the following sections.
+

+
Though the above document could constitute a valid identity, it does not contain
+
any identifiable data that may be used to describe a particular repository.
+
This is what the `payload` section is for. Heartwood defines a single payload
+
type, `xyz.radicle.project`, which can be used to describe a project stored
+
in a repository:
+

+
    { "delegates": ["did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi"],
+
      "threshold": 1,
+
      "payload": {
+
        "xyz.radicle.project": {
+
          "name": "heartwood",
+
          "description": "Radicle Heartwood Protocol & Stack",
+
          "defaultBranch": "master"
+
        }
+
      }
+
    }
+

+
The string `xyz.radicle.project` is called a *payload ID*, and the `project`
+
payload is the default payload type for Radicle repositories. Using this payload
+
type, a repository may be given a name, a description, and a default branch.
+

+
Identity documents are designed to be extensible: developers may create
+
their own payload types, and applications can choose which payload types to
+
support.
+

+
    { ...
+
      "payload": {
+
        "xyz.radicle.project": { ... },
+
        "xyz.radicle.funding": { ... },
+
        "com.atproto.account": {
+
          "email": "eve@atproto.com",
+
          "handle": "eve"
+
        }
+
      }
+
    }
+

+
<small>Figure 1. Fictional example of an identity with multiple payloads</small>
+

+
Payload IDs use reverse domain-name notation[^3] and consist of two
+
parts: an *authority*, eg. `radicle.xyz`, and a *name*, eg. `project`. To keep
+
payload types globally unique, developers wishing to create new payload types
+
must control the authority (domain) under which these live.
+

+
As of this RIP, there is only one recognized payload type: `xyz.radicle.project`.
+
Repositories which include this type of payload are sometimes called *projects*.
+

+
> When specifying new payload types, it's worth thinking about how the payload
+
> schema might evolve over time. For example, it might be worth versioning the
+
> payload types, either via the identifier (eg. `com.atproto.account.v1`) or
+
> via a field inside the payload (eg. `{"version": 1}`). This will ensure that
+
> changes to the payload schema are able to be made in a backwards compatible
+
> way.
+

+
### Validation
+

+
An identity document is valid if the following conditions are met:
+

+
* There is at least *one* `delegate`, but no more than `255`.
+
* Strings are not empty, and at most `255` characters long.
+
* The `threshold` is not zero and not greater than the number of `delegates`.
+
* The items in `delegates` are valid DIDs.
+
* There is a `payload` property with at least one payload object and a valid
+
  payload ID.
+
* Each payload under `payload` is valid according to the rules of that payload.
+

+
These rules can be partly described in the following JSON Schema[^4] document:
+

+
    {
+
      "$schema": "https://json-schema.org/draft/2020-12/schema",
+
      "type": "object",
+
      "properties": {
+
        "delegates": {
+
          "type": "array",
+
          "items": { "type": "string" },
+
          "minItems": 1,
+
          "maxItems": 255,
+
          "uniqueItems": true
+
        },
+
        "threshold": {
+
          "type": "integer",
+
          "minimum": 1,
+
          "maximum": 255
+
        },
+
        "payload": {
+
          "type": "object",
+
          "additionalProperties": { "type": "object" },
+
          "minProperties": 1
+
        }
+
      },
+
      "required": ["delegates", "threshold", "payload"]
+
    }
+
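The rules the schema cannot express, such as the threshold being bounded by the number of delegates, can be checked in code. The following is a minimal sketch and not the reference implementation: `validate_identity` is a hypothetical name, the DID check is purely syntactic, and treating a payload ID as "at least two non-empty dot-separated parts" is an assumption made for this example.

```python
def validate_identity(doc: dict) -> list:
    """Return a list of rule violations; an empty list means the document passed."""
    errors = []

    delegates = doc.get("delegates")
    if not isinstance(delegates, list) or not 1 <= len(delegates) <= 255:
        errors.append("delegates: expected between 1 and 255 entries")
    count = len(delegates) if isinstance(delegates, list) else 0
    for did in delegates if isinstance(delegates, list) else []:
        # Syntactic check only; decoding the key is out of scope for this sketch.
        if not isinstance(did, str) or not did.startswith("did:key:z") or len(did) > 255:
            errors.append(f"delegates: invalid DID {did!r}")

    threshold = doc.get("threshold")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, int):
        errors.append("threshold: expected an integer")
    elif not 1 <= threshold <= count:
        errors.append("threshold: must be at least 1 and at most the number of delegates")

    payload = doc.get("payload")
    if not isinstance(payload, dict) or not payload:
        errors.append("payload: expected at least one payload object")
    else:
        for payload_id, body in payload.items():
            parts = payload_id.split(".")
            if len(parts) < 2 or not all(parts):
                errors.append(f"payload: invalid payload ID {payload_id!r}")
            if not isinstance(body, dict):
                errors.append(f"payload: {payload_id!r} must be an object")
    return errors

doc = {
    "delegates": ["did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi"],
    "threshold": 1,
    "payload": {"xyz.radicle.project": {"name": "heartwood",
                                        "description": "Radicle Heartwood Protocol & Stack",
                                        "defaultBranch": "master"}},
}
assert validate_identity(doc) == []
```

Payload-specific rules, such as those for `xyz.radicle.project`, would be applied on top of this by whichever application supports the payload.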

+
Finally, the schema and validation rules for the `xyz.radicle.project` payload
+
are described as:
+

+
    {
+
      "$schema": "https://json-schema.org/draft/2020-12/schema",
+
      "type": "object",
+
      "properties": {
+
        "name": {
+
          "type": "string",
+
          "minLength": 1,
+
          "maxLength": 255
+
        },
+
        "description": {
+
          "type": "string",
+
          "maxLength": 255
+
        },
+
        "defaultBranch": {
+
          "type": "string",
+
          "minLength": 1,
+
          "maxLength": 255
+
        }
+
      },
+
      "required": ["name", "description", "defaultBranch"]
+
    }
+

+
The Repository Identifier
+
-------------------------
+
Now that we have a document describing our repository, we can derive from it
+
a unique identifier that can be used to refer to the repository on the
+
peer-to-peer network. This identifier must meet certain criteria:
+

+
1. It must be *stable*, in other words it must not change throughout the
+
   lifetime of the repository.
+
2. It must be deterministically derivable from the identity document alone, for
+
   the purpose of verification.
+
3. It must contain enough entropy to be globally unique.
+
4. It must allow for easy retrieval of the document from storage.
+

+
To fulfill the above, and given that Radicle uses Git for storage of repository
+
data, we choose to use the *Git Object ID*[^5] of the identity document, as
+
identifier. Git object IDs, or *OIDs*, are "hardened" SHA-1 checksums of their
+
content, prefixed with a short header. We can compute this OID using the `git
+
hash-object` command. But before doing so, we must take care of one last thing:
+
to make the process of hashing our identity document fully deterministic, we
+
must first ensure our document is in canonical JSON form[^6]. This prevents
+
things like whitespace or key ordering from influencing the document hash and
+
therefore the identifier. In turn, this makes the identifier easier to compute
+
correctly.
+

+
The above document in canonical form looks like this:
+

+
    {"delegates":["did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi"],
+
    "payload":{"xyz.radicle.project":{"defaultBranch":"master","description":
+
    "Radicle Heartwood Protocol & Stack","name":"heartwood"}},"threshold":1}
+
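For simple documents like this one, the canonical form can be approximated with Python's standard `json` module. This is a sketch under assumptions, not a full RFC 8785 implementation: it sorts keys and removes insignificant whitespace, which is enough for ASCII keys and integer values, but it does not implement the RFC's number serialization or UTF-16 key ordering. The function name `canonical_json` is chosen for illustration.

```python
import json

def canonical_json(doc) -> str:
    # Sorted keys, no whitespace between tokens, and no \uXXXX escaping of non-ASCII text.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

doc = {
    "delegates": ["did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi"],
    "threshold": 1,
    "payload": {
        "xyz.radicle.project": {
            "name": "heartwood",
            "description": "Radicle Heartwood Protocol & Stack",
            "defaultBranch": "master",
        }
    },
}
print(canonical_json(doc))
```

The printed result is a single line with `delegates`, `payload` and `threshold` in that order.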

+
We can now compute the Git object ID by placing the above JSON in a file, eg.
+
`radicle.json`, taking care to strip all newline characters from it, and
+
running the following command:
+

+
    $ git hash-object -t blob radicle.json
+

+
The output should be:
+

+
    d96f425412c9f8ad5d9a9a05c9831d0728e2338d
+

+
This SHA-1 hash is the document's OID. To turn it into a Radicle repository
+
identifier, we encode the underlying 20-byte hash value using `multibase`
+
encoding[^7] with the `base-58-btc` alphabet; the same encoding used for the
+
`did:key` method, and prefix `rad:` to it, making it a valid URN:
+

+
    "rad" ":" multibase(base58-btc, raw-oid-bytes)
+

+
This results in the repository identifier, or RID:
+

+
    rad:z42hL2jL4XNk6K8oHQaSWfMgCL7ji
+

+
This RID is theoretically unique thanks to the entropy provided by the delegate
+
key and payload.
+
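The derivation can also be reproduced without invoking Git. The sketch below assumes the standard Git blob hashing scheme (SHA-1 over the header `blob <size>\0` followed by the content) and uses plain SHA-1 from `hashlib`, which matches Git's hardened SHA-1 for any input that is not a crafted collision. The function names are hypothetical.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def git_blob_oid(content: bytes) -> bytes:
    """Raw 20-byte object ID that `git hash-object -t blob` would report in hex."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).digest()

def base58btc(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading "1".
    zeros = len(data) - len(data.lstrip(b"\0"))
    return "1" * zeros + out

def repository_id(canonical_document: bytes) -> str:
    # "z" is the multibase prefix for base58-btc.
    return "rad:z" + base58btc(git_blob_oid(canonical_document))

# Usage, assuming radicle.json holds the canonical document without a trailing newline:
#   with open("radicle.json", "rb") as f:
#       print(repository_id(f.read()))
```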

+
Identity Storage
+
----------------
+
A storage system suitable for storing identity documents must have two
+
properties:
+

+
1. It must guarantee data integrity.
+
2. It must preserve the history of changes to the documents.
+

+
Radicle repositories are stored in Git, and criterion (1) is guaranteed by Git
+
natively, so long as we store our identity documents in the Git object database.
+
This is because Git hashes all objects under it, and structures its data such
+
that a change in hash of one object means a change in hash of all dependent
+
objects.
+

+
Criterion (2) can be guaranteed by encoding changes to the documents as a Git
+
commit history. Not only does a commit history allow us to preserve all changes,
+
it also proves, via hash-linking, that no change was omitted from the history.
+

+
    Commit             Commit             Commit
+
    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
+
    │ fb8e40a     │◄─┐ │ c581f25     │◄─┐ │ 43eb12d     │
+
    │             │  └─┤ fb8e40a     │  └─┤ c581f25     │
+
    │ ┌──────────┐│    │ ┌──────────┐│    │ ┌──────────┐│
+
    │ │ Document ││    │ │ Document ││    │ │ Document ││
+
    │ ├──────────┤│    │ ├──────────┤│    │ ├──────────┤│
+
    └─┴──────────┴┘    └─┴──────────┴┘    └─┴──────────┴┘
+

+
In the above diagram, we see three commits, with the left-most being the *root*
+
commit of the document, ie. the initial state, and the right-most being the
+
*head*, ie. the latest state.
+

+
Each commit includes, within its Git tree, a blob named `radicle.json` which
+
contains a version of the identity document.
+

+
    ┌───────┬─────────┐                                   ┌─────┬───────────┐
+
    │commit │ fb8e40a │   ┌─────┬────────────────────┐  ┌►│blob │ d96f425   │
+
    ├───────┴─────────┤ ┌►│tree │ 82bc09a            │  │ ├─────┴───────────┤
+
    │tree   82bc09a   ├─┘ ├─────┴────────────────────┤  │ │{"delegates":...,│
+
    │author ...       │   │blob d96f425 radicle.json ├──┘ │ "payload":...,  │
+
    │                 │   └──────────────────────────┘    │ "threshold":...}│
+
    └─────────────────┘                                   └─────────────────┘
+

+
Using this representation, we can hence track all changes to a given identity.
+
To keep track of the head of this history, we use a special Git reference
+
named `rad/id` which points to the latest version of the identity document:
+

+
    fb8e40a ◄─ c581f25 ◄─ 43eb12d ◄─ refs/rad/id
+

+
Just like regular Git branches, when the identity is updated, `refs/rad/id`
+
is reset to point to the latest commit. Note that this commit history is
+
completely separate from the source code history pointed to by eg.
+
`refs/heads/master` and other branches. The identity history is kept in
+
the repository's *stored copy*, which is a bare repository, and is not included
+
in working copies.
+

+
### Verification
+

+
Verifying the latest state, ie. commit `43eb12d`, is a matter of verifying all
+
preceding states, starting from the root (`fb8e40a`).
+

+
The root commit is verifiable intrinsically, since it contains as part of its
+
Git tree, the document from which we derived the RID. We can see above that the
+
original blob from which we computed the repository identifier is contained in
+
the tree pointed to by the root commit of the document history. The root commit
+
is hence valid for a given RID if and only if it contains a blob under the name
+
`radicle.json` containing a valid identity document which hashes to the given
+
RID, *and* the commit is signed by all delegates in the initial `delegates` list.
+

+
Once the root commit is verified, we can proceed to the next commit. Since
+
the document may have changed in this commit, the RID is no longer useful for
+
verifying this commit. Instead, we make sure that two conditions are fulfilled:
+

+
1. The commit containing the updated document is signed by a number of keys
+
   greater than or equal to the `threshold` property of the *previous*, valid
+
   version of the document.
+
2. Each of the aforementioned signatures belongs to a key that is part of the
+
   `delegates` set of the previous document version.
+

+
> Git commit signatures can be verified with the `git verify-commit` tool.
+

+
These delegate signatures are expected to be included in the commit header
+
under the `gpgsig` key, and be encoded in the SSH signature format.
+

+
    tree c66cc435f83ed0fba90ed4500e9b4b96e9bd001b
+
    parent af06ad645133f580a87895353508053c5de60716
+
    author Buck Mulligan <buck@mulligan.xyz> 1664467633 +0200
+
    committer Buck Mulligan <buck@mulligan.xyz> 1664786099 -0200
+
    gpgsig -----BEGIN SSH SIGNATURE-----
+
     U1NIU0lHAAAAAQAAADMAAAALc3NoLWVkMjU1MTkAAAAgvjrQogRxxLjzzWns8+mKJAGzEX
+
     4fm2ALoN7pyvD2ttQAAAADZ2l0AAAAAAAAAAZzaGE1MTIAAABTAAAAC3NzaC1lZDI1NTE5
+
     AAAAQIQvhIewOgGfnXLgR5Qe1ZEr2vjekYXTdOfNWICi6ZiosgfZnIqV0enCPC4arVqQg+
+
     GPp0HqxaB911OnSAr6bwU=
+
     -----END SSH SIGNATURE-----
+
    gpgsig -----BEGIN SSH SIGNATURE-----
+
     U1NIU0lHAAAAAQAAADMAAAALc3NoLWVkMjU1MTkAAAAgDb3ulFKnHALG8AnuuFPY9prvVZ
+
     kyLc73tcQ+HG3sCzQAAAADZ2l0AAAAAAAAAAZzaGE1MTIAAABTAAAAC3NzaC1lZDI1NTE5
+
     AAAAQM9rxErTt7AtcLypSyVM/jmd9/syO4D5hjMjL/9lbGzIkXXDL6+QlUsLipeLuYHV92
+
     F/6nm/lEaPUTeiZQ5o9AI=
+
     -----END SSH SIGNATURE-----
+

+
<small>A Git commit header with two SSH signatures.</small>
+

+
We proceed in this manner until the last commit in the history. If all commits
+
pass this verification process, we consider the identity valid. Note that every
+
version of the document must be validated according to the rules stated under
+
the [Validation](#validation) section. This includes the document payload,
+
and implies that application developers supporting payload extensions will
+
have to provide their own validation for these payloads, that will have to run
+
for each commit in the document history.
+

+
It's important to restate that for any commit `C`, other than the root commit,
+
verification is done by using the `delegates` and `threshold` values of the
+
*parent* commit to `C`, which has already been verified.
+
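The threshold rule can be sketched as a walk over the history from the root to the head. This is an illustration under assumptions: the data model (a list of dicts with `doc` and `signers` keys) and the function name are invented for this example, signature checking is assumed to have happened already, and the RID check on the root as well as per-document validation are left out.

```python
def verify_history(commits) -> bool:
    """Check delegate sign-off for a document history, ordered root first.

    Each entry holds the identity document under "doc" and, under "signers",
    the set of DIDs whose signatures on that commit were already verified.
    """
    if not commits:
        return False
    root = commits[0]
    # The root must be signed by every delegate in the initial document.
    if not set(root["doc"]["delegates"]) <= set(root["signers"]):
        return False
    trusted = root["doc"]
    for commit in commits[1:]:
        # Only signatures from delegates of the previous, verified document count.
        valid = set(commit["signers"]) & set(trusted["delegates"])
        if len(valid) < trusted["threshold"]:
            return False
        trusted = commit["doc"]
    return True

alice, bob = "did:key:zAlice", "did:key:zBob"  # placeholder DIDs
history = [
    {"doc": {"delegates": [alice], "threshold": 1}, "signers": {alice}},
    {"doc": {"delegates": [alice, bob], "threshold": 2}, "signers": {alice}},
    {"doc": {"delegates": [alice, bob], "threshold": 2}, "signers": {alice, bob}},
]
assert verify_history(history)
```

In this example the second commit needs only Alice's signature, because the root document sets the threshold to one. The third commit needs both signatures, because the second document raised the threshold to two.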

+
Security
+
--------
+
The combination of Git storage and cryptographic verification provides very
+
strong security and integrity guarantees around Radicle repositories and
+
identities:
+

+
* Omitted data up to the latest commit is detected by Git itself
+
* Tampering with the identity root will result in a different RID
+
* Adding a delegate key without the sign-off of the existing delegate set will
+
  fail verification
+

+
There is one possible attack that can be carried out by network participants:
+
serving old data. Since it isn't possible to know whether a document history
+
has a more recent update than the latest known update, a dishonest peer may
+
choose to hide the last *N* identity updates from its peers. This means it
+
will serve a stale document to its peers.
+

+
However, this attack is only effective if *all* of a victim's connected peers
+
perform this censorship. It takes only one honest peer to serve the full
+
document history for the censorship to fail.
+

+
Closing Thoughts
+
----------------
+
In this RIP we described an identity system for Git repositories that can be
+
used to securely distribute code on a peer-to-peer network. The system is
+
self-certifying and requires only basic Git primitives to implement.
+

+
Credits
+
-------
+
* Kim Altintop, for the original design this system is based on
+

+
Copyright
+
---------
+
This document is licensed under the Creative Commons CC0 1.0 Universal license.
+

+
[^0]: https://theupdateframework.github.io/specification/latest/
+
[^1]: https://ed25519.cr.yp.to/
+
[^2]: https://w3c-ccg.github.io/did-method-key/
+
[^3]: https://en.wikipedia.org/wiki/Reverse_domain_name_notation
+
[^4]: https://json-schema.org/
+
[^5]: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
+
[^6]: https://datatracker.ietf.org/doc/html/rfc8785
+
[^7]: https://w3c-ccg.github.io/multibase/
added _rips/0003-storage-layout.md
@@ -0,0 +1,532 @@
+
---
+
RIP: 3
+
Title: Storage Layout
+
Author: '@fintohaps <fintan.halpenny@gmail.com>'
+
Status: Draft
+
Created: 2022-10-27
+
License: CC0-1.0
+
---
+

+
The storage layer is a crucial component of the Radicle network, and it is
+
designed with a local-first approach. This means that it can accommodate not
+
only the local operator's view of a repository, but also the views of peers in
+
whom the operator is interested. These views, also known as *forks* or *source
+
trees*, play a key role in enabling collaboration and version control within
+
the network.
+

+
Table of Contents
+
-----------------
+
* [Overview](#overview)
+
* [Layout](#layout)
+
* [Replication](#replication)
+
* [Working Copy](#working-copy)
+
    * [URL](#url)
+
    * [Refspecs](#refspecs)
+
    * [Example](#example)
+
    * [Remote Helper](#remote-helper)
+
        * [Authorization](#authorization)
+
* [Future Work](#future-work)
+
* [Appendix](#appendix)
+
    * [Alternative Designs](#alternative-designs)
+
        * [Associating a Working Copy](#associating-a-working-copy)
+
    * [Worked Example](#worked-example)
+
* [Credits](#credits)
+
* [Copyright](#copyright)
+

+
Overview
+
--------
+
In a peer-to-peer network, there is no centralized server or repository for
+
users to submit their changes. Additionally, the absence of a consensus
+
mechanism at the protocol level means that the sequence of operations cannot be
+
guaranteed. To tackle these issues, Radicle implements a partitioned approach
+
in which each user maintains their own local "fork" of a repository, as well as
+
any other forks they have an interest in. These forks are then shared among
+
users across the network. This method not only enhances the user experience by
+
allowing offline work but also eliminates the need for a server to process
+
data. Each repository fork has a single owner and writer, and users are only
+
permitted to make changes to their respective forks.
+

+
The storage layer must also be designed for efficient replication of data
+
between peers. For this reason, Git is used as the underlying protocol and
+
database, as it maps nicely to the type of data exchanged on the Radicle
+
network, and is flexible enough for our use case. In addition, Git has been
+
optimized for speed and disk space, and will automatically de-duplicate
+
repository data and fetch missing objects from peers[^0].
+

+
With the above in mind, this document proposes a storage layer that meets the
+
following requirements:
+

+
1. The storage layer is capable of maintaining a local copy of the working
+
   dataset.
+
2. The storage layer can store any number of repositories.
+
3. For each repository, it can represent multiple views, or *forks*, of
+
   the repository.
+
4. The storage layer can natively interoperate with Git.
+

+
There are two aspects to consider for Git interoperability:
+

+
1. Repository replication between peers.
+
2. Associating a *working* repository or "copy" with a *stored* repository.
+

+
In the next sections we will cover how the above works with the storage layout.
+

+
Layout
+
------
+
The storage layout must support multiple repositories and multiple peers per
+
repository. Each stored repository is a *bare* Git repository[^1]. To ensure
+
uniqueness and easy identification of repositories, a stable and globally
+
unique identifier, known as the Repository ID (RID), is assigned to each
+
stored repository. The RID for each repository is established according to the
+
guidelines provided in RIP#2's section *The Repository Identifier*, and is
+
represented as `<rid>` in diagrams found in this document.
+

+
Since our underlying storage uses Git, we represent the storage layout as a
+
file tree on the file-system, with `<storage>` representing the storage root,
+
or top-level directory under which all repositories are stored on a user's
+
device. Though this storage tree is browsable by the user with standard file
+
system commands, it is not meant to be interacted with directly by users,
+
for risk of corrupting the data. Additionally, Git is free to pack the objects,
+
which means they may not always appear as individual files.
+

+
    <storage>       # Storage root containing all local repositories
+
    ├── <rid>       # Some repository, e.g. a project, as a bare git repository
+
    │   └── refs    # All Git references under this project
+
    ├── <rid>
+
    │   └── refs
+
    ├── <rid>
+
    │   └── refs
+
    └── ...
+

+
<small>Basic overview of the storage layout with multiple repositories</small>
+

+
For every repository, each peer associated with that repository must have a
+
separate, logical Git source tree -- which contains all the usual reference
+
namespaces, i.e. `heads`, `tags`, and `notes`. This *logical repository* is
+
what we call a *fork* or *view*, and it allows peers to maintain different sets of
+
changes for the same physical repository.
+

+
    <storage>
+
    └─ <rid>                    # The "physical" Git repository
+
       └─ refs
+
          └─ namespaces         # All forks are stored under this namespace
+
             ├─ <nid>           # One peer's fork is stored here
+
             │  └─ refs
+
             ├─ <nid>           # Another peer's fork is stored here
+
             │  └─ refs
+
             └─ <nid>           # Etc.
+
                └─ refs
+

+
<small>Storage partitioning by Node ID or `<nid>`</small>
+

+
To have this separation, instead of having each peer stored in a separate Git
+
repository with a separate object database (ODB), the `gitnamespaces`[^2]
+
feature is used. For each peer, including the local peer, their unique
+
identifier is used as the namespace within each repository to separate Git
+
objects. The identifier used is described in *Peer Identity* in RIP#2, and is
+
usually known as the *Node Identifier* (NID):
+

+
> In Heartwood, peers are simply identified by their public key. This
+
> key is an Ed25519 key that is encoded as a DID using the `did:key`
+
> method. DIDs are used for interoperability with other systems as
+
> well as allowing for other types of identifiers in the future.
+

+
Thus, each peer can have its own namespace for references, while sharing the
+
objects with other peers via a shared ODB. This ensures only one copy of each
+
object is stored across all repository forks.
+

+
The storage uses the encoded public key portion of the `did:key` string as the
+
namespace path, denoted as `<nid>` or *Node ID* going forward. This means that
+
a peer's references will be scoped by their Node ID via the path prefix
+
`refs/namespaces/<nid>`. We demonstrate this organisation below in more detail:
+

+
    <storage>                     # Storage root containing all local repositories
+
    ├─ <rid>                      # Storage for first repository
+
    │  └─ refs                    # All Git references locally stored
+
    │     └─ namespaces           # All peer source trees or "forks"
+
    │        ├─ <nid>             # First node's source tree
+
    │        │  └─ refs           # First node's Git references
+
    │        │     ├─ heads       # First node's branches
+
    │        │     │   └─ master  # First node's master branch
+
    │        │     ├─ tags        # First node's tags
+
    │        │     │   ...
+
    │        │     └─ rad
+
    │        │         └─ id      # First node's version of the repository identity document
+
    │        │
+
    │        └─ <nid>             # Second node's source tree
+
    │           ├─ refs           # Second node's references
+
    │           └─ ...
+
    ├─ <rid>                      # Storage for second repository
+
    │   ...
+
    └─ <rid>                      # etc.
+
        ...
+
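To make the mapping concrete, the sketch below turns a peer's `did:key` identifier and a reference name into the namespaced path used in storage. The function name is hypothetical, and the only rule assumed is the one stated above: the encoded public key portion of the DID becomes the namespace.

```python
def namespaced_ref(did: str, ref: str) -> str:
    """Return the storage path of `ref` inside the fork of the peer identified by `did`."""
    prefix = "did:key:"
    if not did.startswith(prefix):
        raise ValueError("expected a did:key identifier")
    if not ref.startswith("refs/"):
        raise ValueError("expected a fully qualified reference, eg. refs/heads/master")
    nid = did[len(prefix):]  # the encoded public key, ie. the Node ID
    return f"refs/namespaces/{nid}/{ref}"

print(namespaced_ref(
    "did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi",
    "refs/heads/master",
))
# prints refs/namespaces/z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi/refs/heads/master
```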

+
Note that top-level references may still exist, i.e. `<rid>/refs/{heads,tags}`.
+
The top-level namespace must be reserved for canonical references --
+
references that are agreed upon collaboratively, as published and stable. They
+
do not belong to any one peer and thus may be different on each device. How
+
canonical references are decided and written is left for a future RIP.
+

+
    <storage>
+
    └─ <rid>
+
       └─ refs
+
          ├─ HEAD                 # Canonical head reference
+
          ├─ heads                # Canonical branches
+
          │   └─ master           # Canonical master branch
+
          ├─ tags
+
          │   └─ v1.0.0           # Canonical v1.0.0 release tag
+
          ├─ rad
+
          │   └─ id               # Canonical identity reference
+
          └─ namespaces           # All peer source trees
+
             ├─ <nid>             # First node's source tree
+
             └─ <nid>             # Second node's source tree
+
             ...
+

+
<small>Example of canonical references under a repository</small>
+

+
Replication
+
-----------
+
Repository replication involves retrieving data from a remote peer. As the
+
storage consists of Git repositories, data can be transferred remotely using
+
the Git protocols[^3] and appropriate refspecs[^4]. However, this document does
+
not cover the protocol used or how to verify fetched data, as those topics are
+
beyond its scope. They may be discussed in a separate document.
+

+
That being said, we designed the storage layout such that it's easy to transfer
+
data between repositories over the network, using an unmodified Git protocol.
+
Using refspecs, it's possible to transfer only the objects we're interested in,
+
for example we can fetch only a certain peer's fork and not another.
+

+
Working Copy
+
------------
+
A working copy is a local copy of a repository, which corresponds to a
+
repository in storage. The operator can make changes to the source code in the
+
working copy. This is similar to how one would use `git clone` to obtain a copy
+
of an upstream repository, such as one hosted on GitHub or GitLab. Once the
+
changes have been made in the working copy, they can be pushed upstream. With
+
Radicle, changes are fetched and pushed between the *working* copy and the
+
*stored* copy within the local storage.
+

+
The connection between the working copy and the storage is maintained through a
+
set of Git remotes[^5], where each remote represents a single remote peer or
+
*namespace* for that repository and is associated with a Node ID.
+

+
The name of each remote, defined by the operator or application, can be
+
customized to suit their preferences. For instance, the operator may use the
+
Node ID of the peer, `origin`, `rad`, a nickname, or any other desired name.
+
By convention, we use the `rad` remote for the local peer's remote, such that
+
a user may push changes to their own fork with `git push rad`.
+

+
The URL of each Git remote must resolve to the local storage's repository
+
corresponding to the working copy. As such, the URL serves as a mapping between
+
the working copy and the stored copy.
+

+
### URL
+

+
The URL scheme for a given Radicle remote is of the form:
+

+
    rad://<rid>[/<nid>]
+

+
* The `rad://` scheme is used for Radicle repositories, and identifies a
+
  project on the network. By using this scheme with Git, the user instructs Git
+
  to invoke the `git-remote-rad` executable during `git push` or `git fetch`,
+
  which allows the user to interact with the network through the storage layer.
+
  This will be covered in more detail in the *Remote Helper* section.
+
* The `<rid>` component is the repository identifier to be found in storage.
+
* The `<nid>` component is the Node ID which the `--namespace` option will
+
  be set to. If `<nid>` is not specified, Git will interact with the
+
  repository's *canonical references*.
+

+
Here's an example URL for repository `z42hL2jL4XNk6K8oHQaSWfMgCL7ji` and peer
+
`z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi`:
+

+
	rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji/z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi
+

+
Here's a URL for the same repository's canonical references:
+

+
	rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji
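
To illustrate the URL form above, here is a small sketch that splits a `rad://` URL into its two components using POSIX parameter expansion. The function name `parse_rad_url` and the variables `rid` and `nid` are hypothetical and are not taken from any Radicle tool.

```shell
set -eu

# parse_rad_url (hypothetical helper) splits rad://<rid>[/<nid>] into the
# variables `rid` and `nid`. `nid` is empty when the URL refers to the
# repository's canonical references.
parse_rad_url() {
    case "$1" in
        rad://*) ;;
        *) echo "not a rad:// URL: $1" >&2; return 1 ;;
    esac
    rest=${1#rad://}            # strip the scheme
    rid=${rest%%/*}             # everything before the first slash
    case "$rest" in
        */*) nid=${rest#*/} ;;  # optional Node ID after the slash
        *)   nid="" ;;
    esac
}

parse_rad_url "rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji/z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi"
echo "rid=$rid nid=$nid"

parse_rad_url "rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji"
echo "rid=$rid nid=${nid:-<canonical>}"
```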
+

+
### Refspecs
+

+
Since Git namespaces are used, the `fetch` refspec[^4] may be:
+

+
    +refs/heads/*:refs/remotes/<name>/*
+

+
The operator may also want to scope tags to particular remotes. This
+
can be achieved by using the `tagOpt` of a remote and adding another
+
fetch refspec.
+

+
    fetch = +refs/tags/*:refs/remotes/<name>/tags/*
+
    tagOpt = --no-tags
+

+
When using these refspecs with `git fetch` or `git push`, it is necessary to
+
specify the namespace that is being used for the operation. This can be
+
achieved using `git --namespace=<nid>` or `GIT_NAMESPACE=<nid> git`.
+
Unfortunately, this is somewhat cumbersome for the user and does not prevent
+
pushing to namespaces belonging to a non-local peer. This is remedied in
+
[Remote Helper](#Remote-Helper).
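
The tag-scoping configuration above can also be set from the command line. The following is a sketch in a throwaway repository; the remote name `rad` and the repository ID are simply the ones used in this document's examples. It relies on `git remote add --no-tags`, which records `tagOpt = --no-tags` for the new remote.

```shell
set -eu
cd "$(mktemp -d)"
git init -q

# Repository ID reused from the example URL in this document.
url=rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji

# Creates the remote with the default branch refspec and tagOpt = --no-tags.
git remote add --no-tags rad "$url"

# Add a second fetch refspec that scopes tags under the remote's name.
git config --add remote.rad.fetch '+refs/tags/*:refs/remotes/rad/tags/*'

# Show the resulting settings.
git config --get-all remote.rad.fetch
git config --get remote.rad.tagopt
```

Only the configuration is written here; nothing is fetched, so no `git-remote-rad` helper is needed to run it.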
+

+
### Example
+

+
Here's an example remote configuration based on the above specifications:
+

+
    [remote "rad"]
+
        url = rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji/z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi
+
        fetch = +refs/heads/*:refs/remotes/rad/*
+

+
To support fetching canonical references while pushing to the local peer's
+
namespace, a configuration like the following can be used:
+

+
    [remote "rad"]
+
        url = rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji
+
        pushurl = rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji/z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi
+
        fetch = +refs/heads/*:refs/remotes/rad/*
+

+
In the above configuration, `git pull rad` would pull the canonical references
+
while `git push rad` would push to the local user's namespace.
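
A split fetch and push configuration like the one above can be produced with `git remote set-url --push`. This sketch uses a throwaway repository and the identifiers from the example; it only writes configuration and does not contact any peer.

```shell
set -eu
cd "$(mktemp -d)"
git init -q

# Identifiers reused from the example configuration.
rid=z42hL2jL4XNk6K8oHQaSWfMgCL7ji
nid=z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi

# Fetch URL: canonical references (no Node ID).
git remote add rad "rad://$rid"

# Push URL: the local peer's namespace.
git remote set-url --push rad "rad://$rid/$nid"

git remote get-url rad
git remote get-url --push rad
```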
+

+
For a more thorough example, see the [Appendix](#Appendix).
+

+
### Remote Helper
+

+
The remote helper is what allows Git to interpret URLs with the `rad://`
+
scheme.
+

+
As mentioned in the [Working Copy](#Working-Copy) section, there is currently
+
no way to configure a Git remote to be aware of additional logic, such as the
+
appropriate `refs/namespaces` to use (to avoid having to use `--namespace`) or
+
to prevent pushing to other peers' namespaces.
+

+
To address these requirements, a `git-remote-rad` helper binary can be
+
introduced to supply the necessary namespace and enforce the correct use of
+
peer namespaces.
+

+
`git-remote-rad` is a Git remote helper[^8] binary. When Git encounters a URL
+
that uses the `rad` transport protocol, it delegates the call to
+
`git-remote-rad`, which should be found in the operator's path, during a
+
`fetch` or `push` operation.
+

+
#### Authorization
+

+
With the remote helper installed, `git push` can automatically set
+
`GIT_NAMESPACE` to the Node ID of the current user after verifying that it
+
matches the one specified in the URL, and reject pushes to other Node IDs.
+

+
When fetching, the remote helper can set `GIT_NAMESPACE` to whatever Node ID
+
is specified in the URL, as no authorization is required to fetch.
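
The authorization rule above can be sketched as a small shell function. This is an illustration only and not the actual `git-remote-rad` logic: the function name `namespace_for` and the Node IDs `z6MkAlice` and `z6MkBob` are hypothetical, and rejecting a push to a URL without a Node ID is an assumption made for this sketch.

```shell
set -eu

# namespace_for (hypothetical) takes an operation ("fetch" or "push"),
# a rad:// URL, and the local Node ID. It prints the namespace to use
# (empty for canonical references) or fails if a push targets a namespace
# other than the local peer's.
namespace_for() {
    op=$1 url=$2 local_nid=$3
    rest=${url#rad://}
    case "$rest" in
        */*) url_nid=${rest#*/} ;;
        *)   url_nid="" ;;
    esac
    if [ "$op" = push ] && [ "$url_nid" != "$local_nid" ]; then
        echo "refusing to push to namespace '$url_nid'" >&2
        return 1
    fi
    printf '%s\n' "$url_nid"
}

rid=z42hL2jL4XNk6K8oHQaSWfMgCL7ji

# Fetching from another peer is always allowed: prints z6MkBob.
namespace_for fetch "rad://$rid/z6MkBob" z6MkAlice

# Pushing to our own namespace is allowed: prints z6MkAlice.
namespace_for push "rad://$rid/z6MkAlice" z6MkAlice

# Pushing to another peer's namespace is rejected.
namespace_for push "rad://$rid/z6MkBob" z6MkAlice || echo "push rejected"
```

A helper following this rule would export the printed value as `GIT_NAMESPACE` before delegating to the underlying Git transport.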
+

+
Future Work
+
-----------
+
You may have noticed that in this [layout](#Layout) the top-level namespace
+
is left for canonical references. The definition and verification of canonicity
+
is left for a future RIP.
+

+
Appendix
+
--------
+

+
### Alternative Designs
+

+
An alternative design for organizing peer source trees is to use the `remotes`
+
namespaces, i.e. `refs/remotes/<nid>`. This particular namespace is deemed
+
special by `git` and its tooling. A "remote" reference is one that corresponds
+
to a remote location. The remote location and how to fetch from or push to it are
+
configured using `git remote`[^6]. When `git fetch` is used for that remote, it
+
will place the references under `refs/remotes`[^7].
+

+
#### Associating a Working Copy
+

+
Continuing along this line of enquiry, we look at how this storage will link to
+
a working copy -- our personal directory for editing the code. As we previously
+
said, we will want to set up a remote in the working copy. This will look like
+
the following:
+

+
    [remote "alice"]
+
    url = file:///path/to/storage
+
    fetch = +refs/remotes/alice/heads/*:refs/remotes/alice/*
+

+
This will do what you expect when running:
+

+
    $ git fetch alice
+

+
However, the result of running the following may be surprising:
+

+
    $ git fetch alice master
+
    fatal: couldn't find remote ref master
+

+
It will not result in fetching the latest changes from `master`. In fact, it
+
will say no reference exists. To get the exact `master` we are looking for we
+
must run:
+

+
    $ git fetch alice refs/remotes/alice/heads/master
+

+
To explain, `git` tends to work under a DWIM (Do What I Mean) principle. The
+
`master` in `git fetch alice master` is ambiguous, in general. It could be
+
`refs/heads/master`, `refs/remotes/origin/master`,
+
`refs/remotes/alice/heads/master`, etc. `git` will assume that what you meant
+
was `refs/heads/master` and will look for this on the remote end, but of course
+
it does not exist.
+

+
This problem is only compounded with `refs/tags`[^7], where pushing a tag to a
+
remote will always DWIM and target the `refs/tags` namespace -- unless
+
otherwise specified.
+

+
Thus, we see that this design is not adequate.
+

+
### Worked Example
+

+
To begin we want to set up three git repositories: `storage`, `project`, and
+
`fork`. The `storage` repository will act like the Radicle storage, while
+
`project` and `fork` are working copies that will be linked to `storage` via
+
their remote entries.
+

+
    # Storage setup
+
    $ mkdir storage
+
    $ cd storage
+
    $ git init --bare
+

+
    # Project setup (next to storage, not inside it)
+
    $ cd ..
+
    $ mkdir project
+
    $ cd project
+
    $ git init
+

+
    # Fork setup
+
    $ cd ..
+
    $ mkdir fork
+
    $ cd fork
+
    $ git init
+

+
#### Pushing Changes
+

+
Our first action will be to make changes in `project` and push them to
+
`storage`. In order for us to do that we need to create a remote in `project`,
+
create a commit, and push it to `storage`.
+

+
    # Add remote: "alice" will be used instead of a Node ID
+
    $ cd ../project
+
    $ git remote add alice file:///home/user/radicle/storage
+

+
    # Add a commit
+
    $ touch README.md && git add README.md && git commit -am "Add README"
+
    $ git --namespace=alice push alice master
+

+
`git` will then print out that it pushed a new branch and we can confirm by
+
inspecting the `refs` in `storage`.
+

+
    # Inspect refs
+
    $ cd ../storage
+
    $ tree refs
+
    refs
+
    ├── heads
+
    ├── namespaces
+
    │   └── alice
+
    │       └── refs
+
    │           └── heads
+
    │               └── master
+
    └── tags
+

+
#### Fetching Changes
+

+
Our next action will be to fetch the changes from `alice` in the `fork`
+
repository. To do this, we must add a remote -- like before -- and run a `git
+
fetch`.
+

+
    # Add remote: "alice" again stands in for a Node ID
+
    $ cd ../fork
+
    $ git remote add alice file:///home/user/radicle/storage
+

+
    # Fetch the changes
+
    $ git --namespace=alice fetch alice
+

+
This will fetch the `heads` from `alice` and put them under the remote `alice`.
+
We can confirm this by inspecting the `refs` in `fork`.
+

+
    # Inspect refs
+
    $ tree .git/refs
+
    .git/refs
+
    ├── heads
+
    ├── remotes
+
    │   └── alice
+
    │       └── master
+
    └── tags
+

+
#### Different Peers
+

+
To imitate the reality that there will be a namespace per peer, we add a second
+
remote, `bob`, to `fork`. We can then build on `alice/master` and publish the
+
result under the `bob` namespace.
+

+
    # Add bob remote
+
    $ git remote add bob file:///home/user/radicle/storage
+

+
    $ git merge alice/master
+
    $ echo "Hello, Radicle" >> README.md
+
    $ git commit -am "Hello, Radicle"
+
    $ git --namespace=bob push bob master
+

+
Again, we can confirm this did what we wanted in `storage`.
+

+
    # Inspect storage refs
+
    $ cd ../storage
+
    $ tree refs
+
    refs
+
    ├── heads
+
    ├── namespaces
+
    │   ├── alice
+
    │   │   └── refs
+
    │   │       └── heads
+
    │   │           └── master
+
    │   └── bob
+
    │       └── refs
+
    │           └── heads
+
    │               └── master
+
    └── tags
+

+
#### Non-global Tags
+

+
Often we find that pushing tags pollutes the `refs/tags` namespace since they
+
do not get placed under `remotes` when fetching. With the use of the
+
`gitnamespaces` feature we avoid this.
+

+
    $ cd ../fork
+
    $ git tag v1.0.0
+
    $ git --namespace=bob push bob v1.0.0
+

+
    # Inspect storage refs
+
    $ cd ../storage
+
    $ tree refs
+
    refs
+
    ├── heads
+
    ├── namespaces
+
    │   ├── alice
+
    │   │   └── refs
+
    │   │       └── heads
+
    │   │           └── master
+
    │   └── bob
+
    │       └── refs
+
    │           ├── heads
+
    │           │   └── master
+
    │           └── tags
+
    │               └── v1.0.0
+
    └── tags
+

+

+
This shows that namespaces are superior in organising references correctly for
+
each given peer.
+

+
Credits
+
-------
+
* Kim Altintop, for shining a light on the lesser-known `gitnamespaces`[^2]
+
  feature while developing `radicle-link`.
+
* Alex Good, for attempting to implement a feature dubbed "ref rewriting" to
+
  solve the remotes problem, before realising that using `gitnamespaces`[^2]
+
  could be a better option.
+

+
Copyright
+
---------
+
This document is licensed under the Creative Commons CC0 1.0 Universal license.
+

+
[^0]: https://git-scm.com/book/en/v2/Git-Internals-Transfer-Protocols
+
[^1]: https://git-scm.com/docs/git-init#Documentation/git-init.txt---bare
+
[^2]: https://git-scm.com/docs/gitnamespaces
+
[^3]: https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols
+
[^4]: https://git-scm.com/book/en/v2/Git-Internals-The-Refspec
+
[^5]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes
+
[^6]: https://git-scm.com/docs/git-remote
+
[^7]: https://git-scm.com/book/en/v2/Git-Internals-Git-References
+
[^8]: https://git-scm.com/docs/gitremote-helpers
added _rips/README.md
@@ -0,0 +1,3 @@
+
# RIPs
+

+
Radicle Improvement Proposals 🌱.