RIPs on the Website — radicle.xyz

This is an attempt to make the RIPs available on radicle.xyz.

Some thoughts:

* Importing the RIPs should be done with git submodule.
* No manual modifications, or only minimal ones, should be necessary to update.
* RIPs might be modified slightly to support this.
* Standardize how RIPs refer to each other.

 ## RIPs

 The RIPs live in `_rips` as a squashed subtree, using `git subtree`.
 If you plan to work on the integration with RIPs, it is *very*
 helpful to add the repository as a remote:

 	git remote add rips rad://z3trNYnLWS11cJWC6BbxDs5niGo82

 Then, to update the subtree:

 	git subtree --prefix _rips pull rips master

 ## License

 Licensed under [CC BY-NC-SA 4.0][license]. See [LICENSE](LICENSE).

   pages:
     output: true
     permalink: /:path/
   rips:
     output: true
     permalink: /rip/:path/
 highlighter: rouge

       path: _posts
     values:
       layout: post
   - scope:
       path: _rips
     values:
       layout: rip

 <!DOCTYPE html>
 <html lang="{{ site.lang | default: "en-US" }}">
   <head>
     {% include meta.html %}
     <link rel="stylesheet" type="text/css" href="{{ "/assets/css/guide.css" | relative_url }}"/>
   </head>
   <body>
     <header>
       <div class="title">
         <a href="/guides">
           <img src="{{ "/assets/images/radicle.svg" | relative_url }}" alt="" />
           <span>Radicle Improvement Proposals</span>
         </a>
       </div>
       <nav>
       </nav>
       <button class="toggle" id="toggle-theme"><!-- Set by script --></button>
     </header>
     <main>
       <h1>RIP {{ page.RIP }}: {{ page.Title }}</h1>
       {% include toc.html html=content class="toc" item_class="toc-h%level%" %}
       <hr />
       {{ content }}
     </main>
     <footer>
       <p>&copy; The Radicle Team</p>
     </footer>
     <script src="{{ "/assets/js/toggle-theme.js" | relative_url }}"></script>
   </body>
 </html>

 ---
 RIP: 1
 Title: Heartwood
 Author: '@cloudhead <cloudhead@radicle.xyz>'
 Status: Draft
 Created: 2022-09-06
 License: CC0-1.0
 ---
 In this RIP, we define a major iteration of the Radicle network protocol and
 the various related sub-systems. We call it "Heartwood".

 The intent of this proposal is not to define a complete specification of the
 protocol, but to lay a foundation for subsequent RIPs. Various aspects
 of the protocol, in particular privacy, censorship resistance, peer
 misbehavior and DoS, are left for future RIPs to expand on, and won't be
 tackled here. Additionally, details on the wire protocol, message formats,
 hash functions and encodings may be left out, so that this proposal can
 focus on the big ideas.

 Overview
 --------

 The Radicle network protocol can be defined through the intended use case for a
 peer-to-peer code hosting network:

 *Alice publishes a repository on the network under a unique identifier, and
 Bob, using that identifier, is able to retrieve it from the network whether
 Alice is online or offline, and verify the authenticity of the data.*

 The above must hold true regardless of the network topology and the number of
 nodes hosting the project, as long as at least one node is hosting it. We can
 therefore say that the primary function of the protocol is to locate repositories
 on the network and serve them to users, in a timely and resource-efficient
 manner.

 This functionality can be broken down into three components:

 1. Locating repositories, ie. finding which nodes host a given repository
 2. Replicating a given repository between two nodes, such that they both
    hold a copy
 3. Verifying the authenticity of all the data retrieved from the network, such
    that any node can serve any data without the need for trust

 To achieve (1), nodes need to exchange information about which repositories
 they host, so that they can point users to the right locations, as well as
 notify each other when there are updates to the repositories. This in turn
 requires *peer* discovery: nodes need a way to find each other on the network.

 To achieve (2), git is used for its excellence as a replication protocol.

 To achieve (3), given the nature of peer-to-peer networks, ie. that any node
 can join the network, git alone is not enough. Replication through git needs to
 be paired with a way of verifying the authenticity of the data being
 replicated. While git checks for data *integrity*, our protocol will have to
 make sure that the data Bob downloads is the data Alice published on the
 network. Without such verification, an intermediary node on the network could
 easily tamper with the data before serving it to Bob. We also can't require
 that Alice serve her data directly to Bob, as that would require them to be
 online at the same time, and would introduce a single point of failure.

 Table of Contents
 -----------------

 * [Repository Identity](#repository-identity)
     * [The Identity Document](#the-identity-document)
 * [Repository Discovery](#repository-discovery)
     * [Topology](#topology)
     * [Routing](#routing)
 * [Node Identity](#node-identity)
 * [Gossip](#gossip)
     * [Inventory Announcements](#inventory-announcements)
         * [Pruning](#pruning)
     * [Reference Announcements](#reference-announcements)
     * [Node Announcements](#node-announcements)
         * [Bootstrap Nodes](#bootstrap-nodes)
 * [Replication](#replication)
     * [Project Tracking and Branches](#project-tracking-and-branches)
     * [Unintentional Forks and Conflicts](#unintentional-forks-and-conflicts)
 * [Storage](#storage)
     * [Layout](#layout)
         * [Special References](#special-references)
 * [Canonicity](#canonicity)
 * [Closing Thoughts](#closing-thoughts)
 * [Credits](#credits)
 * [Copyright](#copyright)

 Repository Identity
 -------------------

 To locate, or even "talk" about, repositories on a peer-to-peer network, we
 require a stable, unique identifier that can be verifiably associated with a
 repository. Without this, there is no way for a user to request a specific
 repository and verify its authenticity. On centralized forges such as GitHub,
 repositories are deemed authentic based on their *location*,
 eg. `https://github.com/bitcoin/bitcoin`. In an *untrusted* network, location
 is not enough, and we need a way to automatically verify the data we get from any
 given location. Therefore, before we talk about networking, we must make a
 little detour into repository identity.

 It's important to understand that although git repositories use content
 addressing for their objects, repositories are *mutable* data-structures.
 Therefore, the identity of a repository *cannot* be derived solely from its
 contents. Instead, the identity must be determined by some other authority.
 In Radicle, this authority is none other than the *maintainers* of the
 repository, since it is their mandate to decide what gets merged into a codebase.

 We can then define a repository's identity as the set of all branches and tags
 that the maintainers of the repository agreed upon at a given point in time,
 along with a unique identifier.

 For anyone to be able to verify an identity, we require maintainers to provide a
 cryptographic signature over the repository's heads, tags, and other relevant
 git references, along with repository metadata such as name and description.
 We call this the *signed refs*. Signed refs can be updated whenever there are
 changes to a repository that are accepted by maintainers. They represent a
 repository's *canonical state*.

 As for the identifier, it must be provably associated with the above state.
 In other words, given an identifier and a set of signed refs, it must be
 possible to prove that the two are associated.

 ### The Identity Document

 Before a repository can be published to the network, it needs to be initialized
 into a Radicle *project*. A project is simply a repository with an associated
 *identity document*. In this document, the public keys of the repository's
 current maintainers are stored. When a project is initialized from an existing
 git repository for the first time, the user initializing it becomes the de facto
 initial maintainer of the project, and his key is included in the new identity
 document's key ring. We call this set of trusted keys the project's *delegation*,
 and each key is called a *delegate*. Though these will often map one-to-one with
 maintainers, this is not a requirement. The only requirement is that they be
 trusted to represent a given project, as they will be used to determine
 the canonical state of the project repository.

 From this initial identity document we can then derive a unique, stable
 identifier for the project, by hashing the document's contents. In addition to
 the *delegation*, we include in the document a user-chosen *alias* for the
 project, as well as a *description*. The process for hashing the document shall
 be discussed in a subsequent RIP.

 It is by including the identity document in the *signed refs* that we establish
 a relationship between the source code and the identity, and thus associate
 the project identifier with the project source code. Note that this permits
 identical source code to have more than one identity. This is useful when
 a user wishes to *fork* a repository. In that case, a new project would
 be initialized with a brand new identifier, but a mostly identical source code
 history.
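As a rough illustration of deriving an identifier by hashing the identity document, the Rust sketch below hashes a simplified document. The `IdentityDoc` struct and its field names are hypothetical. Rust's `DefaultHasher` is a deliberate placeholder: the RIP defers the real hashing process to a later RIP, and a real identifier would need a cryptographic hash. The sketch only demonstrates two properties. Identical documents yield the same identifier, and a fork, whose document differs (here, in its delegation), yields a new one.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A pared-down, hypothetical identity document: the delegation (public keys),
/// plus the user-chosen alias and description.
#[derive(Hash)]
struct IdentityDoc {
    delegates: Vec<[u8; 32]>,
    alias: String,
    description: String,
}

/// Derives a project identifier by hashing the document's contents.
/// PLACEHOLDER: `DefaultHasher` keeps the sketch dependency-free, but it is not
/// cryptographic and its output may change between Rust releases. A real
/// identifier needs a cryptographic hash; the RIP defers that choice.
fn project_id(doc: &IdentityDoc) -> u64 {
    let mut hasher = DefaultHasher::new();
    doc.hash(&mut hasher);
    hasher.finish()
}
```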
 The storage, update and verification mechanism for the identity document
 will be discussed in more detail in a future RIP. For the purposes of this
 document, we can assume a verification mechanism that takes as input the project
 identifier, signed refs, and identity document, and outputs whether the project
 is valid or not.

 At the networking level, all we need is a way to derive stable identifiers for
 repositories, and a verification process that asserts that a given repository
 corresponds to some project identifier.

 Repository Discovery
 --------------------

 The Radicle network protocol has to serve one core purpose: given a project
 identifier, a user should be able to retrieve the full project source code
 associated with that identifier, and verify its authenticity. This function
 should be independent of where the project is located, and how many replicas
 exist, provided at least one replica exists.

 ### Topology

 Given that there is no natural incentive for nodes to host *arbitrary* projects,
 nodes on the network should be given the choice of which projects to host.
 For example:

 * A company that uses or provides open source software may want to host it on
   their node, to ensure its continued availability on the network.
 * A business that hosts projects for a fee would need to be able to choose
   which projects it hosts.
 * A developer contributing to a project may want to self-host it on his node.

 For this reason, we cannot deterministically compute on which node(s)
 a given project should be hosted, as is the case with DHTs. Nodes are able
 to choose what they host, and therefore the network is fundamentally
 *unstructured*. Some nodes may host thousands of projects, while others may
 host only one or two. Though there is a benefit to arranging the peer
 topology in a certain fashion (eg. to reduce communication overhead), this
 cannot be relied on in an untrusted network, and therefore the basic protocol
 makes no such assumptions either.

 ### Routing

 The general problem of reaching a specific node on the network is usually
 known as "routing". Whereas IP routing tries to route traffic to a certain
 IP address, in the Radicle network we attempt to route requests to
 one or more nodes that host a given project; these are called *seeds*
 in the context of that project. A seed is a node that hosts and serves
 one or more projects on the network.

 Routing information is usually stored in a *routing table* that is keyed
 by the "target"; in our case, this is the project identifier:

     RoutingTable = BTreeMap<RepositoryId, Vec<NodeId>>

 For each project, we keep track of the nodes that are known to host this
 project. Using hashes for project identifiers and IP addresses as node
 identifiers, the table might look something like:

     a2970…          54.122.99.1, 89.2.23.67
     c5e079e…        66.12.193.8, 89.2.23.67, 12.43.212.9
     …

 To build the table, nodes gossip information about other nodes, namely *what*
 projects are hosted *where*.
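A minimal in-memory version of this table might look like the Rust sketch below. It assumes 32-byte identifiers, as in the estimate that follows. It also swaps the `Vec<NodeId>` for a `BTreeSet<NodeId>` so that repeated announcements from the same seed are not stored twice. `RoutingTable::insert` and `RoutingTable::seeds` are illustrative names, not a published API.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical aliases: the RIP only assumes 32-byte identifiers.
type RepositoryId = [u8; 32];
type NodeId = [u8; 32];

/// Maps each project to the set of seeds known to host it.
#[derive(Default)]
struct RoutingTable {
    entries: BTreeMap<RepositoryId, BTreeSet<NodeId>>,
}

impl RoutingTable {
    /// Records that `node` seeds `repo`; returns `true` if this was not yet known.
    fn insert(&mut self, repo: RepositoryId, node: NodeId) -> bool {
        self.entries.entry(repo).or_default().insert(node)
    }

    /// Returns the known seeds for `repo`, in ascending id order.
    fn seeds(&self, repo: &RepositoryId) -> Vec<NodeId> {
        self.entries
            .get(repo)
            .map(|nodes| nodes.iter().copied().collect())
            .unwrap_or_default()
    }
}
```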
 Assuming 32-byte project and node identifiers, an average of 3 nodes
 hosting each project, and a million projects, all stored in a binary tree,
 we would need only about 244 MB to store the entire routing table in memory
 with no compression:

     project count = 1'000'000
     leaf size = 32 + 32 * 3 = 128 B
     leaves size = leaf size * project count = 128'000'000 B = ~122 MB
     index size = leaf size * (project count - 1) = ~122 MB
     total = 122 MB + 122 MB = ~244 MB
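The arithmetic can be checked with a few lines of Rust. The function below only restates the estimate above under the same assumptions, including index nodes the same size as leaves. Note that the "MB" figures come out in mebibytes, i.e. units of `1 << 20` bytes. The function names are illustrative.

```rust
/// Back-of-the-envelope routing table size, following the RIP's assumptions:
/// fixed-size identifiers, a fixed number of seeds per project, and an index
/// with one leaf-sized node per internal node of the tree.
fn routing_table_bytes(projects: u64, seeds_per_project: u64, id_len: u64) -> u64 {
    // One project id plus `seeds_per_project` node ids per leaf.
    let leaf = id_len + id_len * seeds_per_project;
    let leaves = leaf * projects;
    let index = leaf * (projects - 1);
    leaves + index
}

/// Converts bytes to mebibytes (2^20 bytes), the unit behind the "MB" figures.
fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 20) as f64
}
```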
 Since this amount of memory is available on commodity hardware, we see
 no need to partition the routing table for the time being, and propose
 that each node store the entire routing table on disk or in memory.

 Node Identity
 -------------

 The identity of a node on the network is simply the identity of the user
 operating the node. To be able to securely verify data authorship, we use
 public key cryptography, with the public key being used as the node identifier.
 In the case of nodes run by end-users (likely most nodes), the node's
 secret key is used to create the *signed refs* and, optionally, to
 sign git commits.

 The use of the same identity for both network communications and code signing
 makes the network more transparent, while allowing nodes to trust each other
 based on the code they publish.

 For nodes that are run as always-on "servers", the node identity may not be
 used for signing code. These *seed* nodes only use their secret keys to
 sign gossip messages and establish secure connections.

 Gossip
 ------

 We design the Radicle networking layer as a *gossip* protocol. In this proposal,
 we go over some of the fundamental types of messages that are sent between
 peers over the network. We contend that the core functionality can be achieved
 with three message types: *inventory* announcements, *reference* announcements
 and *node* announcements, each fulfilling a distinct role. The exact wire
 protocol will be described in a future proposal; this section should serve
 as a short introduction to the topic.

 ### Inventory Announcements

 To build their routing table, nodes connecting to the network announce to their
 peers what inventory they have, ie. what projects they are seeding.
 These announcements are relayed to other connected peers, and so
 on until they reach most nodes on the network. Messages that have
 already been seen are dropped, to prevent messages from propagating forever.
 Gossip messages may be retained by nodes for a certain amount of time, so that
 they can be served to new nodes joining the network. The *inventory
 announcement* message has the following shape:

     InventoryAnnouncement = (
         NodeId,
         Vec<RepositoryId>,
         Timestamp,
         Signature,
     )

 It contains the identifier of the node making the announcement, the inventory
 of projects, a timestamp, and the signature of the node over the projects and
 timestamp. By using a public key as the `NodeId`, we can both identify
 the sender and verify the provenance of the message, using the signature.

 In this manner, every node in the network will eventually converge
 towards a single routing table, provided the network is well connected.

 For larger networks, where nodes cannot be fully meshed, it's desirable for
 seeds that have projects in common to be connected to each other. Hence,
 nodes should prioritize connecting to peers that seed the same projects as
 them. Relevant messages then reach interested nodes more quickly and
 efficiently, since nodes with shared interests are directly connected to
 each other; nodes can also use these already-established connections to
 fetch data of interest.

 As is often the case with large, unstructured networks, gossip messages can be
 received out of order. For this reason, the inventory message includes a
 *timestamp*, which is used for ordering messages. Since the inventory message
 is meant to communicate a node's complete inventory, nodes can simply ignore
 inventory messages with timestamps lower than the latest received, and not
 relay them. Because a timestamp far in the future would cause all of that
 node's subsequent announcements to be ignored, messages with timestamps too
 far in the future are rejected.
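The receive-side rules above can be sketched in Rust. A node drops what it has already seen, ignores older inventories, and rejects far-future timestamps. Signature checking is assumed to happen before `handle` is called. The millisecond unit and the one-hour `MAX_FUTURE_DRIFT` are hypothetical values that the RIP does not specify. Because each announcement carries a complete inventory, a re-sent message is treated like any other message that is not newer.

```rust
use std::collections::{HashMap, HashSet};

type NodeId = [u8; 32];
type RepositoryId = [u8; 32];
/// Milliseconds since the UNIX epoch (an assumed unit).
type Timestamp = u64;

/// Hypothetical tolerance for peers whose clocks run ahead; the RIP gives no value.
const MAX_FUTURE_DRIFT: Timestamp = 60 * 60 * 1000;

/// Mirrors `InventoryAnnouncement`; its signature is assumed to be verified already.
struct InventoryAnnouncement {
    node: NodeId,
    inventory: Vec<RepositoryId>,
    timestamp: Timestamp,
}

#[derive(Default)]
struct Inventories {
    /// Latest inventory per node, with the timestamp it was announced at.
    latest: HashMap<NodeId, (Timestamp, HashSet<RepositoryId>)>,
}

impl Inventories {
    /// Returns `true` if the announcement was accepted and should be relayed.
    fn handle(&mut self, msg: InventoryAnnouncement, now: Timestamp) -> bool {
        // Reject timestamps too far ahead of our clock: accepting one would make
        // us ignore every later, legitimate announcement from that node.
        if msg.timestamp > now.saturating_add(MAX_FUTURE_DRIFT) {
            return false;
        }
        // Announcements carry a node's complete inventory, so anything that is not
        // newer than what we hold (including an exact re-send) is dropped unrelayed.
        if let Some((seen, _)) = self.latest.get(&msg.node) {
            if msg.timestamp <= *seen {
                return false;
            }
        }
        let inventory: HashSet<RepositoryId> = msg.inventory.into_iter().collect();
        self.latest.insert(msg.node, (msg.timestamp, inventory));
        true
    }
}
```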
 #### Pruning

 One worry with routing tables on permissionless networks is that nodes come and
 go all the time. While a project may be available on a node one day, the node
 may go offline the next day and never come back online. Additionally, nodes
 may choose to stop hosting a certain project, making it no longer available.

 Hence, the routing table needs to be constantly pruned, with out-of-date
 entries evicted. To achieve this, we set an expiry on routing table entries,
 and require live nodes to "refresh" their entries on other nodes by sending
 `inventory` messages periodically.

 Entries that have been in the table for more than a day without updates or
 refreshes can then be automatically evicted.
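One way to implement this expiry is to record, for every (project, seed) pair, when it was last announced, and to sweep the table periodically. The Rust sketch below does that, assuming timestamps in seconds; the one-day TTL comes from the text above. Keying on the pair, rather than nesting seeds per project, makes both the refresh and the sweep a single map operation. That layout is a design choice of this sketch, not something the RIP specifies.

```rust
use std::collections::HashMap;

type RepositoryId = [u8; 32];
type NodeId = [u8; 32];
/// Seconds since the UNIX epoch (an assumed unit).
type Timestamp = u64;

/// Entries not refreshed for more than a day are evicted, as the RIP suggests.
const ENTRY_TTL: Timestamp = 24 * 60 * 60;

#[derive(Default)]
struct RoutingTable {
    /// (project, seed) -> when the entry was last added or refreshed.
    entries: HashMap<(RepositoryId, NodeId), Timestamp>,
}

impl RoutingTable {
    /// Called for each project in an accepted inventory announcement.
    fn refresh(&mut self, repo: RepositoryId, node: NodeId, now: Timestamp) {
        self.entries.insert((repo, node), now);
    }

    /// Evicts entries older than `ENTRY_TTL`; returns how many were removed.
    fn prune(&mut self, now: Timestamp) -> usize {
        let before = self.entries.len();
        self.entries
            .retain(|_, last_seen| now.saturating_sub(*last_seen) <= ENTRY_TTL);
        before - self.entries.len()
    }
}
```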
 ### Reference Announcements

 When a user updates a project, the user's node sends a message
 to the network, announcing the update. Nodes that are tracking this project are
 then able to fetch the updates via the *git* protocol, either directly from the
 user's node, or from an intermediary node. This *refs announcement* message
 looks like this:

     RefsAnnouncement = (
         NodeId,
         RepositoryId,
         Map<RefName, CommitId>,
         Signature,
     )

 It contains the identifier of the node announcing the updated references, the
 repository under which these refs reside, the map of reference names (`RefName`)
 to their new commit hashes (`CommitId`), and a signature from the publishing
 node (`NodeId`) over the refs and project identifier.

 This allows any receiving node tracking the project to verify the legitimacy
 of the message using `NodeId` and `Signature`. For new projects, published
 on the network for the first time, the same type of message can be used.

 Unlike inventory announcements, reference announcements should only be
 relayed to interested nodes, ie. nodes that are hosting the given project,
 as they will usually be followed by a `git-fetch`.

 Note also that the `NodeId` in this case is not only the announcer
 of these updated references, but may also be their *author*. When Alice pushes
 changes to a project, she announces these changes over the network using a
 reference announcement, via her own node.
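The relay rule can be sketched in Rust. Given what each connected peer is known to host (for instance, from its inventory announcements), a node forwards a refs announcement only to peers hosting that project, and never back to its author. The signature is assumed to be verified beforehand. `CommitId` is modelled as a 20-byte SHA-1 id purely for illustration, and `relay_targets` is a hypothetical helper.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

type NodeId = [u8; 32];
type RepositoryId = [u8; 32];
type RefName = String;
/// Hypothetical: a 20-byte SHA-1 object id.
type CommitId = [u8; 20];

/// Mirrors `RefsAnnouncement`; its signature is assumed to be verified already.
struct RefsAnnouncement {
    node: NodeId,
    repo: RepositoryId,
    refs: BTreeMap<RefName, CommitId>,
}

/// Peers that should receive `msg`: those hosting the project, since they will
/// usually follow up with a `git-fetch`.
fn relay_targets(
    msg: &RefsAnnouncement,
    peer_inventories: &HashMap<NodeId, HashSet<RepositoryId>>,
) -> Vec<NodeId> {
    let mut targets: Vec<NodeId> = peer_inventories
        .iter()
        // Skip the announcing node itself, and peers not hosting the project.
        .filter(|(peer, inventory)| **peer != msg.node && inventory.contains(&msg.repo))
        .map(|(peer, _)| *peer)
        .collect();
    targets.sort(); // deterministic order; HashMap iteration order is not
    targets
}
```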
 ### Node Announcements

 We've covered inventory gossip and how project metadata and data are
 exchanged, but not how node metadata is exchanged, or, in other words, how
 *peer discovery* is carried out. For this purpose, we devise a *node
 announcement* message:

     NodeAnnouncement = (
         NodeId,
         NodeFeatures,
         Vec<Address>,
         Timestamp,
         Signature,
     )

 This message is designed to be authored by a node announcing *itself* on the
 network, and therefore contains a signature and timestamp, and is meant to be
 relayed by other nodes on the network. The key "payload" of this message is
 the vector of addresses sent by the node. This should contain all addresses
 on which the node is publicly reachable. At minimum, this should contain
 one IP address, but in the future it could contain `.onion` addresses or DNS
 names.

 As with the inventory message, nodes should buffer these announcements to serve
 them to new nodes connecting to the network. In addition to the list of
 addresses, we propose to also include a list of features supported by the
 announcing node, to allow for future protocol upgrades.
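Buffering node announcements amounts to keeping the newest one per node. The Rust sketch below assumes the signature is already verified. It models `Address` as `std::net::SocketAddr`, so `.onion` addresses and DNS names are out of scope, and treats `NodeFeatures` as a hypothetical bit-field. As with inventories, only announcements newer than the stored one are kept and relayed.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

type NodeId = [u8; 32];
type Timestamp = u64;

/// Hypothetical feature bit-field; the RIP only says features enable upgrades.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct NodeFeatures(u64);

/// Mirrors `NodeAnnouncement`; its signature is assumed to be verified already.
#[derive(Clone, Debug)]
struct NodeAnnouncement {
    node: NodeId,
    features: NodeFeatures,
    addresses: Vec<SocketAddr>,
    timestamp: Timestamp,
}

/// Keeps the newest announcement per node, to serve to nodes joining later.
#[derive(Default)]
struct AddressBook {
    latest: HashMap<NodeId, NodeAnnouncement>,
}

impl AddressBook {
    /// Stores `ann` if it is newer than what we hold; returns `true` if it
    /// should also be relayed to other peers.
    fn handle(&mut self, ann: NodeAnnouncement) -> bool {
        // A publicly reachable node must announce at least one address.
        if ann.addresses.is_empty() {
            return false;
        }
        let is_newer = self
            .latest
            .get(&ann.node)
            .map_or(true, |prev| ann.timestamp > prev.timestamp);
        if is_newer {
            self.latest.insert(ann.node, ann);
        }
        is_newer
    }

    /// Everything we would hand to a newly connected node.
    fn snapshot(&self) -> Vec<&NodeAnnouncement> {
        self.latest.values().collect()
    }
}
```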
 #### Bootstrap Nodes

 A node joining the network for the first time will not know of any peers.
 Hence, it's advised that network client software be pre-configured with
 DNS "seeds". These are registered DNS names, eg. `seeds.radicle.xyz`, that
 resolve to node addresses on the network. In the bootstrapping process,
 nodes can resolve these names to have a set of addresses to initially
 connect to, and once they find a peer, use the regular peer discovery
 process to find more nodes.
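Resolving DNS seeds needs nothing beyond the standard library, since `ToSocketAddrs` performs the lookup. In the Rust sketch below, the port is a parameter because the RIP does not specify one. A seed that fails to resolve is logged and skipped, so that bootstrapping can continue with the others. `resolve_seeds` is a hypothetical helper name.

```rust
use std::net::{SocketAddr, ToSocketAddrs};

/// Resolves DNS seed names (e.g. `seeds.radicle.xyz`) to addresses to dial.
/// `port` is a parameter because the RIP does not specify one.
fn resolve_seeds(seeds: &[&str], port: u16) -> Vec<SocketAddr> {
    let mut addrs = Vec::new();
    for &host in seeds {
        // Blocking lookup through the system resolver; IP literals skip DNS.
        match (host, port).to_socket_addrs() {
            Ok(resolved) => addrs.extend(resolved),
            // One unreachable seed should not abort the whole bootstrap.
            Err(err) => eprintln!("could not resolve seed {host}: {err}"),
        }
    }
    // Several seed names may point at the same node.
    addrs.sort();
    addrs.dedup();
    addrs
}
```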
 Replication
 -----------

 While gossip is used to exchange *metadata*, the actual repository *data*, ie.
 Git objects, is transferred via the process of *replication*.

 Nodes are configured with a list of projects that they are meant to host. These
 are called *tracked* projects, and this configuration is called the *tracking
 policy*.

 When a new node joins the network, the first thing it will attempt to do is to
 retrieve these tracked projects from the network. This is called *bootstrapping*.
 To do this, the node consults its routing table, locates the project's *seeds*,
 and initiates a `git-fetch` via the *git* protocol with one or more seeds. This
 fetch operation downloads the relevant git objects into the node's *storage*,
 making them available to other interested nodes.

 To notify its peers that its inventory has changed, it sends an *inventory*
 message to each of its peers. Replication is only possible because of the
 exchange of information on the gossip layer. Without it, nodes wouldn't
 know where to replicate projects from, and would quickly fall behind.
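The lookup step of bootstrapping, before any `git-fetch` runs, can be sketched in Rust. `plan_fetches` and its `max_seeds` limit are hypothetical. Projects without known seeds are returned separately, so the node can try again once more inventory announcements have arrived. The fetch itself is out of scope here.

```rust
use std::collections::{BTreeMap, BTreeSet};

type RepositoryId = [u8; 32];
type NodeId = [u8; 32];

/// Hypothetical helper: chooses up to `max_seeds` seeds per tracked project.
/// Returns the fetch plan plus the tracked projects with no known seeds yet.
fn plan_fetches(
    tracked: &BTreeSet<RepositoryId>,
    routing: &BTreeMap<RepositoryId, Vec<NodeId>>,
    max_seeds: usize,
) -> (BTreeMap<RepositoryId, Vec<NodeId>>, Vec<RepositoryId>) {
    let mut plan = BTreeMap::new();
    let mut missing = Vec::new();
    for repo in tracked {
        match routing.get(repo) {
            Some(seeds) if !seeds.is_empty() => {
                // Each chosen seed would then be asked for the repo via `git-fetch`.
                let chosen: Vec<NodeId> = seeds.iter().take(max_seeds).copied().collect();
                plan.insert(*repo, chosen);
            }
            // Retry later, once inventory announcements reveal a seed.
            _ => missing.push(*repo),
        }
    }
    (plan, missing)
}
```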
 ### Project Tracking and Branches

 While it's possible to always replicate and track at the *repository* level,
 it is highly impractical: such an *open* tracking policy is easily abused by
 malicious nodes. Given limited disk space and bandwidth, nodes need a way to
 replicate only a *subset* of repository data, authored by users they can
 trust.

 If we want to allow contributors to publish code that is intended to be merged
 into another project, we need to think about a node's tracking policy with
 respect to individual contributors and their git reference *trees*
 (the set of all branches published under a project by a given contributor).

 The safest tracking policy is to only track trees published by the delegates of
 a tracked project. But this means that contributors (non-delegates) won't have
 their changes replicated on the network. This becomes a problem when a
 contributor wants to propose a new patch to a project: unless the contributor
 is online to *seed* that branch, there is no way for the maintainer to retrieve
 it.

 Lacking a good answer to this problem at the protocol level, we defer to node
 operators to implement policies at the individual seed level. For example, a
 seed node may require users to link their social profiles with their Radicle
 key before enabling tracking for their branches.

 In the past, we've explored the idea of a "social" tracking graph. We leave
 this door open for potential future iterations of the protocol.
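The Rust sketch below shows a per-project policy that combines the "safest" default (delegates only) with an operator-maintained list of extra contributors. `TrackingPolicy` and its `allowed` set are hypothetical. How an operator decides who goes into `allowed` (e.g. linked social profiles) is outside the protocol, as noted above.

```rust
use std::collections::BTreeSet;

type NodeId = [u8; 32];

/// Hypothetical per-project tracking policy: delegates' trees are always
/// tracked; `allowed` holds extra contributors admitted by the node operator.
struct TrackingPolicy {
    delegates: BTreeSet<NodeId>,
    allowed: BTreeSet<NodeId>,
}

impl TrackingPolicy {
    /// Whether the tree published by `author` should be replicated.
    fn tracks(&self, author: &NodeId) -> bool {
        self.delegates.contains(author) || self.allowed.contains(author)
    }

    /// Filters the trees a remote offers down to those we are willing to store.
    fn select<'a>(&self, offered: &'a [NodeId]) -> Vec<&'a NodeId> {
        offered.iter().filter(|author| self.tracks(author)).collect()
    }
}
```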
 ### Unintentional Forks and Conflicts

 It is possible, through a software bug or user error, to unintentionally fork
 one's history. For example, a user publishes changes to a branch that land
 on a node in the network, but later rewrites that branch's history and
 re-publishes it. Nodes that received the initial changes will not be able
 to merge these new changes via a simple history *fast-forward*, while nodes
 that never saw the initial changes will have no problem fetching the new ones.

     User       Seed #1       Seed #2
     B                          B
     |  A            A          |
     | /            /           |
     |/            /            |
     |            |             |

 In the above diagram, a user publishes history `A` under branch `master`,
 which lands on `Seed #1`, and then publishes history `B` under the same branch.
 `Seed #2`, which didn't have the prior history, is able to fetch that branch
 without problems. `Seed #1`, however, is not, since the histories have diverged.
 Now, depending on which seed is used to fetch, a user would get a completely
 different history for that branch.

 To solve this problem, we have to realize one simple thing: every time a user
 publishes new code to the network, a commit in the *signed refs* history is
 created. This means that after publishing `B`, the user's signed refs history
 will look like this:

     refs/…/heads/master    B
     refs/…/heads/master    A
     …

 Since `Seed #1` will have this history if it fetches `B`, it will be able to
 simply set the head of the branch to the latest signed ref for that branch,
 which is `B`.

 Forks in the signed refs history itself, though much more problematic, could
 be recovered from by picking the history with the latest timestamp.
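Both recovery rules can be sketched in Rust. The types are hypothetical simplifications. Short strings stand in for commit ids, and a signed refs history is a list of entries ordered oldest to newest, each with a timestamp and the refs it signed. `resolve_head` resets a branch to its latest signed value even when that is not a fast-forward. `pick_history` applies the latest-timestamp rule to forked signed refs histories.

```rust
use std::collections::BTreeMap;

/// Hypothetical: short names such as "A" and "B" stand in for commit ids.
type CommitId = &'static str;
type Timestamp = u64;

/// One entry (commit) in a user's signed refs history.
struct SignedRefs {
    timestamp: Timestamp,
    refs: BTreeMap<String, CommitId>,
}

/// Where `branch` should point according to the newest signed refs entry in
/// `history` (ordered oldest to newest). A seed resets the branch to this
/// commit even when that is not a fast-forward of its local copy.
fn resolve_head(history: &[SignedRefs], branch: &str) -> Option<CommitId> {
    history.last()?.refs.get(branch).copied()
}

/// If the signed refs history itself has forked, prefer the candidate whose
/// newest entry carries the latest timestamp.
fn pick_history(candidates: &[Vec<SignedRefs>]) -> Option<&[SignedRefs]> {
    candidates
        .iter()
        .filter_map(|history| history.last().map(|tip| (tip.timestamp, history)))
        .max_by_key(|(timestamp, _)| *timestamp)
        .map(|(_, history)| history.as_slice())
}
```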
 Storage
 -------

 Since storage and replication are tightly coupled, and replication makes use of
 Git, so does storage. Unlike previous versions of Radicle, each project is
 stored in its own *bare* Git repository, under a common base directory.
 Storage is accessed directly by the node to report inventory to other nodes,
 and accessed by the end user through either specialized tooling or `git`.

 Note that a user working on a project will typically have
 *two* copies of the repository: one in storage, called the *stored* copy,
 and one *working copy*. The working copy is set up so that it is
 linked to storage via a *remote* named `rad`. Publishing code is then a matter
 of running, eg., `git push rad`. Code in storage can be considered *public*,
 since it is shared with connected peers, while code in the working copy is
 considered *private* until pushed.

 This allows for code to be staged for publishing even while offline, provided
 the user's storage is accessible. In most cases, it makes sense to keep
 storage and working copies on the same machine.

     ┌╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┐          ┌╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┐
     ┆ ┌───────────────────┐ ┌──────┐ ┆          ┆ ┌──────┐ ┌─────────────────┐ ┆
     ┆ │ Storage           │ │      │ ┆   Git    ┆ │      │ │ Storage         │ ┆
     ┆ │                   ├╸┆╸╸╸╸╸╸┆╸╸╸╸╸╸╸╸╸╸╸╸╸╸┆╸╸╸╸╸╸┆╸┤                 │ ┆
     ┆ │ ┌──────┐ ┌─────┐ ┌│ │      │ ┆ protocol ┆ │      │ │ ┌─────┐ ┌─────┐ │ ┆
     ┆ │ │repo  │ │repo │ ││ │      │ ┆          ┆ │      │ │ │repo │ │repo │ │ ┆
     ┆ │ ├──────┤ ├─────┤ ├│ │      │ ┆          ┆ │      │ │ ├─────┤ ├─────┤ │ ┆
     ┆ └─┴───╿──┴─┴───┬─┴─┴┘ │      │ ┆          ┆ │      │ └─┴───┬─┴─┴───╿─┴─┘ ┆
     ┆       │        │      │      │ ┆  gossip  ┆ │      │       │       │     ┆
     ┆       │        │      │ Node ├╸╸╸╸╸╸╸╸╸╸╸╸╸╸┤ Node │       │       │     ┆
     ┆       │        │      │      │ ┆ protocol ┆ │      │       │       │     ┆
     ┆      push     pull    │      │ ┆          ┆ │      │      pull    push   ┆
     ┆       │        │      │      │ ┆          ┆ │      │       │       │     ┆
     ┆       │        │      │      │ ┆          ┆ │      │       │       │     ┆
     ┆       │        │      │      │ ┆          ┆ │      │       │       │     ┆
     ┆  ┌────┴───┐ ┌──╽─────┐│      │ ┆          ┆ │      │ ┌─────╽──┐ ┌──┴────┐┆
     ┆  │working │ │working ││      │ ┆          ┆ │      │ │working │ │working│┆
     ┆  │copy    │ │copy    ││      │ ┆          ┆ │      │ │copy    │ │copy   │┆
     ┆  └────────┘ └────────┘└──────┘ ┆          ┆ └──────┘ └────────┘ └───────┘┆
     └╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┘          └╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┘
 Splitting the storage into per-project repositories has numerous advantages
 over previous designs that used a "monorepo":

 * Private repositories become possible to offer
 * Concurrency is simpler, since we can have locks at the project level
 * The Git object database is used more efficiently, since there is more sharing
 * Repository settings can be tuned on a per-project level

 ### Layout

 Each project under Radicle is stored in a bare Git repository containing
 the local user's refs, as well as the refs of all of the peers that the
 user's node is configured to track. A simple "namespacing" scheme can be used,
 similar to the one used by git for remote tracking branches.

 Since nodes replicate Git data from other nodes, a partitioning scheme is
 needed to separate references belonging to each node within each project.
 This can be achieved by using a node's unique identifier (its public key)
 to namespace Git references, since references are by their nature hierarchical.
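Because references are hierarchical, namespacing is just a matter of prefixing. The Rust sketch below uses the `<node>/refs/heads/master` notation that appears in the Canonicity section. The concrete layout is left to future work, so both the format and the helper names (`namespaced_ref`, `split_namespaced`, `publishers`) are assumptions.

```rust
/// Qualifies `refname` with the publishing node's id, giving each node its own
/// namespace inside a project's bare repository. The `<node>/refs/...` shape is
/// an assumption borrowed from the RIP's notation, not a specified format.
fn namespaced_ref(node: &str, refname: &str) -> String {
    format!("{node}/{refname}")
}

/// Splits a namespaced ref into `(node, refname)`, or `None` if `full` is a
/// plain, un-namespaced ref.
fn split_namespaced(full: &str) -> Option<(&str, &str)> {
    let (node, refname) = full.split_once('/')?;
    refname.starts_with("refs/").then_some((node, refname))
}

/// Lists the nodes that have published `branch`, e.g. to compare delegates'
/// `master` branches.
fn publishers<'a>(refs: &[&'a str], branch: &str) -> Vec<&'a str> {
    refs.iter()
        .filter_map(|&full| split_namespaced(full))
        .filter(|(_, refname)| *refname == branch)
        .map(|(node, _)| node)
        .collect()
}
```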
 #### Special References

 To store project metadata, special Git references are used. These are references
 that are written to a known location that doesn't vary between projects, and in
 some cases are meant to be hidden from the user and accessed only through
 purpose-built tooling.

 The first one is the head of the project identity branch:

     refs/rad/id

 This reference points to the latest version of the identity document. Users
 who want to make changes to their project's identity document can check out
 this branch.

 The second one is the *signed refs*:

     refs/rad/sigrefs

 You'll notice these references are not under the `heads/` hierarchy and therefore
 aren't regular branches. These references point to commit histories, though
 these histories are disjoint from the source code's history. Since they
 determine the outcome of project verification, they need to be handled with
 care and are not intended to be accessed directly by an end user.

 Storage layout will be specified in more detail in future work.

 Canonicity
 ----------

 In the above examples, we limited ourselves to projects with a single delegate.
 For the majority of open source projects, this is fitting: a single maintainer
 is in charge of the code. However, larger projects often have more than one
 user with push access or commit rights. In the Radicle model, there are no
 shared branches: each branch or set of branches under a tree is owned by
 *one* key. This is why they are partitioned by public key in the repository
 hierarchy.

 In a project with multiple delegates, for example, Alice, Bob and Eve, each
 would have their *own* `master` branch which only they could write to, eg.:

     <alice>/refs/heads/master
     <bob>/refs/heads/master
     <eve>/refs/heads/master

 So how does a contributor know which of those branches is the canonical one?
 The protocol itself does not have a notion of canonicity. It is left up to
 social consensus: perhaps the three delegates have agreed that Eve's `master`
 branch will be the canonical one that everyone should pull from. This agreement
 can be encoded in the project's identity document, and leveraged by tooling,
 but it is entirely optional, as it may not be desirable for all projects to
 "elect" a delegate that way.

 For projects that do not have a canonical `master`, there is another option
 to establish consensus on the state of a repository: *quorum*. Simply put,
 we can examine the commit histories of all three `master` branches, and
 pick the *latest* commit included in a *majority* of histories as our canonical
 state. In the example below, the commit `B`, which is Bob's latest, would
 be used.

     Alice       Bob       Eve
     C o                    D o
       |                     /
     B o        B o       B o <- quorum
       |          |         |
     A o        A o       A o
       |          |         |
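The quorum rule can be sketched in Rust for the simple case shown in the diagram. The sketch assumes each delegate's history is linear and listed root-first, so a commit's position measures how recent it is. Real Git histories are DAGs, where "latest" would require an ancestry check. The RIP also does not define "majority" for an even number of delegates, so the strict-majority threshold here is an assumption.

```rust
use std::collections::HashMap;

/// Picks the latest commit contained in a strict majority of `histories`.
/// Assumes linear histories listed root-first, so a commit's index in its
/// history (its depth) measures how recent it is.
fn quorum<'a>(histories: &[Vec<&'a str>]) -> Option<&'a str> {
    // commit -> (number of histories containing it, depth)
    let mut votes: HashMap<&'a str, (usize, usize)> = HashMap::new();
    for history in histories {
        for (depth, commit) in history.iter().enumerate() {
            votes.entry(*commit).or_insert((0, depth)).0 += 1;
        }
    }
    let majority = histories.len() / 2 + 1;
    votes
        .into_iter()
        .filter(|(_, (count, _))| *count >= majority)
        .max_by_key(|(_, (_, depth))| *depth)
        .map(|(commit, _)| commit)
}
```

With the histories from the diagram (Alice `A, B, C`; Bob `A, B`; Eve `A, B, D`), `A` and `B` each appear in all three histories, and `B` is the deeper of the two, matching the example above.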
 Closing Thoughts
 ----------------

 The protocols described above are part of a larger system. It's our hope that
 subsequent RIPs will answer some of the questions that arise from this proposal,
 as well as flesh out the functionality on top of the base protocol. How to
 represent rich social interactions and collaboration such as code review,
 comments, issues and patches on top of this protocol is left for future
 discussion.

 Credits
 -------

 * Kim Altintop, for coming up with the original design this protocol is based on
 * Alex Good, for helping iterate on some of the ideas found in this proposal
 * Fintan Halpenny, for helping iterate on some of the ideas found in this proposal

 Copyright
 ---------

 This document is licensed under the Creative Commons CC0 1.0 Universal license.

 ---
 RIP: 2
 Title: Identity
 Author: '@cloudhead <cloudhead@radicle.xyz>'
 Status: Draft
 Created: 2022-12-06
 License: CC0-1.0
 ---
 In RIP #1, we discussed *repository identity*, and the *identity document*.
 We said that to make it possible for repositories to be hosted in a peer-to-peer
 network, Git repositories on their own are not enough: we need a secure way to
 identify repositories that goes beyond source code. We need a stable identifier
 and a mechanism for self-certifying repositories against this identifier,
 so that changes to source code can be verified locally, by users.
 In this RIP, we discuss the method through which we can achieve the above in
 a secure, decentralized way.
 Table of Contents
 -----------------
 * [Overview](#overview)
 * [Peer Identity](#peer-identity)
 * [Repository Identity](#repository-identity)
     * [Validation](#validation)
 * [The Repository Identifier](#the-repository-identifier)
 * [Identity Storage](#identity-storage)
     * [Verification](#verification)
 * [Security](#security)
 * [Closing Thoughts](#closing-thoughts)
 * [Credits](#credits)
 * [Copyright](#copyright)
 Overview
 --------
 To introduce the topic of identity, we point the reader to the opening
 paragraphs of the original specification of identities on the Radicle network,
 which is still very much applicable to the Heartwood protocol:
 > In order to collaborate on repositories within a consensus-free network, we
 > must be able to refer to them using a stable identifier. Note that this
 > identifier is a statement of intent: a repository can be described as a
 > collection of ever-moving leaves of a DAG whose root element is the empty
 > object. Therefore, the content of a repository is not enough to describe it –
 > while two views on the repository may share objects, they may diverge
 > substantially otherwise. Both views may however state their intent to
 > eventually converge to the same state.
 >
 > While in principle a random identifier with sufficient entropy would suffice
 > for the purpose, this would put the burden of deciding which repository views
 > are legitimate entirely on the user. Instead, our approach is to establish an
 > ownership proof, tied to the network identity of a peer, or set of peers,
 > such that repository views can be replicated according to the trust
 > relationships between peers (“tracking”).
 >
 > Our model is loosely based on The Update Framework (TUF)[^0], conceived as a
 > means of securely distributing software packages.
 With this in mind, there are three core components to the Radicle identity
 system, for any given repository:
 1. A set of peers on the network, each holding a signing key.
 2. A document which establishes the identity of this repository, using these
    signing keys to self-certify.
 3. A stable identifier that can be used to refer to the repository, derived
    from this document.
 Peer Identity
 -------------
 Since Radicle repositories on the network are created by peers, we must first
 establish the concept of a *peer identity*. In Heartwood, peers are simply
 identified by their public key. This key is an Ed25519[^1] key that is encoded
 as a DID using the `did:key` method[^2]. DIDs are used for interoperability
 with other systems as well as allowing for other types of identifiers in the
 future.
     did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi
 *Example of a peer identifier in DID format.*
 We'll also note that peers on the network, also called *nodes*, are
 indistinguishable from *users* at the protocol level. The terms "Node ID",
 "Peer ID" and "Public Key" are thus used interchangeably.
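For illustration, the `did:key` encoding of an Ed25519 public key can be sketched in Python. This follows the `did:key` method as we understand it: the 32-byte key is prefixed with the multicodec code for `ed25519-pub` (`0xed`, varint-encoded as the bytes `0xed 0x01`), encoded with base58-btc, and tagged with the multibase prefix `z`, which is why Ed25519 DIDs begin with `z6Mk`. The helper names are hypothetical, and the base58 codec is hand-rolled only to keep the example dependency-free.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
ED25519_PUB = bytes([0xED, 0x01])  # multicodec "ed25519-pub" as an unsigned varint


def b58encode(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    # Each leading zero byte is encoded as a leading '1'.
    pad = len(data) - len(data.lstrip(b"\0"))
    return "1" * pad + out


def b58decode(text: str) -> bytes:
    """Decode a base58 string; raises ValueError on characters outside the alphabet."""
    num = 0
    for char in text:
        num = num * 58 + ALPHABET.index(char)
    pad = len(text) - len(text.lstrip("1"))
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    return b"\0" * pad + body


def encode_did_key(public_key: bytes) -> str:
    """Turn a raw 32-byte Ed25519 public key into a did:key string."""
    if len(public_key) != 32:
        raise ValueError("Ed25519 public keys are 32 bytes")
    return "did:key:z" + b58encode(ED25519_PUB + public_key)


def decode_did_key(did: str) -> bytes:
    """Recover the raw 32-byte Ed25519 public key from a did:key string."""
    prefix = "did:key:z"
    if not did.startswith(prefix):
        raise ValueError("expected a base58-btc did:key")
    raw = b58decode(did[len(prefix):])
    if raw[:2] != ED25519_PUB or len(raw) != 34:
        raise ValueError("not an Ed25519 did:key")
    return raw[2:]
```

Applied to the example identifier above, `decode_did_key` should return the peer's 32 raw key bytes.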
 Repository Identity
 -------------------
 With the establishment of peer identities, we can now move on to repository
 identities. A repository identity consists of an identity document and an
 associated unique identifier.
 The identity document is a JSON document associated with a repository on Radicle.
 The *hypothetical* minimal identity document looks like this:
     { "delegates": ["did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi"],
       "threshold": 1 }
 It describes a repository with a single *delegate*. Delegates are trusted
 entities that can cryptographically sign data within the scope of a given
 repository. In the identity document, they are represented by a DID. As of this
 RIP, only the `did:key` method is supported.
 Using the `threshold` property, the document specifies that only *one* delegate
 is required to sign updates to the repository. In this case, since we only
 have one delegate, this is the only possible value for `threshold`.
 > Repository delegates are responsible for signing all updates to a repository,
 > whether it be source code commits or updates to the identity document itself.
 > They can be thought of as repository "maintainers", though the applicability
 > is broader. We will see how delegates sign repository updates in one of
 > the following sections.
 Though the above document could constitute a valid identity, it does not contain
 any identifiable data that may be used to describe a particular repository.
 This is what the `payload` section is for. Heartwood defines a single payload
 type, `xyz.radicle.project`, which can be used to describe a project stored
 in a repository:
     { "delegates": ["did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi"],
       "threshold": 1,
       "payload": {
         "xyz.radicle.project": {
           "name": "heartwood",
           "description": "Radicle Heartwood Protocol & Stack",
           "defaultBranch": "master"
         }
       }
     }
 The string `xyz.radicle.project` is called a *payload ID*, and the `project`
 payload is the default payload type for Radicle repositories. Using this payload
 type, a repository may be given a name, a description, and a default branch.
 Identity documents are designed to be extensible: developers may create
 their own payload types, and applications can choose which payload types to
 support.
     { ...
       "payload": {
         "xyz.radicle.project": { ... },
         "xyz.radicle.funding": { ... },
         "com.atproto.account": {
           "email": "eve@atproto.com",
           "handle": "eve"
         }
       }
     }
 <small>Figure 1. Fictional example of an identity with multiple payloads</small>
 Payload IDs use reverse domain-name notation[^3] and consist of two
 parts: an *authority*, eg. `radicle.xyz`, and a *name*, eg. `project`. To keep
 payload types globally unique, developers wishing to create new payload types
 must control the authority (domain) under which these live.
 As of this RIP, there is only one recognized payload type: `xyz.radicle.project`.
 Repositories which include this type of payload are sometimes called *projects*.
 > When specifying new payload types, it's worth thinking about how the payload
 > schema might evolve over time. For example, it might be worth versioning the
 > payload types, either via the identifier (eg. `com.atproto.account.v1`) or
 > via a field inside the payload (eg. `{"version": 1}`). This will ensure that
 > changes to the payload schema can be made in a backwards-compatible way.
 ### Validation
 An identity document is valid if the following conditions are met:
 * There is at least *one* `delegate`, but no more than `255`.
 * Strings are not empty, and at most `255` characters long.
 * The `threshold` is not zero and not greater than the number of `delegates`.
 * The items in `delegates` are valid DIDs.
 * There is a `payload` property with at least one payload object and a valid
   payload ID.
 * Each payload under `payload` is valid according to the rules of that payload.
 These rules can be partly described in the following JSON Schema[^4] document:
     {
       "$schema": "https://json-schema.org/draft/2020-12/schema",
       "type": "object",
       "properties": {
         "delegates": {
           "type": "array",
           "items": { "type": "string" },
           "minItems": 1,
           "maxItems": 255,
           "uniqueItems": true
         },
         "threshold": {
           "type": "integer",
           "minimum": 1,
           "maximum": 255
         },
         "payload": {
           "type": "object",
           "additionalProperties": { "type": "object" },
           "minProperties": 1
         }
       },
       "required": ["delegates", "threshold", "payload"]
     }
 Finally, the schema and validation rules for the `xyz.radicle.project` payload
 are described as:
     {
       "$schema": "https://json-schema.org/draft/2020-12/schema",
       "type": "object",
       "properties": {
         "name": {
           "type": "string",
           "minLength": 1,
           "maxLength": 255
         },
         "description": {
           "type": "string",
           "maxLength": 255
         },
         "defaultBranch": {
           "type": "string",
           "minLength": 1,
           "maxLength": 255
         }
       },
       "required": ["name", "description", "defaultBranch"]
     }
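The validation rules above could be checked with a short Python function such as the following. It is a sketch, not a reference validator. The DID check only looks for a `did:key:z` prefix, the payload-ID check only requires at least three dot-separated labels (an authority plus a name), and only the `xyz.radicle.project` payload is checked in depth, following its JSON Schema (which sets no `minLength` on `description`). All of these simplifications, as well as the function names and error messages, are our own assumptions.

```python
from __future__ import annotations

from typing import Any

MAX_LEN = 255
PROJECT = "xyz.radicle.project"


def _is_string(value: Any, allow_empty: bool = False) -> bool:
    """Strings must be at most 255 characters and, unless allowed, non-empty."""
    return (
        isinstance(value, str)
        and (allow_empty or len(value) > 0)
        and len(value) <= MAX_LEN
    )


def validate_project(payload: dict) -> list[str]:
    """Check an `xyz.radicle.project` payload against its JSON Schema."""
    errors = [f"invalid {key}" for key in ("name", "defaultBranch")
              if not _is_string(payload.get(key))]
    # The schema sets no minLength on `description`, so it may be empty.
    if not _is_string(payload.get("description"), allow_empty=True):
        errors.append("invalid description")
    return errors


def validate_document(doc: Any) -> list[str]:
    """Return a list of errors; an empty list means the document is valid."""
    if not isinstance(doc, dict):
        return ["document must be an object"]
    errors: list[str] = []

    delegates = doc.get("delegates")
    if not isinstance(delegates, list) or not 1 <= len(delegates) <= MAX_LEN:
        errors.append("there must be between 1 and 255 delegates")
        delegates = []
    elif len({repr(d) for d in delegates}) != len(delegates):
        errors.append("delegates must be unique")
    for did in delegates:
        # Simplified DID check (assumption): accept only base58-btc did:key strings.
        if not _is_string(did) or not did.startswith("did:key:z"):
            errors.append(f"invalid delegate: {did!r}")

    threshold = doc.get("threshold")
    if (isinstance(threshold, bool) or not isinstance(threshold, int)
            or not 1 <= threshold <= len(delegates)):
        errors.append("threshold must be between 1 and the number of delegates")

    payload = doc.get("payload")
    if not isinstance(payload, dict) or not payload:
        errors.append("payload must contain at least one entry")
        return errors
    for payload_id, value in payload.items():
        # Simplified reverse-domain check (assumption): authority plus name,
        # i.e. at least three non-empty dot-separated labels.
        labels = payload_id.split(".")
        if len(labels) < 3 or not all(labels):
            errors.append(f"invalid payload ID: {payload_id!r}")
        elif not isinstance(value, dict):
            errors.append(f"payload {payload_id!r} must be an object")
        elif payload_id == PROJECT:
            errors.extend(validate_project(value))
    return errors
```

Returning a list of errors rather than raising on the first one makes it easier to report every problem in a rejected document version at once.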
 The Repository Identifier
 -------------------------
 Now that we have a document describing our repository, we can derive from it
 a unique identifier that can be used to refer to the repository on the
 peer-to-peer network. This identifier must meet certain criteria:
 1. It must be *stable*; in other words, it must not change throughout the
    lifetime of the repository.
 2. It must be deterministically derivable from the identity document alone, for
    the purpose of verification.
 3. It must contain enough entropy to be globally unique.
 4. It must allow for easy retrieval of the document from storage.
 To fulfill the above, and given that Radicle uses Git for storage of repository
 data, we choose to use the *Git Object ID*[^5] of the identity document as its
 identifier. Git object IDs, or *OIDs*, are "hardened" SHA-1 checksums of their
 content prefixed with a short header. We can compute this OID using the `git
 hash-object` command. But before doing so, we must take care of one last thing:
 to make the process of hashing our identity document fully deterministic, we
 must first ensure our document is in canonical JSON form[^6]. This prevents
 things like whitespace or key ordering from influencing the document hash and
 therefore the identifier. In turn, this makes the identifier easier to compute
 correctly.
 The above document in canonical form looks like this:
     {"delegates":["did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi"],
     "payload":{"xyz.radicle.project":{"defaultBranch":"master","description":
     "Radicle Heartwood Protocol & Stack","name":"heartwood"}},"threshold":1}
 We can now compute the Git object ID by placing the above JSON in a file, eg.
 `radicle.json`, taking care to strip all newline characters from it, and
 running the following command:
     $ git hash-object -t blob radicle.json
 The output should be:
     d96f425412c9f8ad5d9a9a05c9831d0728e2338d
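The same computation can be sketched in Python. As an approximation of RFC 8785 canonical JSON, we serialize with sorted keys and no insignificant whitespace. This matches RFC 8785 for documents like the one above, which contain only ASCII strings and small integers, but it is not a full implementation: RFC 8785 sorts keys by UTF-16 code units and specifies its own number formatting. The Git blob ID is the SHA-1 of a `blob <size>\0` header followed by the content, which is what `git hash-object -t blob` computes for a file containing exactly these bytes. The function names are hypothetical.

```python
import hashlib
import json


def canonical_json(document: dict) -> bytes:
    """Approximate RFC 8785: sorted keys, no whitespace, UTF-8 output."""
    return json.dumps(
        document, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def git_blob_oid(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob with this content."""
    header = f"blob {len(content)}\0".encode("ascii")
    # hashlib's SHA-1 lacks the collision detection of Git's hardened SHA-1,
    # but produces the same digest for ordinary inputs.
    return hashlib.sha1(header + content).hexdigest()


document = {
    "delegates": ["did:key:z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi"],
    "threshold": 1,
    "payload": {
        "xyz.radicle.project": {
            "name": "heartwood",
            "description": "Radicle Heartwood Protocol & Stack",
            "defaultBranch": "master",
        }
    },
}
print(git_blob_oid(canonical_json(document)))
```

Because the hash covers every byte, a single trailing newline in the file yields a different OID, which is why the newline must be stripped before running `git hash-object`.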
 This SHA-1 hash is the document's OID. To turn it into a Radicle repository
 identifier, we encode the underlying 20-byte hash value using `multibase`
 encoding[^7] with the `base58-btc` alphabet (the same encoding used by the
 `did:key` method), and prefix it with `rad:`, making it a valid URN:
     "rad" ":" multibase(base58-btc, raw-oid-bytes)
 This results in the repository identifier, or RID:
     rad:z42hL2jL4XNk6K8oHQaSWfMgCL7ji
 This RID is theoretically unique thanks to the entropy provided by the delegate
 key and payload.
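The OID-to-RID step can be sketched as follows, taking the hexadecimal OID printed by `git hash-object` as input. The multibase prefix for base58-btc is the character `z`. The base58 codec is hand-rolled to avoid dependencies, and `rid_from_oid` and `oid_from_rid` are hypothetical names.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    num = int.from_bytes(data, "big")
    out = ""
    while num:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    # Leading zero bytes become leading '1' characters.
    return "1" * (len(data) - len(data.lstrip(b"\0"))) + out


def b58decode(text: str) -> bytes:
    num = 0
    for char in text:
        num = num * 58 + ALPHABET.index(char)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    return b"\0" * (len(text) - len(text.lstrip("1"))) + body


def rid_from_oid(oid_hex: str) -> str:
    """Encode a 20-byte SHA-1 object ID as a Radicle repository identifier."""
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 20:
        raise ValueError("expected a 20-byte SHA-1 object ID")
    # 'z' is the multibase prefix for base58-btc.
    return "rad:z" + b58encode(raw)


def oid_from_rid(rid: str) -> str:
    """Recover the hexadecimal object ID from a RID."""
    if not rid.startswith("rad:z"):
        raise ValueError("expected a base58-btc RID")
    raw = b58decode(rid[len("rad:z"):])
    if len(raw) != 20:
        raise ValueError("expected a 20-byte object ID")
    return raw.hex()


print(rid_from_oid("d96f425412c9f8ad5d9a9a05c9831d0728e2338d"))
```

Given the OID stated above, the printed RID should match `rad:z42hL2jL4XNk6K8oHQaSWfMgCL7ji`. Since the encoding is reversible, `oid_from_rid` also gives the lookup key needed to retrieve the root document from storage.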
 Identity Storage
 ----------------
 A storage system suitable for storing identity documents must have two
 properties:
 1. It must guarantee data integrity.
 2. It must preserve the history of changes to the documents.
 Radicle repositories are stored in Git, and criterion (1) is guaranteed by Git
 natively, so long as we store our identity documents in the Git object database.
 This is because Git hashes all objects under it, and structures its data such
 that a change in hash of one object means a change in hash of all dependent
 objects.
 Criterion (2) can be guaranteed by encoding changes to the documents as a Git
 commit history. Not only does a commit history allow us to preserve all changes,
 it also proves, via hash-linking, that no change was omitted from the history.
     Commit             Commit             Commit
     ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
     │ fb8e40a     │◄─┐ │ c581f25     │◄─┐ │ 43eb12d     │
     │             │  └─┤ fb8e40a     │  └─┤ c581f25     │
     │ ┌──────────┐│    │ ┌──────────┐│    │ ┌──────────┐│
     │ │ Document ││    │ │ Document ││    │ │ Document ││
     │ ├──────────┤│    │ ├──────────┤│    │ ├──────────┤│
     └─┴──────────┴┘    └─┴──────────┴┘    └─┴──────────┴┘
 In the above diagram, we see three commits, with the left-most being the *root*
 commit of the document, ie. the initial state, and the right-most being the
 *head*, ie. the latest state.
 Each commit includes, within its Git tree, a blob named `radicle.json` which
 contains a version of the identity document.
     ┌───────┬─────────┐                                   ┌─────┬───────────┐
     │commit │ fb8e40a │   ┌─────┬────────────────────┐  ┌►│blob │ d96f425   │
     ├───────┴─────────┤ ┌►│tree │ 82bc09a            │  │ ├─────┴───────────┤
     │tree   82bc09a   ├─┘ ├─────┴────────────────────┤  │ │{"delegates":...,│
     │author ...       │   │blob d96f425 radicle.json ├──┘ │ "payload":...,  │
     │                 │   └──────────────────────────┘    │ "threshold":...}│
     └─────────────────┘                                   └─────────────────┘
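To make this structure concrete, the following Python sketch computes the IDs Git would assign to a tree holding a single `radicle.json` blob, and to commits pointing at that tree. It mirrors Git's object format (`<type> <size>\0<body>`, with tree entries stored as `<mode> <name>\0<20-byte id>`), but it produces *unsigned* commits and uses a hypothetical author identity. It is meant for illustration, not as a replacement for `git commit-tree`.

```python
from __future__ import annotations

import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash a Git object: SHA-1 over the header '<kind> <size>\\0' plus the body."""
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def tree_oid(files: dict[str, str]) -> str:
    """Tree containing only regular files, given as {name: blob OID in hex}.

    Entries are `<mode> <name>\\0<20 raw bytes>`, sorted by name; sorting by
    plain name is only correct because there are no subdirectories here.
    """
    body = b"".join(
        b"100644 " + name.encode("utf-8") + b"\0" + bytes.fromhex(oid)
        for name, oid in sorted(files.items())
    )
    return object_id("tree", body)


def commit_oid(tree: str, parents: list[str], ident: str, message: str) -> str:
    """Unsigned commit; real identity commits also carry `gpgsig` headers."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return object_id("commit", "\n".join(lines).encode("utf-8"))


# Hypothetical author identity and timestamp, for illustration only.
ident = "Alice <alice@example.com> 1700000000 +0000"
tree = tree_oid({"radicle.json": "d96f425412c9f8ad5d9a9a05c9831d0728e2338d"})
root = commit_oid(tree, [], ident, "Initialize identity\n")
print(tree, root)
```

Each commit's ID depends on its parent's ID, so altering any earlier document version changes every later commit ID; this is the hash-linking property relied on above.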
 Using this representation, we can hence track all changes to a given identity.
 To keep track of the head of this history, we use a special Git reference
 named `rad/id` which points to the latest version of the identity document:
     fb8e40a ◄─ c581f25 ◄─ 43eb12d ◄─ refs/rad/id
 Just like regular Git branches, when the identity is updated, `refs/rad/id`
 is reset to point to the latest commit. Note that this commit history is
 completely separate from the source code history pointed to by eg.
 `refs/heads/master` and other branches. The identity history is kept in
 the repository's *stored copy*, which is a bare repository, and is not included
 in working copies.
 ### Verification
 Verifying the latest state, ie. commit `43eb12d`, is a matter of verifying all
 preceding states, starting from the root (`fb8e40a`).
 The root commit is verifiable intrinsically, since it contains, as part of its
 Git tree, the document from which we derived the RID. We can see above that the
 original blob from which we computed the repository identifier is contained in
 the tree pointed to by the root commit of the document history. The root commit
 is hence valid for a given RID if and only if it contains a blob under the name
 `radicle.json` containing a valid identity document which hashes to the given
 RID, *and* the commit is signed by all delegates in the initial `delegates` list.
 Once the root commit is verified, we can proceed to the next commit. Since
 the document may have changed in this commit, the RID is no longer useful for
 verifying this commit. Instead, we make sure that two conditions are fulfilled:
 1. The commit containing the updated document is signed by a number of keys
    greater than or equal to the `threshold` property of the *previous*, valid
    version of the document.
 2. Each of the aforementioned signatures belongs to a key that is part of the
    `delegates` set of the previous document version.
 > Git commit signatures can be verified with the `git verify-commit` tool.
 These delegate signatures are expected to be included in the commit header
 under the `gpgsig` key, and be encoded in the SSH signature format.
     tree c66cc435f83ed0fba90ed4500e9b4b96e9bd001b
     parent af06ad645133f580a87895353508053c5de60716
     author Buck Mulligan <buck@mulligan.xyz> 1664467633 +0200
     committer Buck Mulligan <buck@mulligan.xyz> 1664786099 -0200
     gpgsig -----BEGIN SSH SIGNATURE-----
      U1NIU0lHAAAAAQAAADMAAAALc3NoLWVkMjU1MTkAAAAgvjrQogRxxLjzzWns8+mKJAGzEX
      fm2ALoN7pyvD2ttQAAAADZ2l0AAAAAAAAAAZzaGE1MTIAAABTAAAAC3NzaC1lZDI1NTE5
      AAAAQIQvhIewOgGfnXLgR5Qe1ZEr2vjekYXTdOfNWICi6ZiosgfZnIqV0enCPC4arVqQg+
      GPp0HqxaB911OnSAr6bwU=
      -----END SSH SIGNATURE-----
     gpgsig -----BEGIN SSH SIGNATURE-----
      U1NIU0lHAAAAAQAAADMAAAALc3NoLWVkMjU1MTkAAAAgDb3ulFKnHALG8AnuuFPY9prvVZ
      kyLc73tcQ+HG3sCzQAAAADZ2l0AAAAAAAAAAZzaGE1MTIAAABTAAAAC3NzaC1lZDI1NTE5
      AAAAQM9rxErTt7AtcLypSyVM/jmd9/syO4D5hjMjL/9lbGzIkXXDL6+QlUsLipeLuYHV92
      F/6nm/lEaPUTeiZQ5o9AI=
      -----END SSH SIGNATURE-----
 <small>A Git commit header with two SSH signatures.</small>
 We proceed in this manner until the last commit in the history. If all commits
 pass this verification process, we consider the identity valid. Note that every
 version of the document must be validated according to the rules stated under
 the [Validation](#validation) section. This includes the document payload,
 and implies that application developers supporting payload extensions will
 have to provide their own validation for these payloads, which will have to run
 for each commit in the document history.
 It's important to restate that for any commit `C`, other than the root commit,
 verification is done by using the `delegates` and `threshold` values of the
 *parent* commit to `C`, which has already been verified.
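The verification walk can be summarized in Python. This sketch assumes signature checking has already happened, for example with `git verify-commit` or an SSH signature verifier. Each commit is therefore represented only by the OID of its `radicle.json` blob, the parsed document, and the set of DIDs whose signatures verified. The `IdentityCommit` type and `verify_history` function are hypothetical, and per-document validation is reduced to a threshold check. A full verifier would apply every rule from the Validation section, including payload rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IdentityCommit:
    blob_oid: str       # OID of the radicle.json blob in this commit's tree
    document: dict      # parsed identity document
    signers: set[str]   # DIDs whose signatures on this commit verified


def is_valid(document: dict) -> bool:
    # Stand-in for the full Validation rules (assumption).
    delegates = document.get("delegates", [])
    return 1 <= document.get("threshold", 0) <= len(delegates)


def verify_history(rid_oid: str, history: list[IdentityCommit]) -> dict:
    """Verify commits from root to head; return the verified head document."""
    if not history:
        raise ValueError("empty identity history")
    root = history[0]
    if root.blob_oid != rid_oid:
        raise ValueError("root document does not hash to the RID")
    if not is_valid(root.document):
        raise ValueError("root document is invalid")
    if not set(root.document["delegates"]) <= root.signers:
        raise ValueError("root commit must be signed by all initial delegates")
    previous = root.document
    for commit in history[1:]:
        delegates = set(previous["delegates"])
        # Every signature must come from a delegate of the *previous* version...
        if not commit.signers <= delegates:
            raise ValueError("commit signed by a non-delegate")
        # ...and there must be at least `threshold` of them.
        if len(commit.signers) < previous["threshold"]:
            raise ValueError("not enough delegate signatures")
        if not is_valid(commit.document):
            raise ValueError("invalid document version")
        previous = commit.document
    return previous
```

Using the previous version's `delegates` and `threshold` at each step means a new delegate cannot authorize its own addition.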
 Security
 --------
 The combination of Git storage and cryptographic verification provides very
 strong security and integrity guarantees around Radicle repositories and
 identities:
 * Omitted data up to the latest commit is detected by Git itself
 * Tampering with the identity root will result in a different RID
 * Adding a delegate key without the sign-off of the existing delegate set will
   fail verification
 There is one possible attack that can be carried out by network participants:
 serving old data. Since it isn't possible to know whether a document history
 has a more recent update than the latest known update, a dishonest peer may
 choose to hide the last *N* identity updates from its peers. This means it
 will serve a stale document to its peers.
 However, this attack is only effective if *all* of a victim's connected peers
 perform this censorship. It takes only one honest peer to serve the full
 document history for the censorship to fail.
 Closing Thoughts
 ----------------
 In this RIP we described an identity system for Git repositories that can be
 used to securely distribute code on a peer-to-peer network. The system is
 self-certifying and requires only basic Git primitives to implement.
 Credits
 -------
 * Kim Altintop, for the original design this system is based on
 Copyright
 ---------
 This document is licensed under the Creative Commons CC0 1.0 Universal license.
 [^0]: https://theupdateframework.github.io/specification/latest/
 [^1]: https://ed25519.cr.yp.to/
 [^2]: https://w3c-ccg.github.io/did-method-key/
 [^3]: https://en.wikipedia.org/wiki/Reverse_domain_name_notation
 [^4]: https://json-schema.org/
 [^5]: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
 [^6]: https://datatracker.ietf.org/doc/html/rfc8785
 [^7]: https://w3c-ccg.github.io/multibase/

 ---
 RIP: 3
 Title: Storage Layout
 Author: '@fintohaps <fintan.halpenny@gmail.com>'
 Status: Draft
 Created: 2022-10-27
 License: CC0-1.0
 ---
 The storage layer is a crucial component of the Radicle network, and it is
 designed with a local-first approach. This means that it can accommodate not
 only the local operator's view of a repository, but also the views of peers in
 whom the operator is interested. These views, also known as *forks* or *source
 trees*, play a key role in enabling collaboration and version control within
 the network.
 Table of Contents
 -----------------
 * [Overview](#overview)
 * [Layout](#layout)
 * [Replication](#replication)
 * [Working Copy](#working-copy)
     * [URL](#url)
     * [Refspecs](#refspecs)
     * [Example](#example)
     * [Remote Helper](#remote-helper)
         * [Authorization](#authorization)
 * [Future Work](#future-work)
 * [Appendix](#appendix)
     * [Alternative Designs](#alternative-designs)
         * [Associating a Working Copy](#associating-a-working-copy)
     * [Worked Example](#worked-example)
 * [Credits](#credits)
 * [Copyright](#copyright)
 Overview
 --------
 In a peer-to-peer network, there is no centralized server or repository to
 which users submit their changes. Additionally, the absence of a consensus
 mechanism at the protocol level means that the sequence of operations cannot be
 guaranteed. To tackle these issues, Radicle implements a partitioned approach
 in which each user maintains their own local "fork" of a repository, as well as
 any other forks they have an interest in. These forks are then shared among
 users across the network. This method not only enhances the user experience by
 allowing offline work but also eliminates the need for a server to process
 data. Each repository fork has a single owner and writer, and users are only
 permitted to make changes to their respective forks.
 The storage layer must also be designed for efficient replication of data
 between peers. For this reason, Git is used as the underlying protocol and
 database, as it maps nicely to the type of data exchanged on the Radicle
 network, and is flexible enough for our use case. In addition, Git has been
 optimized for speed and disk space, and will automatically de-duplicate
 repository data and fetch missing objects from peers[^0].
 With the above in mind, this document proposes a storage layer that meets the
 following requirements:
 1. The storage layer is capable of maintaining a local copy of the working
    dataset.
 2. The storage layer can store any number of repositories.
 3. For each repository, it can represent multiple views, or *forks*, of
    the repository.
 4. The storage layer can natively interoperate with Git.
 There are two aspects to consider for Git interoperability:
 1. Repository replication between peers.
 2. Associating a *working* repository or "copy" with a *stored* repository.
 In the next sections we will cover how the above works with the storage layout.
 Layout
 ------
 The storage layout must support multiple repositories and multiple peers per
 repository. Each stored repository is a *bare* Git repository[^1]. To ensure
 uniqueness and easy identification of repositories, a stable and globally
 unique identifier, known as the Repository ID (RID), is assigned to each
 stored repository. The RID for each repository is established according to the
 guidelines provided in RIP#2's section *The Repository Identifier*, and is
 represented as `<rid>` in diagrams found in this document.
 Since our underlying storage uses Git, we represent the storage layout as a
 file tree on the file-system, with `<storage>` representing the storage root,
 or top-level directory under which all repositories are stored on a user's
 device. Though this storage tree is browsable by the user with standard file
 system commands, it is not meant to be interacted with directly by users,
 for risk of corrupting the data. Additionally, Git is free to pack the objects,
 which means they may not always appear as individual files.
     <storage>       # Storage root containing all local repositories
     ├── <rid>       # Some repository, e.g. a project, as a bare git repository
     │   └── refs    # All Git references under this project
     ├── <rid>
     │   └── refs
     ├── <rid>
     │   └── refs
     └── ...
 <small>Basic overview of the storage layout with multiple repositories</small>
 For every repository, each peer associated with that repository must have a
 separate, logical Git source tree, which contains all the usual reference
 namespaces, i.e. `heads`, `tags`, and `notes`. This *logical repository* is
 what we call a *fork* or *view*, and allows peers to maintain different sets of
 changes for the same physical repository.
     <storage>
     └─ <rid>                    # The "physical" Git repository
        └─ refs
           └─ namespaces         # All forks are stored under this namespace
              ├─ <nid>           # One peer's fork is stored here
              │  └─ refs
              ├─ <nid>           # Another peer's fork is stored here
              │  └─ refs
              └─ <nid>           # Etc.
                 └─ refs
 <small>Storage partitioning by Node ID or `<nid>`</small>
 To have this separation, instead of having each peer stored in a separate Git
 repository with a separate object database (ODB), the `gitnamespaces`[^2]
 feature is used. For each peer, including the local peer, their unique
 identifier is used as the namespace within each repository to separate Git
 references. The identifier used is described in *Peer Identity* in RIP#2, and is
 usually known as the *Node Identifier* (NID):
 > In Heartwood, peers are simply identified by their public key. This
 > key is an Ed25519 key that is encoded as a DID using the `did:key`
 > method. DIDs are used for interoperability with other systems as
 > well as allowing for other types of identifiers in the future.
 Thus, each peer can have its own namespace for references, while sharing the
 objects with other peers via a shared ODB. This ensures only one copy of each
 object is stored across all repository forks.
 The storage uses the encoded public key portion of the `did:key` string as the
 namespace path, denoted as `<nid>` or *Node ID* going forward. This means that
 a peer's references will be scoped by their Node ID via the path prefix
 `refs/namespaces/<nid>`. We demonstrate this organisation below in more detail:
     <storage>                     # Storage root containing all local repositories
     ├─ <rid>                      # Storage for first repository
     │  └─ refs                    # All Git references locally stored
     │     └─ namespaces           # All peer source trees or "forks"
     │        ├─ <nid>             # First node's source tree
     │        │  └─ refs           # First node's Git references
     │        │     ├─ heads       # First node's branches
     │        │     │   └─ master  # First node's master branch
     │        │     ├─ tags        # First node's tags
     │        │     │   ...
     │        │     └─ rad
     │        │         └─ id      # First node's version of the repository identity document
     │        │
     │        └─ <nid>             # Second node's source tree
     │           ├─ refs           # Second node's references
     │           └─ ...
     ├─ <rid>                      # Storage for second repository
     │   ...
     └─ <rid>                      # etc.
         ...
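The mapping between a peer's identity and its references in storage is mechanical, and can be sketched in Python. The helpers below assume, as described above, that the Node ID is the encoded key portion of the `did:key` string, i.e. the part after `did:key:`. Their names are hypothetical.

```python
from __future__ import annotations

DID_PREFIX = "did:key:"


def nid_from_did(did: str) -> str:
    """Strip the DID method prefix, leaving the multibase-encoded key."""
    if not did.startswith(DID_PREFIX):
        raise ValueError("expected a did:key identifier")
    return did[len(DID_PREFIX):]


def namespaced(nid: str, ref: str) -> str:
    """Map a ref in a peer's fork, e.g. refs/heads/master, to its path in storage."""
    if not ref.startswith("refs/"):
        raise ValueError("expected a fully qualified ref")
    return f"refs/namespaces/{nid}/{ref}"


def split_namespaced(stored: str) -> tuple[str, str] | None:
    """Inverse of `namespaced`; returns None for canonical (top-level) refs."""
    prefix = "refs/namespaces/"
    if not stored.startswith(prefix):
        return None
    nid, _, ref = stored[len(prefix):].partition("/")
    return nid, ref
```

For example, `namespaced(nid, "refs/rad/id")` gives the location of a peer's version of the identity document.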
 Note that top-level references may still exist, i.e. `<rid>/refs/{heads,tags}`.
 The top-level namespace must be reserved for canonical references: references
 that are collaboratively agreed upon as published and stable. They do not
 belong to any one peer, and thus may differ from device to device. How
 canonical references are decided and written is left for a future RIP.
     <storage>
     └─ <rid>
        └─ refs
           ├─ HEAD                 # Canonical head reference
           ├─ heads                # Canonical branches
           │   └─ master           # Canonical master branch
           ├─ tags
           │   └─ v1.0.0           # Canonical v1.0.0 release tag
           ├─ rad
           │   └─ id               # Canonical identity reference
           └─ namespaces           # All peer source trees
              ├─ <nid>             # First node's source tree
              └─ <nid>             # Second node's source tree
              ...
 <small>Example of canonical references under a repository</small>
 Replication
 -----------
 Repository replication involves retrieving data from a remote peer. As the
 storage consists of Git repositories, data can be transferred remotely using
 the Git protocols[^3] and appropriate refspecs[^4]. However, this document does
 not cover the protocol used or how to verify fetched data, as those topics are
 beyond its scope. They may be discussed in a separate document.
 That being said, we designed the storage layout such that it's easy to transfer
 data between repositories over the network, using an unmodified Git protocol.
 Using refspecs, it's possible to transfer only the objects we're interested in;
 for example, we can fetch one peer's fork without fetching another's.
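As an illustration of fetching a single fork, a refspec such as `+refs/namespaces/<nid>/refs/*:refs/namespaces/<nid>/refs/*` selects only that peer's references. The Python sketch below builds such a refspec and applies it to ref names using Git's single-`*` pattern rule, where `*` may span `/`. It is a simplified model that ignores negative refspecs and other Git features, and the function names are hypothetical.

```python
from __future__ import annotations


def fork_refspec(nid: str) -> str:
    """Refspec that replicates one peer's fork into the same place locally."""
    pattern = f"refs/namespaces/{nid}/refs/*"
    return f"+{pattern}:{pattern}"


def apply_refspec(refspec: str, ref: str) -> str | None:
    """Map `ref` through a refspec; None if the source side doesn't match."""
    src, _, dst = refspec.lstrip("+").partition(":")
    head, star, tail = src.partition("*")
    if not star:
        # Exact (non-pattern) refspec.
        return dst if ref == src else None
    if not (ref.startswith(head) and ref.endswith(tail)
            and len(ref) >= len(head) + len(tail)):
        return None
    matched = ref[len(head):len(ref) - len(tail)]
    return dst.replace("*", matched, 1)
```

References under other peers' namespaces do not match the pattern, so they, and objects reachable only from them, are not requested.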
 Working Copy
 ------------
 A working copy is a local copy of a repository, which corresponds to a
 repository in storage. The operator can make changes to the source code in the
 working copy. This is similar to how one would use `git clone` to obtain a copy
 of an upstream repository, such as one hosted on GitHub or GitLab. Once the
 changes have been made in the working copy, they can be pushed upstream. With
 Radicle, changes are fetched and pushed between the *working* copy and the
 *stored* copy within the local storage.
 The connection between the working copy and the storage is maintained through a
 set of Git remotes[^5], where each remote represents a single remote peer or
 *namespace* for that repository and is associated with a Node ID.
 The name of each remote, defined by the operator or application, can be
 customized to suit their preferences. For instance, the operator may use the
 Node ID of the peer, `origin`, `rad`, a nickname, or any other desired name.
 By convention, we use the `rad` remote for the local peer's remote, such that
 a user may push changes to his or her own fork with `git push rad`.
 The URL of each Git remote must resolve to the local storage's repository
 corresponding to the working copy. As such, the URL serves as a mapping between
 the working copy and the stored copy.
 ### URL
 The URL scheme for a given Radicle remote is of the form:
     rad://<rid>[/<nid>]
 * The `rad://` scheme is used for Radicle repositories, and identifies a
   project on the network. By using this scheme with Git, the user instructs Git
   to invoke the `git-remote-rad` executable during `git push` or `git fetch`,
   which allows the user to interact with the network through the storage layer.
   This will be covered in more detail in the *Remote Helper* section.
 * The `<rid>` component is the repository identifier to be found in storage.
 * The `<nid>` component is the Node ID which the `--namespace` option will
   be set to. If `<nid>` is not specified, Git will interact with the
   repository's *canonical references*.
 Here's an example URL for repository `z42hL2jL4XNk6K8oHQaSWfMgCL7ji` and peer
 `z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi`:
     rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji/z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi
 Here's a URL for the same repository's canonical references:
     rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji
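Parsing this URL scheme is straightforward. The following Python sketch splits a `rad://` URL into its RID and optional Node ID, returning the bare multibase strings. It only checks the overall shape (the `z` multibase prefix and at most two path components) and does not decode the identifiers. `parse_rad_url` is a hypothetical name, not the remote helper's actual code.

```python
from __future__ import annotations


def parse_rad_url(url: str) -> tuple[str, str | None]:
    """Split rad://<rid>[/<nid>] into (rid, nid); nid is None for canonical refs."""
    scheme = "rad://"
    if not url.startswith(scheme):
        raise ValueError("not a rad:// URL")
    parts = url[len(scheme):].split("/")
    if not 1 <= len(parts) <= 2 or not all(parts):
        raise ValueError("expected rad://<rid> or rad://<rid>/<nid>")
    for part in parts:
        if not part.startswith("z"):
            raise ValueError(f"expected a base58-btc multibase string: {part!r}")
    rid = parts[0]
    nid = parts[1] if len(parts) == 2 else None
    return rid, nid
```

A `None` Node ID signals that Git should operate on the repository's canonical references.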
 ### Refspecs
 Since Git namespaces are used, the `fetch` refspec[^4] may be:
     +refs/heads/*:refs/remotes/<name>/*
 The operator may also want to scope tags to particular remotes. This
 can be achieved by using the `tagOpt` of a remote and adding another
 fetch refspec.
     fetch = +refs/tags/*:refs/remotes/<name>/tags/*
     tagOpt = --no-tags
 When using these refspecs with `git fetch` or `git push`, it is necessary to
 specify the namespace that is being used for the operation. This can be
 achieved using `git --namespace=<nid>` or `GIT_NAMESPACE=<nid> git`.
 Unfortunately, this is somewhat cumbersome for the user and does not prevent
 pushing to namespaces belonging to a non-local peer. This is remedied in
 [Remote Helper](#Remote-Helper).
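 When a namespace is set, `gitnamespaces`[^2] stores every ref of that namespace under `refs/namespaces/<name>/`, and a namespace containing `/` is nested. The hypothetical `namespaced_ref` helper below is a Python sketch of where a ref ends up in storage when `GIT_NAMESPACE` is set. As in the worked example in the Appendix, `alice` and `bob` stand in for Node IDs.
```python
# Hypothetical helper illustrating the gitnamespaces(7) storage layout.
def namespaced_ref(namespace: str, ref: str) -> str:
    """Return the storage path of `ref` inside `namespace`.

    Each '/'-separated component of the namespace becomes a
    refs/namespaces/<component>/ level, so 'foo/bar' nests as
    refs/namespaces/foo/refs/namespaces/bar/.
    """
    prefix = "".join(
        f"refs/namespaces/{component}/" for component in namespace.split("/")
    )
    return prefix + ref


print(namespaced_ref("alice", "refs/heads/master"))
# refs/namespaces/alice/refs/heads/master
print(namespaced_ref("bob", "refs/tags/v1.0.0"))
# refs/namespaces/bob/refs/tags/v1.0.0
```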
 ### Example
 Here's an example remote configuration based on the above specifications:
     [remote "rad"]
         url = rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji/z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi
         fetch = +refs/heads/*:refs/remotes/rad/*
 To support fetching canonical references while pushing to the local peer's
 namespace, a configuration like the following can be used:
     [remote "rad"]
         url = rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji
         pushurl = rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji/z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi
         fetch = +refs/heads/*:refs/remotes/rad/*
 In the above configuration, `git pull rad` would pull the canonical references
 while `git push rad` would push to the local user's namespace.
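 For pushes, Git uses a remote's `pushurl` when one is set; otherwise it uses `url`. The hypothetical `url_for` function below models that choice in Python for the configuration above. It assumes a single `url` and `pushurl` per remote, although Git itself allows several. Combined with the namespace handling of the remote helper, this choice is what sends `git pull rad` and `git push rad` to different namespaces.
```python
from typing import Dict

# Hypothetical model of Git's url/pushurl selection for one remote.
def url_for(remote: Dict[str, str], operation: str) -> str:
    """Pick the URL Git would use for `operation` ('fetch' or 'push').

    `remote` holds the keys of a [remote "..."] section; pushurl
    overrides url for pushes only.
    """
    if operation not in ("fetch", "push"):
        raise ValueError(f"unknown operation: {operation!r}")
    if operation == "push" and "pushurl" in remote:
        return remote["pushurl"]
    return remote["url"]


rad = {
    "url": "rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji",
    "pushurl": "rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji/"
               "z6MknSLrJoTcukLrE435hVNQT4JUhbvWLX4kUzqkEStBU8Vi",
    "fetch": "+refs/heads/*:refs/remotes/rad/*",
}
print(url_for(rad, "fetch"))  # no Node ID: canonical references
print(url_for(rad, "push"))   # the local peer's namespace
```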
 For a more thorough example, see the [Appendix](#Appendix).
 ### Remote Helper
 The remote helper is what allows Git to interpret URLs with the `rad://`
 scheme.
 As mentioned in the [Working Copy](#Working-Copy) section, there is currently
 no way to configure a Git remote to be aware of additional logic, such as the
 appropriate `refs/namespaces` to use (to avoid having to use `--namespace`) or
 to prevent pushing to other peers' namespaces.
 To address these requirements, a `git-remote-rad` helper binary can be
 introduced to supply the necessary namespace and enforce the correct use of
 peer namespaces.
 `git-remote-rad` is a gitremote-helper[^8] binary. When Git encounters a URL
 that uses the `rad` transport during a `fetch` or `push` operation, it
 delegates the call to `git-remote-rad`, which must be on the operator's
 `PATH`.
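 According to gitremote-helpers[^8], Git handles a URL of the form `<transport>://<address>` by running `git-remote-<transport>` with two arguments: the remote name and the URL. It then sends commands on the helper's standard input, and the first command is always `capabilities`. The Python sketch below models only that first exchange. The `CAPABILITIES` list is a hypothetical placeholder, not what `git-remote-rad` actually advertises.
```python
import io
from typing import List, TextIO


def helper_argv(remote_name: str, url: str) -> List[str]:
    """Command line Git uses for a URL of the form <transport>://<address>."""
    transport, sep, _ = url.partition("://")
    if not sep:
        raise ValueError(f"not a <transport>://<address> URL: {url!r}")
    return [f"git-remote-{transport}", remote_name, url]


# Hypothetical capability list: a real helper must also implement every
# command that the capabilities it advertises imply (e.g. 'list').
CAPABILITIES = ["fetch", "push"]


def answer_capabilities(stdin: TextIO, stdout: TextIO) -> None:
    """Reply to Git's first command, one capability per line, then a blank line."""
    command = stdin.readline().strip()
    if command != "capabilities":
        raise RuntimeError(f"unexpected first command: {command!r}")
    for capability in CAPABILITIES:
        stdout.write(capability + "\n")
    stdout.write("\n")


print(helper_argv("rad", "rad://z42hL2jL4XNk6K8oHQaSWfMgCL7ji"))
out = io.StringIO()
answer_capabilities(io.StringIO("capabilities\n"), out)
print(repr(out.getvalue()))  # 'fetch\npush\n\n'
```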
 #### Authorization
 With the remote helper installed, `git push` can automatically set
 `GIT_NAMESPACE` to the Node ID of the current user after verifying that it
 matches the one specified in the URL, and reject pushes to other Node IDs.
 When fetching, the remote helper can set `GIT_NAMESPACE` to whatever Node ID
 is specified in the URL, as no authorization is required to fetch.
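 The authorization rule above fits in a small decision function. The Python code below is a hypothetical model, not Radicle's code. One behaviour is an assumption: the text does not say what happens when a push URL has no Node ID, so the sketch rejects that case. `alice` and `bob` stand in for Node IDs.
```python
import os
from typing import Dict, Optional


class Rejected(Exception):
    """Raised when the (hypothetical) helper refuses an operation."""


def namespace_for(operation: str, url_nid: Optional[str], local_nid: str) -> Optional[str]:
    """Return the GIT_NAMESPACE value for an operation; None means canonical refs."""
    if operation == "fetch":
        # No authorization is needed to fetch: read whatever the URL names.
        return url_nid
    if operation == "push":
        if url_nid is None:
            # Assumption: pushing without a Node ID is unspecified, so refuse it.
            raise Rejected("push URL must include the local Node ID")
        if url_nid != local_nid:
            raise Rejected(f"refusing to push to the namespace of {url_nid}")
        return local_nid
    raise ValueError(f"unknown operation: {operation!r}")


def helper_env(operation: str, url_nid: Optional[str], local_nid: str) -> Dict[str, str]:
    """Environment for the Git process the helper would spawn."""
    env = dict(os.environ)
    namespace = namespace_for(operation, url_nid, local_nid)
    if namespace is None:
        env.pop("GIT_NAMESPACE", None)
    else:
        env["GIT_NAMESPACE"] = namespace
    return env


print(namespace_for("fetch", "bob", local_nid="alice"))   # bob
print(namespace_for("push", "alice", local_nid="alice"))  # alice
try:
    namespace_for("push", "bob", local_nid="alice")
except Rejected as err:
    print(err)  # refusing to push to the namespace of bob
```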
 Future Work
 -----------
 You may have noticed that in this [layout](#Layout) the top-level namespace
 is left for canonical references. The definition and verification of canonicity
 is left for a future RIP.
 Appendix
 --------
 ### Alternative Designs
 An alternative design for organizing peer source trees is to use the `remotes`
 namespace, i.e. `refs/remotes/<nid>`. This namespace is treated as special by
 `git` and its tooling: a "remote" reference is one that corresponds to a
 remote location. The remote location, and how to fetch from and push to it,
 are configured using `git remote`[^6]. When `git fetch` is used for that
 remote, it places the references under `refs/remotes`[^7].
 #### Associating a Working Copy
 Continuing along this line of enquiry, we look at how this storage will link to
 a working copy, i.e. our personal directory for editing the code. As we said
 previously, we will want to set up a remote in the working copy. This will look
 like the following:
     [remote "alice"]
         url = file:///path/to/storage
         fetch = +refs/remotes/alice/heads/*:refs/remotes/alice/*
 This will do what you expect when running:
     $ git fetch alice
 However, you may be surprised that when running:
     $ git fetch alice master
     fatal: couldn't find remote ref master
 It will not result in fetching the latest changes from `master`. In fact, it
 will say no reference exists. To get the exact `master` we are looking for we
 must run:
     $ git fetch alice refs/remotes/alice/heads/master
 To explain, `git` tends to work under a DWIM (Do What I Mean) principle. The
 `master` in `git fetch alice master` is, in general, ambiguous: it could be
 `refs/heads/master`, `refs/remotes/origin/master`,
 `refs/remotes/alice/heads/master`, etc. `git` resolves it by trying a fixed
 set of expansions, such as `refs/heads/master`, on the remote end, but none of
 them exist there.
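 gitrevisions(7) documents the order in which Git expands a short name: `<name>`, `refs/<name>`, `refs/tags/<name>`, `refs/heads/<name>`, `refs/remotes/<name>` and `refs/remotes/<name>/HEAD`. The Python sketch below lists these candidates and checks them against the storage refs of this design, which makes the ambiguity concrete. It assumes that `git fetch` applies the same rules when matching the remote's refs. `dwim_candidates` is a hypothetical helper.
```python
from typing import List


# Hypothetical helper listing Git's short-name expansions (gitrevisions(7)).
def dwim_candidates(name: str) -> List[str]:
    """Full ref names Git tries for a short name, in order."""
    return [
        name,
        f"refs/{name}",
        f"refs/tags/{name}",
        f"refs/heads/{name}",
        f"refs/remotes/{name}",
        f"refs/remotes/{name}/HEAD",
    ]


# Refs present in storage under the refs/remotes design.
storage_refs = {"refs/remotes/alice/heads/master"}

matches = [ref for ref in dwim_candidates("master") if ref in storage_refs]
print(matches)  # [] -> "fatal: couldn't find remote ref master"
```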
 This problem is only compounded with `refs/tags`[^7], where pushing a tag to a
 remote will always DWIM and target the `refs/tags` namespace -- unless
 otherwise specified.
 Thus, we see that this design is not adequate.
 ### Worked Example
 To begin, we set up three git repositories: `storage`, `project`, and
 `fork`. The `storage` repository will act like the Radicle storage, while
 `project` and `fork` are working copies that will be linked to `storage` via
 their remote entries.
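     # Run everything from /home/user/radicle, the directory used in the remote URLs
     # Storage setup
     $ mkdir storage
     $ cd storage
     $ git init --bare
     $ cd ..
     # Project setup (name the initial branch master, as used below)
     $ mkdir project
     $ cd project
     $ git init --initial-branch=master
     $ cd ..
     # Fork setup
     $ mkdir fork
     $ cd fork
     $ git init --initial-branch=master
     $ cd ..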
 #### Pushing Changes
 Our first action will be to make changes in `project` and push them to
 `storage`. To do that, we need to add a remote in `project`, create a commit,
 and push it to `storage`.
     # Add remote: "alice" will be used instead of a Node ID
     $ cd project
     $ git remote add alice file:///home/user/radicle/storage
     # Add a commit
     $ touch README.md && git add README.md && git commit -am "Add README"
     $ git --namespace=alice push alice master
 `git` will then report that it pushed a new branch, and we can confirm this by
 inspecting the `refs` in `storage`.
     # Inspect refs
     $ cd ../storage
     $ tree refs
     refs
     ├── heads
     ├── namespaces
     │   └── alice
     │       └── refs
     │           └── heads
     │               └── master
     └── tags
 #### Fetching Changes
 Our next action will be to fetch the changes from `alice` in the `fork`
 repository. To do this, we must add a remote, as before, and run `git fetch`.
     # Add remote: "alice" will be used instead of a Node ID
     $ cd ../fork
     $ git remote add alice file:///home/user/radicle/storage
     # Fetch the changes
     $ git --namespace=alice fetch alice
 This will fetch the `heads` from `alice` and put them under the remote `alice`.
 We can confirm this by inspecting the `refs` in `fork`.
     # Inspect refs
     $ tree .git/refs
     .git/refs
     ├── heads
     ├── remotes
     │   └── alice
     │       └── master
     └── tags
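 The fetch above can be modelled in two steps. First, with `--namespace=alice`, the serving side advertises only the refs under `refs/namespaces/alice/`, with that prefix stripped. Second, the default fetch refspec that `git remote add alice` creates, `+refs/heads/*:refs/remotes/alice/*`, maps those refs to remote-tracking refs. The Python sketch below is a simplified, hypothetical model of this flow for a single-level namespace. It also includes a `bob` namespace, which does not exist yet at this point in the example, to show that other peers' refs are not advertised.
```python
from typing import Dict, List


# Hypothetical model: refs visible to a client using --namespace=<namespace>.
def advertise(storage_refs: List[str], namespace: str) -> List[str]:
    prefix = f"refs/namespaces/{namespace}/"
    return [ref[len(prefix):] for ref in storage_refs if ref.startswith(prefix)]


# Hypothetical model of the default refspec +refs/heads/*:refs/remotes/<remote>/*.
def fetch_into_remote(advertised: List[str], remote: str) -> Dict[str, str]:
    src = "refs/heads/"
    return {
        ref: f"refs/remotes/{remote}/{ref[len(src):]}"
        for ref in advertised
        if ref.startswith(src)
    }


storage = [
    "refs/namespaces/alice/refs/heads/master",
    "refs/namespaces/bob/refs/heads/master",
]
visible = advertise(storage, "alice")
print(visible)  # ['refs/heads/master']
print(fetch_into_remote(visible, "alice"))
# {'refs/heads/master': 'refs/remotes/alice/master'}
```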
 #### Different Peers
 To imitate the reality that there will be a namespace per peer, we add a second
 remote, `bob`, to `fork`. We can then build on `alice/master` and publish the
 result under the `bob` namespace.
     # Add bob remote
     $ git remote add bob file:///home/user/radicle/storage
     $ git merge alice/master
     $ echo "Hello, Radicle" >> README.md
     $ git commit -am "Hello, Radicle"
     $ git --namespace=bob push bob master
 Again, we can confirm this did what we wanted in `storage`.
     # Inspect storage refs
     $ cd ../storage
     $ tree refs
     refs
     ├── heads
     ├── namespaces
     │   ├── alice
     │   │   └── refs
     │   │       └── heads
     │   │           └── master
     │   └── bob
     │       └── refs
     │           └── heads
     │               └── master
     └── tags
 #### Non-global Tags
 Pushing tags often pollutes the `refs/tags` namespace, since tags do not get
 placed under `refs/remotes` when fetching. The `gitnamespaces`[^2] feature
 avoids this.
     $ cd ../fork
     $ git tag v1.0.0
     $ git --namespace=bob push bob v1.0.0
     # Inspect storage refs
     $ cd ../storage
     $ tree refs
     refs
     ├── heads
     ├── namespaces
     │   ├── alice
     │   │   └── refs
     │   │       └── heads
     │   │           └── master
     │   └── bob
     │       └── refs
     │           ├── heads
     │           │   └── master
     │           └── tags
     │               └── v1.0.0
     └── tags
 This shows that namespaces are better suited than `refs/remotes` to keeping each
 peer's references, tags included, correctly separated.
 Credits
 -------
 * Kim Altintop, for shining a light on the lesser known `gitnamespaces`[^2]
   feature while developing `radicle-link`.
 * Alex Good, for attempting to implement a feature dubbed "ref rewriting" to
   solve the remotes problem, before realising that using `gitnamespaces`[^2]
   could be a better option.
 Copyright
 ---------
 This document is licensed under the Creative Commons CC0 1.0 Universal license.
 [^0]: https://git-scm.com/book/en/v2/Git-Internals-Transfer-Protocols
 [^1]: https://git-scm.com/docs/git-init#Documentation/git-init.txt---bare
 [^2]: https://git-scm.com/docs/gitnamespaces
 [^3]: https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols
 [^4]: https://git-scm.com/book/en/v2/Git-Internals-The-Refspec
 [^5]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes
 [^6]: https://git-scm.com/docs/git-remote
 [^7]: https://git-scm.com/book/en/v2/Git-Internals-Git-References
 [^8]: https://git-scm.com/docs/gitremote-helpers

+	`# RIPs`
+
+	`Radicle Improvement Proposals 🌱.`