radicle: improve inline comments — heartwood

rad:z3gqcJUoA1n9HaHKufZs5FCSGazv5

Radicle Heartwood Protocol & Stack

radicle: improve inline comments

Draft fintohaps opened 1 year ago

The previous CodeLocation type did not correctly represent the semantics of an inline code location, for the following reasons:

If it is for lines within a file pertaining to the commit, then there should be no old or new ranges, only a single range.
If it is for a patch created by diffing the commit with another commit, then the other commit is missing.

It could be argued that the semantics of these two cases could be retrieved by:

In the first case, just use one of the ranges.
In the second case, infer the commit from the context, e.g. always use Revision::base for locations used in the ReviewComment.

Instead, the type is renamed to PartialLocation to represent these semantics better, and only remains for backwards-compatibility when deserializing existing data.

Two new types are introduced CommitLocation and DiffLocation, which more accurately describe the two cases.

Each of these is used in a Patch’s Revision and Review thread respectively. These are created with new actions RevisionCommitComment and ReviewDiffComment, while deprecating the use of RevisionComment and ReviewComment – they can no longer be constructed outside of the module.

The final part is to ensure that clients can calculate stable diffs for patches – to allow them to display the DiffLocations correctly – the relevant diff options are recorded for each patch. Sane defaults are used if the options are missing.

Activity Changes +0 -0

fintohaps opened with revision e500399b on base 3b5fac17 +691 -237 1 year ago

The previous CodeLocation type did not correctly represent the semantics of an inline code location, for the following reasons:

If it is for lines within a file pertaining to the commit, then there should be no old or new ranges, only a single range.
If it is for a patch created by diffing the commit with another commit, then the other commit is missing.

It could be argued that the semantics of these two cases could be retrieved by:

In the first case, just use one of the ranges.
In the second case, infer the commit from the context, e.g. always use Revision::base for locations used in the ReviewComment.

Instead, the type is renamed to PartialLocation to represent these semantics better, and only remains for backwards-compatibility when deserializing existing data.

Two new types are introduced CommitLocation and DiffLocation, which more accurately describe the two cases.

rudolfs commented on revision 1 1 year ago

I need a bit of clarification, because to me the first point doesn’t entirely make sense. Perhaps my mental model of this isn’t quite right, and I just need some help to set it straight.

In most forges known to me, we never comment directly on the data (i.e. files) within a commit, we always comment on the changes with regards to the parent commit, i.e. the git diff <old>..<new> as opposed to git ls-tree -r <new> and then looking at the files with git cat-file -p <SHA>.

When there’s a diff between two commits we always have either:

a deletion or
an addition or
a modification (deletion and addition on the same line).

In the case of a split diff representation we have a left (deletion) and right (addition) side. My understanding is that to accurately represent a selection, we have to provide both the code range for the left side and a code range for the right side (because the line numbers between the sides differ). Therefore even if it’s just a single commit we would need both the old and new ranges.

In other words I’m arguing that we should keep the old and new ranges, but always assume that it is a diff between two points in time, i.e. there are always two commits from which we generate a diff.

There is still some ambiguity around how to interpret the new and old ranges if they actually don’t overlap, or overlap only partially, but in general clients should strive to generate ranges for both old and new that overlap fully.

rudolfs commented on revision 1 1 year ago

An alternative way to represent line ranges in diffs would be to only have the start and end markers and inlcude the side for both, like so: start: L5, end: R10. This representation would be more compact and easier to handle on the front-end. It would also discourage selections that are impossible, i.e. when there’s no overlap for the selection of the left and right hand sides of a split diff.

A good way to visualize this is to picture a spreadsheet with two columns: left (old) and right (new). You can click on any cell (line number + side) and drag to any other cell to make a selection. Here are some example screenshots of selections represented this way.

fintohaps commented on revision 1 1 year ago

In most forges known to me, we never comment directly on the data (i.e. files) within a commit, we always comment on the changes with regards to the parent commit, i.e. the git diff .. as opposed to git diff ...

Well, this isn’t entirely true, imo :) I think it’s useful to point to a pinned location to talk about code, so a CommitLocation is pinning the location of a commit and pointing out some lines of code in that location. Sure, if it’s a review that’s not as common, but I’m introducing the concept of being able to talk about a certain location in a commit.

So I’m arguing that we should keep the old and new ranges, but always assume that it is a diff between two points in time, i.e. there’s always a parent commit and a commit from which we generate a diff.

Ok, so you’re saying that you’d prefer use line numbers over hunk index and hunk line numbers? So DiffLocation would use the old: Range, new: Range fields instead?

I’m ok with this after looking through the comment porting stuff, so happy to be persuaded to do it that way.

There is still some ambiguity around how to interpret the new and old ranges if they actually don’t overlap, or overlap only partially, but in general clients should strive to generate ranges for both old and new that overlap fully.

Ya, I think it’s always a possibility that you can construct a selection that doesn’t make sense – but the assumption is that if you’re working within a hunk, then you’re going to construct a correct selection. We can make things more type safe by using NonZeroUsize as a representation of line numbers too.

fintohaps commented on revision 1 1 year ago

Why do the context lines only use L in the screenshots?

It would also discourage selections that are impossible, i.e. when there’s no overlap for the selection of the left and right hand sides of a split diff.

Mmmmm, I’m not sure this entirely true. In theory, I could do start: L5, end: R10, and R10 is actually a deletion, which wouldn’t make any sense – you cannot guarantee the range is correct unless you consult the hunk in question.

That being said, it could be a quick way of constructing the code ranges.

fintohaps pushed revision 2 687497ce on base 3b5fac17 +908 -228 1 year ago

Changes:

Split up the patch into separate commits
Optional selection for file-level comments
Rename actions to use V1 suffix, keeping the latest version of actions using their regular names. As well as this, the enum tag renames use .v2 suffix.
Refactor diff::Options methods to return git2::DiffOptions directly.
Add FindOptions to the diff::Options.

fintohaps pushed revision 3 b168b283 on base 3b5fac17 +891 -228 1 year ago

Changes:

Remove all from FindOptions – I found that it is worse at detecting renames
Remove exact_match, because I believe this will only look at SHAs and not return renames

did:key:z6MkgFq6...nBGz reviewed · 4 comments 1 year ago

fintohaps pushed revision 4 0a74e306 on base 5553a147 +924 -244 11 months ago

Changes:

Rebased
Document code
Removed CodeRange::Chars – I believe that no one actually used this and I’d prefer not support it until it’s needed.

fintohaps pushed revision 5 8ac7f355 on base 2a47bc0c +2394 -538 9 months ago

Backwards-compatible Approach

This approach extends the way we implemented changes to Review in a previous patch. The change is two-fold.

The first thing we do is separate out the effected Action variants into the actions module. While doing this, we version them, and guard the way they are constructed and how they affect a Patch when being applied. We also ensure that we can deserialize.

The second part is to separate the “encoding” of a Revision. Since Revision is changing by changing the type of the code locations from PartialLocation (previously named CodeLocation) to DiffLocation. The encoding::Revision type can deserialize any existing revision JSON that exists, and it does this be “decoding” the previous version of the data into the newer version. Note that to do this effectively, we need to provide the DiffLocation with the base: Oid. We, safely, assume that this is the Revision::base and use that as the contextual information for building the DiffLocation. Note that this means encoding::Review** will no longer implement Deserialize**, but it does have a method decode**. This method takes the contextual base: Oid** to convert any found PartialLocations into DiffLocations.

NOTE: when converting a PartialLocation to a DiffLocation, there is no way to get the hunk selection from the line selection. This would require a Repository handle to compute. We set the selection to None as a minor sacrifice in the name of backwards-compatibility (almost).

Aside: One can imagine that we might end up changing some field in Patch in the future, and this exact process will have to be done again and introduce an encoding::Patch, which provides a way to migrate from the previous encoding to the latest encoding.

did:key:z6MkgFq6...nBGz pushed revision 6 62142e99 on base 2a47bc0c +2394 -535 9 months ago

Properly encode single-line review comments in the CLI

lorenz pushed revision 7 2331a68d on base a8dac56c +2393 -534 8 months ago

Rebase

fintohaps pushed revision 8 8eaf54d4 on base de78cf78 +2394 -535 8 months ago

Changes:

Rebased
Fixed SHA in workflow test
Removed old radicle/cob/patch.rs that was left in by accident