The previous CodeLocation type did not correctly represent the semantics of an
inline code location, for the following reasons:
- If it is for lines within a file pertaining to the commit, then there should
be no
oldornewranges, only a single range. - If it is for a patch created by diffing the commit with another commit, then the other commit is missing.
It could be argued that the semantics of these two cases could be retrieved by:
- In the first case, just use one of the ranges.
- In the second case, infer the commit from the context, e.g. always use
Revision::basefor locations used in theReviewComment.
Instead, the type is renamed to PartialLocation to represent these semantics
better, and only remains for backwards-compatibility when deserializing existing
data.
Two new types are introduced CommitLocation and DiffLocation, which more
accurately describe the two cases.
Each of these is used in a Patch’s Revision and Review thread
respectively. These are created with new actions RevisionCommitComment and
ReviewDiffComment, while deprecating the use of RevisionComment and
ReviewComment – they can no longer be constructed outside of the module.
The final part is to ensure that clients can calculate stable diffs for patches
– to allow them to display the DiffLocations correctly – the relevant diff
options are recorded for each patch. Sane defaults are used if the options are
missing.
The previous CodeLocation type did not correctly represent the semantics of an
inline code location, for the following reasons:
- If it is for lines within a file pertaining to the commit, then there should
be no
oldornewranges, only a single range. - If it is for a patch created by diffing the commit with another commit, then the other commit is missing.
It could be argued that the semantics of these two cases could be retrieved by:
- In the first case, just use one of the ranges.
- In the second case, infer the commit from the context, e.g. always use
Revision::basefor locations used in theReviewComment.
Instead, the type is renamed to PartialLocation to represent these semantics
better, and only remains for backwards-compatibility when deserializing existing
data.
Two new types are introduced CommitLocation and DiffLocation, which more
accurately describe the two cases.
Each of these is used in a Patch’s Revision and Review thread
respectively. These are created with new actions RevisionCommitComment and
ReviewDiffComment, while deprecating the use of RevisionComment and
ReviewComment – they can no longer be constructed outside of the module.
The final part is to ensure that clients can calculate stable diffs for patches
– to allow them to display the DiffLocations correctly – the relevant diff
options are recorded for each patch. Sane defaults are used if the options are
missing.
I need a bit of clarification, because to me the first point doesn’t entirely make sense. Perhaps my mental model of this isn’t quite right, and I just need some help to set it straight.
In most forges known to me, we never comment directly on the data (i.e. files) within a commit, we always comment on the changes with regards to the parent commit, i.e. the git diff <old>..<new> as opposed to git ls-tree -r <new> and then looking at the files with git cat-file -p <SHA>.
When there’s a diff between two commits we always have either:
- a deletion or
- an addition or
- a modification (deletion and addition on the same line).
In the case of a split diff representation we have a left (deletion) and right (addition) side. My understanding is that to accurately represent a selection, we have to provide both the code range for the left side and a code range for the right side (because the line numbers between the sides differ). Therefore even if it’s just a single commit we would need both the old and new ranges.
In other words I’m arguing that we should keep the old and new ranges, but always assume that it is a diff between two points in time, i.e. there are always two commits from which we generate a diff.
There is still some ambiguity around how to interpret the new and old ranges if they actually don’t overlap, or overlap only partially, but in general clients should strive to generate ranges for both old and new that overlap fully.
An alternative way to represent line ranges in diffs would be to only have the start and end markers and inlcude the side for both, like so: start: L5, end: R10. This representation would be more compact and easier to handle on the front-end. It would also discourage selections that are impossible, i.e. when there’s no overlap for the selection of the left and right hand sides of a split diff.
A good way to visualize this is to picture a spreadsheet with two columns: left (old) and right (new). You can click on any cell (line number + side) and drag to any other cell to make a selection. Here are some example screenshots of selections represented this way.
In most forges known to me, we never comment directly on the data (i.e. files) within a commit, we always comment on the changes with regards to the parent commit, i.e. the git diff
.. as opposed to git diff .. .
Well, this isn’t entirely true, imo :) I think it’s useful to point to a pinned location to talk about code, so a CommitLocation is pinning the location of a commit and pointing out some lines of code in that location. Sure, if it’s a review that’s not as common, but I’m introducing the concept of being able to talk about a certain location in a commit.
So I’m arguing that we should keep the old and new ranges, but always assume that it is a diff between two points in time, i.e. there’s always a parent commit and a commit from which we generate a diff.
Ok, so you’re saying that you’d prefer use line numbers over hunk index and hunk line numbers? So DiffLocation would use the old: Range, new: Range fields instead?
I’m ok with this after looking through the comment porting stuff, so happy to be persuaded to do it that way.
There is still some ambiguity around how to interpret the new and old ranges if they actually don’t overlap, or overlap only partially, but in general clients should strive to generate ranges for both old and new that overlap fully.
Ya, I think it’s always a possibility that you can construct a selection that doesn’t make sense – but the assumption is that if you’re working within a hunk, then you’re going to construct a correct selection. We can make things more type safe by using NonZeroUsize as a representation of line numbers too.
Why do the context lines only use L in the screenshots?
It would also discourage selections that are impossible, i.e. when there’s no overlap for the selection of the left and right hand sides of a split diff.
Mmmmm, I’m not sure this entirely true. In theory, I could do start: L5, end: R10, and R10 is actually a deletion, which wouldn’t make any sense – you cannot guarantee the range is correct unless you consult the hunk in question.
That being said, it could be a quick way of constructing the code ranges.
Changes:
- Split up the patch into separate commits
- Optional selection for file-level comments
- Rename actions to use
V1suffix, keeping the latest version of actions using their regular names. As well as this, the enum tag renames use.v2suffix. - Refactor
diff::Optionsmethods to returngit2::DiffOptionsdirectly. - Add
FindOptionsto thediff::Options.
Changes:
- Remove
allfromFindOptions– I found that it is worse at detecting renames - Remove
exact_match, because I believe this will only look at SHAs and not return renames
Changes:
- Rebased
- Document code
- Removed
CodeRange::Chars– I believe that no one actually used this and I’d prefer not support it until it’s needed.
Backwards-compatible Approach
This approach extends the way we implemented changes to Review in a previous
patch. The change is two-fold.
The first thing we do is separate out the effected Action variants into the
actions module. While doing this, we version them, and guard the way they are
constructed and how they affect a Patch when being applied. We also ensure
that we can deserialize.
The second part is to separate the “encoding” of a Revision. Since Revision
is changing by changing the type of the code locations from PartialLocation
(previously named CodeLocation) to DiffLocation. The encoding::Revision
type can deserialize any existing revision JSON that exists, and it does this be
“decoding” the previous version of the data into the newer version. Note that to
do this effectively, we need to provide the DiffLocation with the base: Oid.
We, safely, assume that this is the Revision::base and use that as the
contextual information for building the DiffLocation. Note that this means
encoding::Review** will no longer implement Deserialize**, but it does have a
method decode**. This method takes the contextual base: Oid** to convert any
found PartialLocations into DiffLocations.
NOTE: when converting a PartialLocation to a DiffLocation, there is no
way to get the hunk selection from the line selection. This would require a
Repository handle to compute. We set the selection to None as a minor
sacrifice in the name of backwards-compatibility (almost).
Aside: One can imagine that we might end up changing some field in Patch
in the future, and this exact process will have to be done again and introduce
an encoding::Patch, which provides a way to migrate from the previous encoding
to the latest encoding.
Properly encode single-line review comments in the CLI
Rebase
Changes:
- Rebased
- Fixed SHA in workflow test
- Removed old radicle/cob/patch.rs that was left in by accident