
140 points Tomte | 8 comments
1. gnuvince ◴[] No.26288322[source]
Hijacking this topic to talk about something I've been thinking about lately: literate diffs.

I find that the order of diffs given by git is not optimized for helping a reviewer understand the change. Sometimes the files are not presented in the most logical order; sometimes unrelated changes (e.g., a text editor removing blanks at the end of lines) create noise; etc.

I've been thinking that it would be interesting to have a tool where the author can take the diff of their commit(s), order its parts in a way that is conducive to understanding, and explain each part. That'd be similar to having the author do a code walkthrough, but at the pace of the reader rather than the author.
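
To make the idea concrete, here is a minimal Python sketch of such a tool, under several assumptions: the input is a git-style unified diff, paths contain no spaces, and the author supplies a hypothetical "walkthrough" list of (path, note) pairs in reading order. The function names and the `#` note prefix are illustrative only, and the sketch reorders whole files rather than individual hunks.

```python
import re

def split_by_file(diff_text):
    """Split a git-style unified diff into {path: section_text}."""
    sections = {}
    current = None
    for line in diff_text.splitlines(keepends=True):
        # Each per-file section starts with a "diff --git a/... b/..." header.
        m = re.match(r"diff --git a/(\S+) b/(\S+)", line)
        if m:
            current = m.group(2)
            sections[current] = ""
        if current is not None:
            sections[current] += line
    return sections

def literate_diff(diff_text, walkthrough):
    """Reorder file sections to follow the walkthrough, with a note before each.

    walkthrough is a list of (path, note) pairs (hypothetical format).
    """
    sections = split_by_file(diff_text)
    out = []
    for path, note in walkthrough:
        out.append("".join(f"# {line}\n" for line in note.splitlines()))
        out.append(sections.pop(path))  # KeyError if the path is not in the diff
    # Files the author did not mention go last, so no change is silently dropped.
    for path, section in sections.items():
        out.append(f"# (unannotated) {path}\n")
        out.append(section)
    return "".join(out)
```

Calling `literate_diff(diff, [("core.py", "Start here: the new data model")])` would put the `core.py` section first, preceded by the note, and append everything else afterwards. Whether a review platform would accept such output is a separate question.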

replies(6): >>26288537 #>>26288793 #>>26288837 #>>26289067 #>>26289125 #>>26289821 #
2. jonahbenton ◴[] No.26288537[source]
Take a look at the term "Semantic Source Diff", e.g.:

https://martinfowler.com/bliki/SemanticDiff.html

Tools in this space date back to the 1990s. There has been a recent upsurge of interest, and a number of capable tools for different languages are currently available.

replies(1): >>26288818 #
3. mikepurvis ◴[] No.26288793[source]
Love it. Currently there's a gap where the diff is generated by your review platform, but it would be amazing if there was a way to submit your annotated/ordered diff and the platform would use it as the review starting point, provided it passed validation in terms of actually being a representative and equivalent diff.
4. briv ◴[] No.26288818[source]
Pijul gives me hope semantic diffs may become common - see the "Dependencies" paragraph of https://pijul.org/posts/2020-11-07-towards-1.0/. The HN comments on that - https://news.ycombinator.com/item?id=25032956 - are a nice read as well.
5. memco ◴[] No.26288837[source]
What you’re describing is already possible with Git: rebasing and committing individual hunks or lines (interactive rebase plus git add -p) lets you organize your changes coherently. The trick is getting into the habit of doing it that way and keeping the whole team consistent.

Edit: I’m not saying that this is a solved problem. I think the parent’s point is valid. I am just saying that there are some tools that make this possible and I agree that there is a definite need for improvements in this area.

6. geofft ◴[] No.26289067[source]
I would love it if my VCS tool could also keep track of things like "This diff is the result of running sed -i s/this/that/g *.py". I usually split out such mechanical changes anyway into a separate commit, but it would be clearer for reviewers to see that (most review tools show you the overall diff of the entire branch you want to merge by default, making you click further to see patch-by-patch changes), and it would also be easier for me if the VCS could re-run the sed command when I rebased.

(An obvious next step is Coccinelle-style semantic patches, but let's start with sed!)
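
As a rough illustration of "re-running the command", here is a Python sketch that replays a recorded substitution over a directory. It is not a feature of any existing VCS: the function name and the idea of storing glob/pattern/replacement alongside the commit are assumptions for the example, and Python's `re` syntax is not identical to sed's.

```python
import re
from pathlib import Path

def apply_substitution(root, glob, pattern, replacement):
    """Rewrite every file under root that matches glob.

    Returns the names of files whose contents changed, so the caller can
    tell whether replaying the recorded change still does anything.
    """
    changed = []
    for path in sorted(Path(root).glob(glob)):
        old = path.read_text()
        new = re.sub(pattern, replacement, old)
        if new != old:
            path.write_text(new)
            changed.append(path.name)
    return changed
```

After a rebase, a tool could discard the old mechanical commit and regenerate it with something like `apply_substitution(".", "*.py", "this", "that")`, rather than trying to merge the stale diff.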

7. jedimastert ◴[] No.26289125[source]
I believe most literate programming tools are language-agnostic, so you could probably do that with this tool!
8. shakna ◴[] No.26289821[source]
If you're making use of something like git-send-email you can already do this easily.

The patch format explicitly allows it to ignore "junk" information at certain points, so you can edit in comments all over the place. The format also lets you break up a diff, rearranging it semantically, and it'll get rebuilt later.

Edit, to expand on the above:

> patch tries to skip any leading garbage, apply the diff, and then skip any trailing garbage. Thus you could feed an article or message containing a diff listing to patch, and it should work..... After removing indenting or encapsulation, lines beginning with # are ignored, as they are considered to be comments.

> With context diffs, and to a lesser extent with normal diffs, patch can detect when the line numbers mentioned in the patch are incorrect, and attempts to find the correct place to apply each hunk of the patch. As a first guess, it takes the line number mentioned for the hunk, plus or minus any offset used in applying the previous hunk. If that is not the correct place, patch scans both forwards and backwards for a set of lines matching the context given in the hunk.
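
The search described in that second quote can be sketched in Python as follows. This is a simplified illustration, not patch's actual implementation: the function name is hypothetical, the order in which forward and backward candidates are tried is an assumption, and fuzzy matching of context is not modeled.

```python
def find_hunk(file_lines, context, guess):
    """Return the 0-based index where context matches file_lines, or None.

    guess is the line index from the hunk header, already adjusted by the
    offset that was needed for the previous hunk.
    """
    def matches(at):
        end = at + len(context)
        return 0 <= at and end <= len(file_lines) and file_lines[at:end] == context

    # First try the position the hunk claims.
    if matches(guess):
        return guess
    # Otherwise scan outward, alternating forwards and backwards.
    max_offset = len(file_lines) + abs(guess)
    for offset in range(1, max_offset + 1):
        for candidate in (guess + offset, guess - offset):
            if matches(candidate):
                return candidate
    return None
```

A caller would carry `found - header_line` forward as the offset for the next hunk's guess, which is what lets a reordered or annotated diff still apply when its line numbers have drifted.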