
Jujutsu for everyone

(jj-for-everyone.github.io)
434 points | Bogdanp | 2 comments
Ericson2314 ◴[] No.45084874[source]
I want to read "Jujutsu for Git experts"

For example, will committing conflicts (a good idea, I agree) mess up my existing git rerere?

Also I agree that the staged vs unstaged distinction is stupid and should be abolished, but I do like intentionally staging "the parts of the patch I like" while I work with git add -p. Is there a lightweight way to have such a 2-patch-deep patch set with JJ that won't involve touching timestamps unnecessarily and causing extra rebuilds with stupid build systems?
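
(Concretely, the workflow I mean is roughly the following; the file name is just a placeholder:)

    git add -p src/main.c      # pick hunks interactively (y/n/s/e)
    git diff --cached          # review exactly "the parts of the patch I like"
    git commit -m "the good parts"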

replies(3): >>45085487 #>>45088024 #>>45088309 #
mdaniel ◴[] No.45085487[source]
> Also I agree that the staged vs unstaged distinction is stupid

...

> I do like intentionally staging "the parts of the patch I like" while I work with git add -p

is a mysterious perspective to me. I guess with enough $(git worktree && git diff && vi && git apply) it'd be possible to achieve the staging behavior without formally staging anything, but yikes
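
Something like this, I imagine (paths and messages made up):

    git worktree add ../clean HEAD                # a second checkout without my edits
    git diff > wip.patch                          # everything changed so far
    vi wip.patch                                  # hand-prune to just the hunks you want
    git -C ../clean apply wip.patch
    git -C ../clean commit -am "the parts I like"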

I just checked and it seems that Mercurial 7.1 still doesn't believe in $(hg add -p), so presumably that 'worktree' silliness is the only way to interactively add work in their world

replies(2): >>45088621 #>>45093843 #
sunshowers ◴[] No.45088621[source]
In Mercurial you'd do hg commit -i and squash further changes down incrementally via hg amend -i, similar to Jujutsu.

(The first thing about Jujutsu that was earth-shattering for me was learning that jj amend is an alias for jj squash. I swore aloud for several minutes when I first learned that.)
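
Very roughly, with made-up messages (exact flags may depend on your Mercurial version and enabled extensions):

    # Mercurial: pick hunks at commit time, fold more in later
    hg commit -i -m "first cut"
    # ...more edits...
    hg amend -i                  # interactively fold new hunks into that commit

    # Jujutsu: the working copy is itself a commit, so you move hunks into its parent
    jj describe -m "first cut"
    jj new                       # start a fresh working-copy change on top
    # ...more edits...
    jj squash -i                 # move only the hunks you pick down into "first cut"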

replies(1): >>45088687 #
mdaniel ◴[] No.45088687[source]
How would anyone possibly know that? https://mercurial-scm.org/help/commands/commit says no such thing, but I also recognize that I am obviously a fool for thinking that one should add the whole file but only commit part of it

I guess that also places the burden on the user to, I dunno, go through that whole TUI dance again if one wishes to amend one more line in the file

In some sense, I do recognize it's like showing up to an emacs meeting and bitching about how it doesn't do things the vim way, but seriously, who came up with that mental model for committing only part of the working directory?

replies(1): >>45089053 #
sunshowers ◴[] No.45089053[source]
Well, git add is super overloaded, because it lets you add untracked files (especially with -N) as well as stage all or parts of tracked files. Mercurial is a different system with different primitives, where each command tends to do one thing, and add is only meant to operate on untracked files.
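
A rough contrast (file names are placeholders):

    # git: one command, several jobs
    git add -N new_file.c        # record intent to add an untracked file
    git add -p tracked.c         # stage only some hunks of a tracked file

    # Mercurial: add only marks untracked files; hunk selection happens at commit time
    hg add new_file.c
    hg commit -i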

I strongly prefer JJ's approach of simply doing away with the concept of untracked files, though note that this is one of the features that is designed around developers having NVMe drives these days. It wouldn't have been possible to scan the working copy with every command back in 2004.
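
In practice that looks something like this (file name made up):

    # jj: no add step, no staging area; by default a new file just becomes
    # part of the working-copy commit the next time any jj command runs
    echo hello > notes.txt
    jj st                        # notes.txt already shows up as an added file
    jj describe -m "add notes"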

replies(1): >>45093932 #
Ericson2314 ◴[] No.45093932[source]
It doesn't really depend on NVMe, that's just the OS sucking.

The right way has always been FUSE, so that version control knows about every change as it happens. Push, not pull (or poll).

With FUSE passthrough, maybe this won't even be slow!

replies(1): >>45094158 #
sunshowers ◴[] No.45094158[source]
> It doesn't really depend on NVMe, that's just the OS sucking.

I've spent so much of my professional career profiling source control access patterns. With a hot cache, the OS VFS layer tends to dominate performance, but the moment you hit disk, the disk dominates, unless it's NVMe (or, back in the day, PCIe flash storage). Further compounding this is the naive LRU cache on some OSes, which means that once the cache size is exceeded, linear scans absolutely destroy performance.

> FUSE

So you might think that, but FUSE turns out to be very hard to do correctly and performantly. I was on the source control team at Facebook, and EdenFS took many years to become stable and performant enough. (It was solving a harder problem though, which was to fetch files lazily.)

I believe Microsoft tried using a FUSE equivalent for the Windows repo for a while, but gave up at some point.

replies(1): >>45103417 #
Ericson2314 ◴[] No.45103417[source]
We're still talking about different things here. I'm saying the entire "VCS scans the file system to sync state" approach is the wrong algorithm. It's unnecessary work because there are two sources of truth.

Forget the constant factors of FUSE, and imagine an in-kernel git implementation. If you have a Merkle CoW filesystem, then when you modify child files (ignoring journals), you need to update parent directories on disk anyway; that's a great time to recompute VCS hashes too.

"git status" is, if the journal is flushed and hashes are up to date, always an O(1) operation.

replies(1): >>45111070 #
sunshowers ◴[] No.45111070[source]
You might be interested in how this problem was solved by our team at Meta, in EdenFS (https://github.com/facebook/sapling/blob/main/eden/fs/docs/O...) and Watchman: https://github.com/facebook/watchman.

What you're describing is reasonably similar to EdenFS, except EdenFS runs in userspace.

Watchman layers a consistent view of file metadata on top of inotify (etc), as well as providing stateless queries on top of EdenFS. It acts as a unified interface over regular filesystems as well as EdenFS, providing file lstat info and hashes over a Unix domain socket.

Back in the day, Watchman sped up status queries by over 5x for a repo with hundreds of thousands of files: https://engineering.fb.com/2014/01/07/core-infra/scaling-mer... I worked directly on this and co-wrote this blog post.

In truth, getting these two components working to the standard expected by developers was a very difficult systems problem with a ton of event ordering and cache invalidation concerns. (With EdenFS, in particular, I believe there was machine learning involved to detect prefetch patterns.) For smaller repos, it is much simpler to do linear scans. Since a linear scan is really fast on modern hardware anyway, it is also the right thing to do, following the maxim of doing the simplest thing that works.