←back to thread

262 points lawrencechen | 2 comments | | HN request time: 0.001s | source

0github.com is a pull request viewer that color-codes every diff line/token by how much human attention it probably needs. Unlike PR-review bots, we try to flag not just by "is it a bug?" but by "is it worth a second look?" (examples: hard-coded secret, weird crypto mode, gnarly logic, ugly code).

To try it, replace github.com with 0github.com in any pull-request URL. Under the hood, we split the PR into individual files, and for each file, we ask an LLM to annotate each line with a data structure that we parse into a colored heatmap.

Examples:

https://0github.com/manaflow-ai/cmux/pull/666

https://0github.com/stack-auth/stack-auth/pull/988

https://0github.com/tinygrad/tinygrad/pull/12995

https://0github.com/simonw/datasette/pull/2548

Notice how all the example links have a 0 prepended before github.com. This navigates you to our custom diff viewer where we handle the same URL path parameters as github.com. Darker yellows indicate that an area might require more investigation. Hover on the highlights to see the LLM's explanation. There's also a slider on the top left to adjust the "should review" threshold.

Repo (MIT license): https://github.com/manaflow-ai/cmux

Show context
kburman ◴[] No.45763248[source]
It’s an interesting direction, but feels pretty expensive for what might still be a guess at what matters.

I’m not sure an LLM can really capture project-specific context yet from a single PR diff.

Honestly, a simple data-driven heatmap showing which parts of the code change most often or correlate with past bugs would probably give reviewers more trustworthy signals.

replies(5): >>45763479 #>>45764303 #>>45765157 #>>45765672 #>>45765995 #
lawrencechen ◴[] No.45763479[source]
Yeah this is honestly pretty expensive to run today.

> I’m not sure an LLM can really capture project-specific context yet from a single PR diff.

We had an even more expensive approach that cloned the repo into a VM and prompted codex to explore the codebase and run code before returning the heatmap data structure. Decided against it for now due to latency and cost, but I think we'll revisit it to help the LLM get project context.

Distillation should help a bit with cost, but I haven't experimented enough to have a definitive answer. Excited to play around with it though!

> which parts of the code change most often or correlate with past bugs

I can think of a way to do the correlation that would require LLMs. Maybe I'm missing a simpler approach? But agree that conditioning on past bugs would be great

replies(2): >>45763902 #>>45765217 #
1. CuriouslyC ◴[] No.45765217[source]
Gemini is better than GPT5 variants for large context. Also, agents tend to be bad at gathering an optimal context set. The best approach is to intelligently select from the codebase to generate a "covering set" of everything touched in the PR, make a bundle, and fire it off at Gemini as a one shot. Because of caching, you can even fire off multiple queries to Gemini instructing it to evaluate the PR from different perspectives for cheap.
replies(1): >>45765821 #
2. lawrencechen ◴[] No.45765821[source]
Yeah, adding a context gathering step is a good idea. Our original approach used codex cli in a VM, so context gathering was pretty comprehensive. We switched to a more naive approach due to latency, but having a step using a smaller model (like SWE-grep) could be a nice tradeoff.