(0github.com)

262 points lawrencechen | 2 comments | 30 Oct 25 14:21 UTC

0github.com is a pull request viewer that color-codes every diff line/token by how much human attention it probably needs. Unlike PR-review bots, we flag code not only on "is it a bug?" but also on "is it worth a second look?" (examples: a hard-coded secret, an unusual crypto mode, gnarly logic, ugly code).

To try it, replace github.com with 0github.com in any pull-request URL. Under the hood, we split the PR into individual files, and for each file, we ask an LLM to annotate each line with a data structure that we parse into a colored heatmap.
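A minimal sketch of what the per-line annotation step could look like, assuming the LLM returns a JSON array per file. The field names (`line`, `score`, `reason`), the 0-to-1 score range, and the color mapping are all hypothetical; the actual data structure used by 0github.com is not described here.

```python
import json
from dataclasses import dataclass


@dataclass
class LineAnnotation:
    line: int      # 1-based line number within the file's diff (assumed field)
    score: float   # hypothetical "needs attention" score in [0, 1]
    reason: str    # explanation that a viewer could show on hover


def parse_annotations(raw: str) -> list[LineAnnotation]:
    """Parse a JSON array of annotations, clamping each score into [0, 1]."""
    annotations = []
    for item in json.loads(raw):
        score = min(max(float(item["score"]), 0.0), 1.0)
        annotations.append(
            LineAnnotation(int(item["line"]), score, str(item.get("reason", "")))
        )
    return annotations


def score_to_color(score: float) -> str:
    """Map a score to a yellow whose opacity grows with the score."""
    alpha = round(min(max(score, 0.0), 1.0), 2)
    return f"rgba(255, 200, 0, {alpha})"


if __name__ == "__main__":
    raw = '[{"line": 12, "score": 0.9, "reason": "hard-coded secret"}]'
    for ann in parse_annotations(raw):
        print(ann.line, score_to_color(ann.score), ann.reason)
```

Clamping the score on parse is a defensive choice: model output is not guaranteed to stay in range, and the renderer should not have to care.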

Examples:

https://0github.com/manaflow-ai/cmux/pull/666

https://0github.com/stack-auth/stack-auth/pull/988

https://0github.com/tinygrad/tinygrad/pull/12995

https://0github.com/simonw/datasette/pull/2548

Notice that every example link has a 0 prepended to github.com. This takes you to our custom diff viewer, which handles the same URL paths as github.com. Darker yellows indicate that an area might require more investigation. Hover over a highlight to see the LLM's explanation. There's also a slider at the top left to adjust the "should review" threshold.
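The URL swap and the threshold slider can be sketched as follows. Both function names are hypothetical, and the filter assumes each highlighted line carries a numeric score in [0, 1], which is an assumption rather than something stated above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_heatmap_url(url: str) -> str:
    """Rewrite a github.com pull-request URL to its 0github.com equivalent."""
    parts = urlsplit(url)
    if parts.netloc.lower() != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    # Only the host changes; path, query, and fragment are kept as-is.
    return urlunsplit(parts._replace(netloc="0github.com"))


def visible_highlights(scores: dict[int, float], threshold: float) -> dict[int, float]:
    """Keep only the lines whose score meets the 'should review' threshold."""
    return {line: score for line, score in scores.items() if score >= threshold}


if __name__ == "__main__":
    print(to_heatmap_url("https://github.com/simonw/datasette/pull/2548"))
    print(visible_highlights({10: 0.2, 11: 0.8, 12: 0.5}, threshold=0.5))
```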

Repo (MIT license): https://github.com/manaflow-ai/cmux


kburman [30 Oct 25 18:17 UTC] No.45763248

>>45760321 (OP) #

It’s an interesting direction, but feels pretty expensive for what might still be a guess at what matters.

I’m not sure an LLM can really capture project-specific context yet from a single PR diff.

Honestly, a simple data-driven heatmap showing which parts of the code change most often or correlate with past bugs would probably give reviewers more trustworthy signals.
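One rough way to get the data-driven signal described here without an LLM is to count how often each file changes and how often it appears in commits whose subject looks like a bug fix. This is a sketch under assumptions: the input is the text output of `git log --name-only --pretty=format:@@%s` (the `@@` marker is an arbitrary choice to tag subject lines), and the keyword regex is a crude, hypothetical proxy for "correlates with past bugs".

```python
import re
from collections import Counter

# Crude heuristic: a commit subject mentioning these words counts as a bug fix.
BUGFIX = re.compile(r"\b(fix|bug|regression)", re.IGNORECASE)


def churn_and_bugfix_counts(log_output: str) -> tuple[Counter, Counter]:
    """Count per-file changes and per-file appearances in bug-fix commits.

    Lines starting with '@@' are commit subjects; every other non-empty
    line is treated as a file path touched by the most recent subject.
    """
    churn: Counter = Counter()
    fixes: Counter = Counter()
    is_fix = False
    for line in log_output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("@@"):
            is_fix = bool(BUGFIX.search(line[2:]))
            continue
        churn[line] += 1
        if is_fix:
            fixes[line] += 1
    return churn, fixes


if __name__ == "__main__":
    sample = "@@Fix crash in parser\nsrc/parser.py\n\n@@Add docs\nREADME.md\n"
    churn, fixes = churn_and_bugfix_counts(sample)
    print(churn.most_common(), fixes.most_common())
```

This works at file granularity only. Getting a per-line signal would need something like blame history, which is not attempted here.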

replies(5): >>45763479 #>>45764303 #>>45765157 #>>45765672 #>>45765995 #

lawrencechen [30 Oct 25 18:31 UTC] No.45763479

>>45763248 #

Yeah this is honestly pretty expensive to run today.

> I’m not sure an LLM can really capture project-specific context yet from a single PR diff.

We had an even more expensive approach that cloned the repo into a VM and prompted codex to explore the codebase and run code before returning the heatmap data structure. Decided against it for now due to latency and cost, but I think we'll revisit it to help the LLM get project context.

Distillation should help a bit with cost, but I haven't experimented enough to have a definitive answer. Excited to play around with it though!

> which parts of the code change most often or correlate with past bugs

The only way I can think of to do the correlation would require LLMs. Maybe I'm missing a simpler approach? But I agree that conditioning on past bugs would be great.

replies(2): >>45763902 #>>45765217 #

1. CuriouslyC [30 Oct 25 20:54 UTC] No.45765217

>>45763479 #

Gemini is better than GPT5 variants for large context. Also, agents tend to be bad at gathering an optimal context set. The best approach is to intelligently select from the codebase to generate a "covering set" of everything touched in the PR, make a bundle, and fire it off at Gemini as a one shot. Because of caching, you can even fire off multiple queries to Gemini instructing it to evaluate the PR from different perspectives for cheap.
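A sketch of the "covering set" idea, assuming an import graph for the repository is already available as a mapping from each file to the files it imports. The function names, the one-hop default, and the bundle layout are hypothetical. The model call itself is left out, since no specific API is described here.

```python
def covering_set(changed: set[str], imports: dict[str, list[str]], max_hops: int = 1) -> set[str]:
    """Changed files plus files they import or that import them, within max_hops."""
    # Build the reverse graph so dependents of a changed file are found too.
    reverse: dict[str, set[str]] = {}
    for src, deps in imports.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(src)

    selected = set(changed)
    frontier = set(changed)
    for _ in range(max_hops):
        neighbors: set[str] = set()
        for path in frontier:
            neighbors |= set(imports.get(path, ())) | reverse.get(path, set())
        frontier = neighbors - selected
        selected |= frontier
    return selected


def make_bundle(files: dict[str, str], selected: set[str], changed: set[str]) -> str:
    """Concatenate the selected files into one prompt body, changed files first."""
    ordered = sorted(selected, key=lambda p: (p not in changed, p))
    return "\n\n".join(f"=== {p} ===\n{files[p]}" for p in ordered if p in files)


if __name__ == "__main__":
    imports = {"a.py": ["b.py"], "c.py": ["a.py"], "d.py": ["e.py"]}
    files = {"a.py": "A", "b.py": "B", "c.py": "C"}
    picked = covering_set({"a.py"}, imports)
    print(make_bundle(files, picked, {"a.py"}))
```

Keeping the bundle text identical across queries is what would let prompt caching pay off when the same context is reused for several review perspectives.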

replies(1): >>45765821 #

2. lawrencechen [30 Oct 25 21:52 UTC] No.45765821

>>45765217 (TP) #

Yeah, adding a context gathering step is a good idea. Our original approach used codex cli in a VM, so context gathering was pretty comprehensive. We switched to a more naive approach due to latency, but having a step using a smaller model (like SWE-grep) could be a nice tradeoff.


Show HN: I made a heatmap diff viewer for code reviews