
600 points antirez | 2 comments | source
quantumHazer ◴[] No.44625120[source]
I'm going a little off-topic here, but I disagree with the OP's use of the term "PhD-level knowledge", although I have a huge amount of respect for antirez (besides which, we were born on the same island).

This phrasing can be misleading and points to a broader misunderstanding about the nature of doctoral studies, one that has been influenced by the marketing and hype surrounding AI labs.

The assertion that there is a defined body of "PhD-level knowledge" is pretty useless. The primary purpose of a PhD is not simply to acquire a vast amount of pre-existing knowledge, but rather to learn how to conduct research.

replies(6): >>44625135 #>>44626038 #>>44626244 #>>44626345 #>>44632846 #>>44633598 #
antirez ◴[] No.44625135[source]
Agree with that. Read it as expert-level knowledge, without all the other stuff LLMs can't do as well as humans. The way LLMs express knowledge is kind of alien, since it is so different from ours, so indeed those are all poor simplifications. For instance, an LLM can't code as well as a top human coder, but it can write a non-trivial program from the first character to the last without iterating.
replies(1): >>44625422 #
spyckie2 ◴[] No.44625422{3}[source]
Hey antirez,

What sticks out to me is Gemini catching bugs before production release, was hoping you’d give a little more insight into that.

The reason being that we expect AI to create bugs and for us to catch them, but if Gemini is spotting bugs by acting as a QA (not just by writing and passing tests), then that piques my interest.

replies(1): >>44627752 #
jacobr1 ◴[] No.44627752{4}[source]
Our team has pretty aggressively started using LLMs for automated code review. It will look at our PRs and post comments. We keep adding more material for it to consider, from a summarized version of our API guidelines to general prompts like: "You are an expert software engineer and QA professional. Review this PR and point out any bugs or other areas of technical risk. Make concise suggestions for improvement where applicable." It catches a ton of stuff.
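The setup described above can be sketched roughly like this: combine the system prompt, the team's summarized guidelines, and the PR diff into a single review request. This is a minimal illustration, not the commenter's actual pipeline; the function and variable names (`build_review_prompt`, `GUIDELINES`) and the sample guidelines are assumptions.

```python
# Illustrative sketch of an LLM PR-review prompt builder. The guideline
# text and helper names are hypothetical; only the reviewer prompt is
# quoted from the comment above.

GUIDELINES = """\
- Public functions must validate their inputs.
- Avoid catching broad exceptions.
"""

SYSTEM_PROMPT = (
    "You are an expert software engineer and QA professional. "
    "Review this PR and point out any bugs or other areas of technical "
    "risk. Make concise suggestions for improvement where applicable."
)

def build_review_prompt(diff: str, guidelines: str = GUIDELINES) -> str:
    """Combine the reviewer instructions, team guidelines, and PR diff."""
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"Team guidelines:\n{guidelines}\n"
        f"PR diff:\n{diff}"
    )

if __name__ == "__main__":
    sample_diff = "--- a/app.py\n+++ b/app.py\n+def f(x): return x / 0"
    print(build_review_prompt(sample_diff))
```

The resulting string would then be sent to whatever LLM API the team uses, with the model's reply posted back as PR comments.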

Another thing we've started doing is having it look at build failures and write a report on suggested root causes before a human even looks at it, which saves time.

Or (and we haven't rolled this out automatically yet but are testing a prototype) having it triage alarms from our metrics, with access to the logs and codebase to investigate.
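A minimal sketch of that triage idea, under assumed names (`tail_lines`, `build_triage_prompt`) and an assumed alarm shape: grab the most recent log lines for the alarming service and format them, with the alarm metadata, into a prompt asking the model for likely root causes.

```python
# Hypothetical sketch of LLM alarm triage: gather recent logs plus alarm
# metadata into a single investigation prompt. Names and the alarm dict
# structure are illustrative assumptions.

from collections import deque

def tail_lines(path: str, n: int = 50) -> list[str]:
    """Return the last n lines of a log file."""
    with open(path) as f:
        return list(deque(f, maxlen=n))

def build_triage_prompt(alarm: dict, log_lines: list[str]) -> str:
    """Combine alarm metadata and recent logs into a triage request."""
    logs = "".join(log_lines)
    return (
        f"Alarm: {alarm['name']} (severity: {alarm['severity']})\n"
        f"Recent logs:\n{logs}\n"
        "Suggest the most likely root causes and what to check next."
    )
```

In a real deployment the prompt would also include relevant source files from the codebase, as the comment describes, and the model's report would land wherever the on-call engineer looks first.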

replies(2): >>44630651 #>>44654172 #
dearilos ◴[] No.44654172{5}[source]
How are you dealing with the accuracy of the review comments?

In my experience, "review this PR" is very generic and ends up giving slop.

replies(1): >>44660673 #
jacobr1 ◴[] No.44660673{6}[source]
Yeah, that doesn't work. We include our own guidelines and practices in the context: basically an ever-evolving wiki page of PR best practices. We had that before trying LLMs, so it was easier for us to start. We also found that using an LLM to reformat that data into a much tighter set of "rules" helped as well.
replies(1): >>44665530 #
dearilos ◴[] No.44665530[source]
I've been building out a directory of code review rules for the last couple of months!

Are you open to chatting and sharing notes on what works and what doesn't?

My email is ilya (at) wispbit.com