
600 points antirez | 2 comments | source
quantumHazer ◴[] No.44625120[source]
I'm going a little off-topic here, but I disagree with the OP's use of the term "PhD-level knowledge", although I have a huge amount of respect for antirez (besides which, we were born on the same island).

This phrasing can be misleading and points to a broader misunderstanding about the nature of doctoral studies, one that has been influenced by the marketing and hype surrounding AI labs.

The assertion that there is a defined body of "PhD-level knowledge" is pretty useless. The primary purpose of a PhD is not simply to acquire a vast amount of pre-existing knowledge, but rather to learn how to conduct research.

replies(6): >>44625135 #>>44626038 #>>44626244 #>>44626345 #>>44632846 #>>44633598 #
antirez ◴[] No.44625135[source]
Agree with that. Read it as expert-level knowledge, without all the other stuff LLMs can't do as well as humans. The way LLMs express knowledge is kind of alien, since it is so different from ours, so indeed those are all poor simplifications. For instance, an LLM can't code as well as a top human coder, but it can write a non-trivial program from the first character to the last without iterating.
replies(1): >>44625422 #
spyckie2 ◴[] No.44625422{3}[source]
Hey antirez,

What sticks out to me is Gemini catching bugs before production release, was hoping you’d give a little more insight into that.

The reason being that we expect AI to create bugs and for us to catch them, but if Gemini is spotting bugs by acting as a QA (not just by writing and passing tests), then that piques my interest.

replies(1): >>44627752 #
jacobr1 ◴[] No.44627752{4}[source]
Our team has pretty aggressively started using LLMs for automated code review. It will look at our PRs and post comments. We keep adding more material for it to consider, from a summarized version of our API guidelines to general prompts like: "You are an expert software engineer and QA professional. Review this PR and point out any bugs or other areas of technical risk. Make concise suggestions for improvement where applicable." It catches a ton of stuff.
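The setup described above can be sketched roughly like this: combine the system prompt, the team's summarized guidelines, and the PR diff into a single review request. This is a minimal illustration, not the commenter's actual pipeline; the function and variable names (`build_review_prompt`, `GUIDELINES`) and the sample guidelines are assumptions.

```python
# Illustrative sketch of an LLM PR-review prompt builder. The guideline
# text and helper names are hypothetical; only the reviewer prompt is
# quoted from the comment above.

GUIDELINES = """\
- Public functions must validate their inputs.
- Avoid catching broad exceptions.
"""

SYSTEM_PROMPT = (
    "You are an expert software engineer and QA professional. "
    "Review this PR and point out any bugs or other areas of technical "
    "risk. Make concise suggestions for improvement where applicable."
)

def build_review_prompt(diff: str, guidelines: str = GUIDELINES) -> str:
    """Combine the reviewer instructions, team guidelines, and PR diff."""
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"Team guidelines:\n{guidelines}\n"
        f"PR diff:\n{diff}"
    )

if __name__ == "__main__":
    sample_diff = "--- a/app.py\n+++ b/app.py\n+def f(x): return x / 0"
    print(build_review_prompt(sample_diff))
```

The resulting string would then be sent to whatever LLM API the team uses, with the model's reply posted back as PR comments.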

Another thing we've started doing is having it look at build failures and write a report on suggested root causes before a human even looks at it, which saves time.

Or (and we haven't rolled this out automatically yet but are testing a prototype) having it triage alarms from our metrics, with access to the logs and codebase to investigate.
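A minimal sketch of that triage idea, under assumed names (`tail_lines`, `build_triage_prompt`) and an assumed alarm shape: grab the most recent log lines for the alarming service and format them, with the alarm metadata, into a prompt asking the model for likely root causes.

```python
# Hypothetical sketch of LLM alarm triage: gather recent logs plus alarm
# metadata into a single investigation prompt. Names and the alarm dict
# structure are illustrative assumptions.

from collections import deque

def tail_lines(path: str, n: int = 50) -> list[str]:
    """Return the last n lines of a log file."""
    with open(path) as f:
        return list(deque(f, maxlen=n))

def build_triage_prompt(alarm: dict, log_lines: list[str]) -> str:
    """Combine alarm metadata and recent logs into a triage request."""
    logs = "".join(log_lines)
    return (
        f"Alarm: {alarm['name']} (severity: {alarm['severity']})\n"
        f"Recent logs:\n{logs}\n"
        "Suggest the most likely root causes and what to check next."
    )
```

In a real deployment the prompt would also include relevant source files from the codebase, as the comment describes, and the model's report would land wherever the on-call engineer looks first.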

replies(2): >>44630651 #>>44654172 #
dearilos ◴[] No.44654172{5}[source]
How are you dealing with the accuracy of the review comments?

In my experience, "review this PR" is very generic and ends up giving slop.

replies(1): >>44660673 #
jacobr1 ◴[] No.44660673{6}[source]
Yeah, that doesn't work. We include our own guidelines and practices in the context: basically an ever-evolving wiki page of PR best practices. We had that before trying LLMs, so it was easier for us to start. We also found that using an LLM to reformat that data into a much tighter set of "rules" helped as well.
replies(1): >>44665530 #
dearilos ◴[] No.44665530[source]
I've been building out a directory of code review rules for the last couple of months!

Are you open to chatting and sharing notes on what works and what doesn't?

My email is ilya (at) wispbit.com