
600 points | antirez | 1 comment | source
quantumHazer ◴[] No.44625120[source]
I'm going a little offtopic here, but I disagree with the OP's use of the term "PhD-level knowledge", although I have a huge amount of respect for antirez (besides the fact that we were born on the same island).

This phrasing can be misleading and points to a broader misunderstanding about the nature of doctoral studies, one that has been influenced by the marketing and hype discourse surrounding AI labs.

The assertion that there is a defined "PhD-level knowledge" is pretty useless. The primary purpose of a PhD is not simply to acquire a vast amount of pre-existing knowledge, but rather to learn how to conduct research.

replies(6): >>44625135 #>>44626038 #>>44626244 #>>44626345 #>>44632846 #>>44633598 #
antirez ◴[] No.44625135[source]
Agree with that. Read it as expert-level knowledge, without all the other stuff LLMs can't do as well as humans. The way LLMs express knowledge is kind of alien, as it is different, so indeed those are all poor simplifications. For instance, an LLM can't code as well as a top human coder, but it can write a non-trivial program from the first to the last character without iterating.
replies(1): >>44625422 #
spyckie2 ◴[] No.44625422[source]
Hey antirez,

What sticks out to me is Gemini catching bugs before production release, was hoping you’d give a little more insight into that.

The reason being that we expect AI to create bugs and we catch them, but if Gemini is spotting bugs by acting as a QA (not just by writing and passing tests), then that piques my interest.

replies(1): >>44627752 #
jacobr1 ◴[] No.44627752[source]
Our team has pretty aggressively started using LLMs for automated code review. It will look at our PRs and post comments. We keep adding more material for it to consider, from a summarized version of our API guidelines to general prompts like: "You are an expert software engineer and QA professional. Review this PR and point out any bugs or other areas of technical risk. Make concise suggestions for improvement where applicable." It catches a ton of stuff.

Another thing we've started doing is having it look at build failures and write a report on suggested root causes before a human even looks at it, which saves time.

Or (and we haven't rolled this out automatically yet but are testing a prototype) having it triage alarms from our metrics, with access to the logs and codebase to investigate.
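The PR-review step described above can be sketched roughly like this. This is a minimal illustration, not their actual pipeline: the function names, the `call_llm` stub, and the guidelines text are all assumptions; a real setup would call an actual LLM API and post the result as PR comments.

```python
# Hypothetical sketch of an LLM-based PR review step.
# call_llm is a stub standing in for a real LLM API client.

REVIEW_PROMPT = (
    "You are an expert software engineer and QA professional. "
    "Review this PR and point out any bugs or other areas of technical risk. "
    "Make concise suggestions for improvement where applicable."
)

def build_review_request(diff: str, guidelines: str) -> dict:
    """Combine the review prompt, summarized team guidelines, and the PR diff."""
    return {
        "system": REVIEW_PROMPT,
        "user": (
            f"Team API guidelines (summary):\n{guidelines}\n\n"
            f"PR diff:\n{diff}"
        ),
    }

def review_pr(diff: str, guidelines: str, call_llm) -> str:
    """Assemble the request and return the LLM's review comments."""
    request = build_review_request(diff, guidelines)
    return call_llm(request)

if __name__ == "__main__":
    # Stand-in for a real model call, just to show the flow.
    fake_llm = lambda req: "No issues found."
    print(review_pr("+ x = 1", "Prefer explicit returns.", fake_llm))
```

The key design point from the comment above is that the prompt is not just "find bugs": it bundles summarized team guidelines alongside the diff, which is what makes the review comments specific enough to be useful.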

replies(2): >>44630651 #>>44654172 #
infecto ◴[] No.44630651{3}[source]
I'm surprised more folks haven't rolled these out as paid products. I've been getting tons of use out of systems like Cursor's Bugbot. The signal-to-noise ratio is high, and while it's not always right, it catches a lot of bugs I would have missed.
replies(1): >>44632454 #
senko ◴[] No.44632454{4}[source]
There are a few: Greptile, Ellipsis, GH Copilot (integrated with GH)

I feel many also try "review and fix automatically", as it's tempting to "just" pass the generated comments to a second agent to apply them.

But that opens a whole other can of worms and pretty soon you're just another code assistant service.

replies(1): >>44654193 #
dearilos ◴[] No.44654193{5}[source]
If you use specific prompts based on your team's tribal knowledge and standards, it works really well.

A generic "look at this code for bugs" prompt doesn't end up working well, and that's what most code reviewers do.
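The contrast drawn here can be illustrated with a small sketch: instead of one generic "find bugs" instruction, each piece of team-specific knowledge becomes an explicit, checkable rule in the prompt. The rules below are invented examples, not any real team's standards.

```python
# Hypothetical example: turning team "tribal knowledge" into explicit review rules.
TEAM_RULES = [
    "All database calls must go through the repository layer, never raw SQL in handlers.",
    "Public API responses must use the shared error envelope.",
    "Every feature flag must reference a removal ticket in a comment.",
]

def build_team_prompt(rules: list[str]) -> str:
    """Generic instruction plus one numbered line per team-specific rule."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, 1))
    return (
        "Review this PR. In addition to general bugs, flag any violation of "
        "these team standards:\n" + numbered
    )

if __name__ == "__main__":
    print(build_team_prompt(TEAM_RULES))
```

The point is that the LLM reviewer is only as good as what you tell it to look for: generic prompts produce generic comments, while enumerated team standards give it concrete things to check.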

replies(1): >>44660655 #
jacobr1 ◴[] No.44660655{7}[source]
Yep - this is the key to making it work