
600 points antirez | 21 comments
1. quantumHazer ◴[] No.44625120[source]
I'm going a little off-topic here, but I disagree with the OP's use of the term "PhD-level knowledge", although I have a huge amount of respect for antirez (besides the fact that we were born on the same island).

This phrasing can be misleading and points to a broader misunderstanding about the nature of doctoral studies, one that has been influenced by the marketing and hype discourse surrounding AI labs.

The assertion that there is a defined "PhD-level knowledge" is pretty useless. The primary purpose of a PhD is not simply to acquire a vast amount of pre-existing knowledge, but rather to learn how to conduct research.

replies(6): >>44625135 #>>44626038 #>>44626244 #>>44626345 #>>44632846 #>>44633598 #
2. antirez ◴[] No.44625135[source]
Agree with that. Read it as expert-level knowledge, without all the other stuff LLMs can't do as well as humans. The way LLMs express knowledge is kind of alien, since it is so different from ours, so these are all poor simplifications. For instance, an LLM can't code as well as a top human coder, but it can write a non-trivial program from the first to the last character without iterating.
replies(1): >>44625422 #
3. spyckie2 ◴[] No.44625422[source]
Hey antirez,

What sticks out to me is Gemini catching bugs before production release, was hoping you’d give a little more insight into that.

The reason being that we expect AI to create bugs and we catch them, but if Gemini is spotting bugs by acting as a kind of QA (not just by writing and passing tests), then that piques my interest.

replies(1): >>44627752 #
4. kgwgk ◴[] No.44626038[source]
> The primary purpose of a PhD is not simply to acquire a vast amount of pre-existing knowledge, but rather to learn how to conduct research.

It’s not like once you have a PhD anyone cares about the subject, right? The only thing that matters is that you learnt to conduct research.

replies(1): >>44626106 #
5. quantumHazer ◴[] No.44626106[source]
I can't understand why, once you have a PhD, anyone should care more about the subject.
6. ghm2180 ◴[] No.44626244[source]
> but rather to learn how to conduct research

Further, I always assumed "PhD-level knowledge" meant coming up with the right questions. I would say it is at best a "lazy knowledge-rich worker": it won't explore hypotheses if you don't *ask it* to. A PhD would ask those questions of *themselves*. Let me give you a simple example:

The other day Claude Code (Max Pro subscription) commented out a bunch of test assertions as part of a related but separate test suite it was coding. It did not care to explore why it was commenting them out (which turned out to be a serious bug) because of a faulty assumption in the original plan. I had to ask it, using the ultra-think, think-hard trick, to explore why it was failing, amend the plan, and fix it.

The bug was that the ORM object had null values because it was not refreshed after the commit, and it had been fetched earlier by another DB session that had since been closed.
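For illustration, here is a minimal pure-Python sketch of that failure mode (a toy session/row pair standing in for the real ORM; all names and the schema are invented, not from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.commit()

class ToySession:
    def __init__(self, conn):
        self.conn = conn
        self.closed = False
    def close(self):
        self.closed = True

class ToyRow:
    """Attribute values are cached per object; a commit expires the cache
    (like an ORM's expire-on-commit), and a closed session can't re-load."""
    def __init__(self, session, row_id):
        self._session, self._id, self._cache = session, row_id, {}
    def expire(self):
        self._cache.clear()
    def get(self, col):
        if col in self._cache:
            return self._cache[col]
        if self._session.closed:
            return None  # detached object: silently "null" instead of an error
        val = self._session.conn.execute(
            f"SELECT {col} FROM users WHERE id = ?",  # col is trusted (sketch only)
            (self._id,)).fetchone()[0]
        self._cache[col] = val
        return val

s = ToySession(conn)
user = ToyRow(s, 1)
print(user.get("name"))  # 'alice' while the session is live
user.expire()            # a commit expires the cached attributes...
s.close()                # ...and the session is then closed
print(user.get("name"))  # None: the "null values" the test suite saw
```

The trap is that the stale object fails silently rather than raising, which is exactly the kind of condition an agent will paper over by weakening assertions instead of investigating.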

replies(1): >>44631306 #
7. chis ◴[] No.44626345[source]
If you understand that a PhD is about much more than just knowledge, it's still the case that having easy access to that knowledge is super valuable. At my last job we often had questions that would traditionally require a PhD-level person to answer, even if they weren't at the limit of their research abilities. "What will happen to the interface of two materials if voltage is applied in one direction?" type stuff; it turns out to be really hard to answer, but LLMs do a decent job.
replies(1): >>44626398 #
8. quantumHazer ◴[] No.44626398[source]
Have you checked the LLM's response experimentally?

Anyway, I don't think these are "PhD-knowledge" questions, but job-related electrical engineering questions.

9. jacobr1 ◴[] No.44627752{3}[source]
Our team has pretty aggressively started using LLMs for automated code review. They look at our PRs and post comments. We keep adding more material for them to consider, from a summarized version of our API guidelines to general prompts like "You are an expert software engineer and QA professional. Review this PR and point out any bugs or other areas of technical risk. Make concise suggestions for improvement where applicable." It catches a ton of stuff.
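A minimal sketch of that flow, assuming a generic chat-completion API; `call_llm` is a hypothetical stand-in (the thread names no specific client), and the guideline text is invented:

```python
# Invented example guideline material; a real team would feed in their own.
GUIDELINES = "API guidelines (summary): prefer explicit errors; no silent retries."

SYSTEM_PROMPT = (
    "You are an expert software engineer and QA professional. "
    "Review this PR and point out any bugs or other areas of technical risk. "
    "Make concise suggestions for improvement where applicable.\n\n" + GUIDELINES
)

def call_llm(system: str, user: str) -> str:
    # Placeholder: swap in a real model client (OpenAI, Anthropic, Gemini, ...).
    return f"[review of {len(user)} chars of diff against the guidelines]"

def review_pr(diff: str) -> str:
    """Build the review request the way the comment describes:
    guidelines + generic reviewer prompt + the PR diff itself."""
    return call_llm(SYSTEM_PROMPT, "PR diff:\n" + diff)

comment = review_pr("--- a/api.py\n+++ b/api.py\n+    return None  # swallow error")
print(comment)
```

The design point is that the reviewer prompt is cheap to iterate on: the team-specific material (guidelines, standards) is concatenated into the system prompt, so tightening the review is just editing text, not code.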

Another thing we've started doing is having it look at build failures and write a report on suggested root causes before a human even looks at it, which saves time.

Or (and we haven't rolled this out automatically yet but are testing a prototype) having it triage alarms from our metrics, with access to the logs and codebase to investigate.

replies(2): >>44630651 #>>44654172 #
10. infecto ◴[] No.44630651{4}[source]
I have been surprised more folks have not rolled these out as paid-for products. I have been getting tons of use out of systems like Cursor's Bugbot. The signal-to-noise ratio is high, and while it's not always right, it catches a lot of bugs I would have missed.
replies(1): >>44632454 #
11. vl ◴[] No.44631306[source]
It's ultrathink, one word, not ultra-think (see below).

I use Claude Code with Opus and had the same experience: I was pushing it hard to implement a complex test, and it gave me an empty test function with the test plan inside a comment (lol).

I do want to try Gemini 2.5 Pro, but I don't know a tool that would make the experience comparable to Claude Code. Would it make sense to use it with Cursor? Do they limit context?

  ~/.nvm/versions/node/v22.16.0/lib/node_modules/@anthropic-ai/claude-code $ npx prettier cli.js | ack ultrathink -C 20
  var jw1 = { HIGHEST: 31999, MIDDLE: 1e4, BASIC: 4000, NONE: 0 },
  Yk6 = {
    english: {
      HIGHEST: [
        { pattern: "think harder", needsWordBoundary: !0 },
        { pattern: "think intensely", needsWordBoundary: !0 },
        { pattern: "think longer", needsWordBoundary: !0 },
        { pattern: "think really hard", needsWordBoundary: !0 },
        { pattern: "think super hard", needsWordBoundary: !0 },
        { pattern: "think very hard", needsWordBoundary: !0 },
        { pattern: "ultrathink", needsWordBoundary: !0 },
      ],
      MIDDLE: [
        { pattern: "think about it", needsWordBoundary: !0 },
        { pattern: "think a lot", needsWordBoundary: !0 },
        { pattern: "think deeply", needsWordBoundary: !0 },
        { pattern: "think hard", needsWordBoundary: !0 },
        { pattern: "think more", needsWordBoundary: !0 },
        { pattern: "megathink", needsWordBoundary: !0 },
      ],
      BASIC: [{ pattern: "think", needsWordBoundary: !0 }],
      NONE: [],
    },
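The excerpt above (truncated) maps trigger phrases to token budgets using word-boundary matching. A rough Python re-implementation sketch of the same logic, covering only the tiers visible in the excerpt:

```python
import re

# Budgets and phrase tiers transcribed from the cli.js excerpt above.
BUDGETS = {"HIGHEST": 31999, "MIDDLE": 10_000, "BASIC": 4000, "NONE": 0}
PATTERNS = {
    "HIGHEST": ["think harder", "think intensely", "think longer",
                "think really hard", "think super hard", "think very hard",
                "ultrathink"],
    "MIDDLE": ["think about it", "think a lot", "think deeply",
               "think hard", "think more", "megathink"],
    "BASIC": ["think"],
}

def thinking_budget(prompt: str) -> int:
    """Return the thinking-token budget for the first tier whose phrase
    matches the prompt, checking the most generous tier first."""
    low = prompt.lower()
    for level in ("HIGHEST", "MIDDLE", "BASIC"):
        for phrase in PATTERNS[level]:
            if re.search(rf"\b{re.escape(phrase)}\b", low):
                return BUDGETS[level]
    return BUDGETS["NONE"]

print(thinking_budget("ultrathink about this bug"))  # 31999
print(thinking_budget("ultra-think please"))         # 4000: only bare "think" matches
```

This also shows why the one-word spelling matters: "ultra-think" fails the `\bultrathink\b` pattern and falls through to the BASIC tier.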
replies(2): >>44631521 #>>44632754 #
12. andrew_k ◴[] No.44631521{3}[source]
Google has gemini-cli, which is pretty close to Claude Code in terms of experience (https://github.com/google-gemini/gemini-cli) and has a generous free tier. Claude Code is still superior in my experience; Gemini CLI can go off course pretty quickly if you accept auto-edits. But it is handy for code reviews and planning, with its large context window.
13. senko ◴[] No.44632454{5}[source]
There are a few: Greptile, Ellipsis, GH Copilot (integrated with GH)

I feel many also try to "review and fix automatically", as it's tempting to "just" pass the generated comments to a second agent to apply them.

But that opens a whole other can of worms and pretty soon you're just another code assistant service.

replies(1): >>44654193 #
14. elyase ◴[] No.44632754{3}[source]
https://github.com/sst/opencode
15. ramraj07 ◴[] No.44632846[source]
Except during the data-science craze of the mid-2010s, there was never a situation where you could just have a PhD in any field and get any "PhD-level job", so whatever pedantic idea you have of what PhDs learn, not a single person who's hiring PhDs agrees with you. On the contrary, even most PhD professors treat you as only a vessel of the very specific topic you studied during your PhD. Go try to get a postdoc in a top lab when your PhD was not exactly what they already work on. I know, I tried! Then gave up.
16. pcrh ◴[] No.44633598[source]
Quite. "PhD-level knowledge" is the introduction to one's PhD thesis. The point of doing a PhD is to extend knowledge beyond what is already known, i.e. that which cannot be known by an LLM.
17. dearilos ◴[] No.44654172{4}[source]
How are you dealing with the accuracy of the review comments?

In my experience, "review this PR" is very generic and ends up giving slop.

replies(1): >>44660673 #
18. dearilos ◴[] No.44654193{6}[source]
If you write specific prompts based on your team's tribal knowledge and standards, it works really well.

"Look at this code for bugs" doesn't end up working well, which is what most code reviewers do.

replies(1): >>44660655 #
19. jacobr1 ◴[] No.44660655{7}[source]
Yep - this is the key to making it work
20. jacobr1 ◴[] No.44660673{5}[source]
Yeah, that doesn't work. We include our own guidelines and practices in the context: basically an ever-evolving wiki page of PR best practices. We had that before trying LLMs, so it was easier for us to start. We also found that doing an LLM reformat of that data into a much tighter set of "rules" helped as well.
replies(1): >>44665530 #
21. dearilos ◴[] No.44665530{6}[source]
I've been building out a directory of code review rules for the last couple of months!

Are you open to chatting and sharing notes on what works/doesn't work?

My email is ilya (at) wispbit.com