196 points yuedongze | 44 comments
1. blauditore ◴[] No.46195811[source]
All these engineers who claim to write most of their code through AI - I wonder what kind of codebase that is. I keep trying, but it always ends up producing superficially okay-looking code that gets the nuances wrong. It also fails to fix them (it just changes random stuff) when pointed to said nuances.

I work on a large product with two decades of accumulated legacy, maybe that's the problem. I can see though how generating and editing a simple greenfield web frontend project could work much better, as long as actual complexity is low.

replies(16): >>46195970 #>>46195979 #>>46196044 #>>46196111 #>>46196149 #>>46196181 #>>46196747 #>>46197925 #>>46198024 #>>46198073 #>>46198272 #>>46198478 #>>46199426 #>>46200435 #>>46202288 #>>46207763 #
2. cogman10 ◴[] No.46195970[source]
Honestly, if you've ever looked at a claude.md file, it seems like absolute madness. I feel like I'm reading affirmations from AA.
replies(2): >>46197605 #>>46203911 #
3. hathawsh ◴[] No.46195979[source]
I think your intuition matches mine. When I try to apply Claude Code to a large code base, it spends a long time looking through the code and then it suggests something incorrect or unhelpful. It's rarely worth the trouble.

When I give AI a smaller or more focused project, it's magical. I've been using Claude Code to write code for ESP32 projects and it's really impressive. OTOH, it failed to tell me about a standard device driver I could be using instead of a community device driver I found. I think any human who works on ESP-IDF projects would have pointed that out.

AI's failings are always a little weird.

replies(3): >>46197529 #>>46201280 #>>46203506 #
4. CuriouslyC ◴[] No.46196044[source]
It's architecture dependent. A fairly functional modular monolith with good documentation can be accessible to LLMs at the million line scale, but a coupled monolith or poorly instrumented microservices can drive agents into the ground at 100k.
replies(1): >>46196223 #
5. tuhgdetzhh ◴[] No.46196111[source]
Yes, unfortunately those who jumped on the microservices hype train over the past 15 years or so are now getting the benefits of Claude Code, since their entire codebases fit into the context window of Sonnet/Opus and can be "understood" by the LLM to generate useful code.

This is not the case for most monoliths, unless they are structured into LLM-friendly components that resemble patterns the models have seen millions of times in their training data, such as React components.

replies(1): >>46197575 #
6. silisili ◴[] No.46196149[source]
> I work on a large product with two decades of accumulated legacy, maybe that's the problem

Definitely. I've found Claude at least isn't so good at working in large existing projects, but great at greenfielding.

Most of my use these days is having it write specific functions and tests for them, which in fairness, saves me a ton of time.

7. qudat ◴[] No.46196181[source]
Are you using it only on massive codebases? It's much better with smaller codebases where it can put most of the code in context.

Another good use case is using it for knowledge searching within a codebase. I find that to be incredibly useful without much context "engineering".

replies(1): >>46203469 #
8. yuedongze ◴[] No.46196223[source]
I think it's definitely an interesting subject for Verification Engineering. The more precisely we can task the AI to do work, the easier it is to check its work.
replies(1): >>46196465 #
9. CuriouslyC ◴[] No.46196465{3}[source]
Yup. Codebase structure for agents is a rabbit hole I've spent a lot of time going down. The interesting thing is that it's mostly the same structure that humans tend to prefer, with a few tweaks: agents like smaller files/functions (more precise reads/edits), strongly typed functional programming, doc-comments with examples and hyperlinks to additional context, smaller directories with semantic subgroups, long/distinct variable names, etc.
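
To make that concrete, here's a small hypothetical sketch of that style (the function, names, and doc path are invented for illustration, not from any real codebase): a small, fully typed, pure function whose doc-comment carries an example and a pointer to further context, with deliberately long and distinct names:

  from decimal import Decimal

  def apply_loyalty_discount(order_subtotal: Decimal, loyalty_tier_percent: Decimal) -> Decimal:
      """Return the order subtotal after applying the customer's loyalty discount.

      Pure function with no I/O or shared state, so an agent can read, edit,
      and test it in isolation.

      Example:
          >>> apply_loyalty_discount(Decimal("100.00"), Decimal("5"))
          Decimal('95.00')

      See docs/pricing/loyalty.md (hypothetical path) for how tiers are assigned.
      """
      discount_amount = order_subtotal * loyalty_tier_percent / Decimal("100")
      return order_subtotal - discount_amount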
replies(1): >>46197696 #
10. bob1029 ◴[] No.46196747[source]
I have my best successes by keeping things constrained to method-level generation. Most of the things I dump into ChatGPT look like this:

  public static double ScoreItem(Span<byte> candidate, Span<byte> target)
  {
     //TODO: Return the normalized Levenshtein distance between the 2 byte sequences.
     //... any additional edge cases here ...
  }
I think generating more than one method at a time is playing with fire. Individual methods can be generated by the LLM and tested in isolation. You can incrementally build up and trust your understanding of the problem space by going a little bit slower. If the LLM is operating over a whole set of methods at once, it is like starting over each time you have to iterate.
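
For reference, a rough Python sketch of what that stub is asking for (normalized Levenshtein distance: edit distance divided by the length of the longer sequence), plus the kind of isolated checks described above; this is an illustration, not what the LLM actually returned:

  def normalized_levenshtein(candidate: bytes, target: bytes) -> float:
      """Edit distance between two byte sequences, divided by the longer length."""
      if not candidate and not target:
          return 0.0
      # Classic two-row dynamic-programming edit distance.
      previous_row = list(range(len(target) + 1))
      for i, a in enumerate(candidate, start=1):
          current_row = [i]
          for j, b in enumerate(target, start=1):
              substitution_cost = 0 if a == b else 1
              current_row.append(min(
                  previous_row[j] + 1,                      # delete from candidate
                  current_row[j - 1] + 1,                   # insert into candidate
                  previous_row[j - 1] + substitution_cost,  # substitute
              ))
          previous_row = current_row
      return previous_row[-1] / max(len(candidate), len(target))

  # Each generated method can be checked in isolation, without the rest of the project:
  assert normalized_levenshtein(b"kitten", b"sitting") == 3 / 7
  assert normalized_levenshtein(b"abc", b"abc") == 0.0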
replies(2): >>46197658 #>>46203878 #
11. manmal ◴[] No.46197529[source]
In large projects you need to actually point it to the interesting files, because it has no way of knowing what it doesn't know. Tell it to read this and that, creating summary documents, then clear the context and point it at those summaries. A few of those passes and you'll get useful results. A gap in its knowledge of relevant code will lead to broken functionality. Cursor and others have been trying to solve this with semantic search (embeddings) but IMO this just can't work because relevance of a code piece for a task is not determinable by any of its traits.
replies(1): >>46199448 #
12. manmal ◴[] No.46197575[source]
Well structured monoliths are modularized just like microservices. No need to give each module its own REST API in order to keep it clean.
replies(3): >>46199455 #>>46202013 #>>46203004 #
13. manmal ◴[] No.46197605[source]
It's magical incantations that might or might not protect you from bad behavior Claude learned from underqualified RL instructors. A classic instruction I have in CLAUDE.md is "Never delete a test. You are only allowed to replace it with a test that covers the same branches," and another one is "Never mention Claude in a commit message." Of course those sometimes fail, so I do have a message hook that enforces a certain style of git messages.
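
For what it's worth, a minimal sketch of one way such a hook could look, as a plain git commit-msg hook written in Python (the patterns are my guesses at the kind of style rule meant here, not the actual hook):

  #!/usr/bin/env python3
  # .git/hooks/commit-msg -- git passes the path of the commit message file as argv[1]
  import re
  import sys

  message = open(sys.argv[1], encoding="utf-8").read()

  # Reject anything that looks like the agent's attribution footer.
  banned_patterns = [
      r"(?i)generated with.*claude",
      r"(?i)co-authored-by:.*claude",
      r"(?i)\bclaude\b",
  ]

  for pattern in banned_patterns:
      if re.search(pattern, message):
          sys.stderr.write(f"commit-msg hook: message matches banned pattern {pattern!r}\n")
          sys.exit(1)  # a non-zero exit aborts the commit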
replies(2): >>46200439 #>>46204514 #
14. samdoesnothing ◴[] No.46197658[source]
I do this but with copilot. Write a comment and then spam opt-tab and 50% of the time it ends up doing what I want and I can read it line-by-line before tabbing the next one.

Genuine productivity boost, but I don't feel like it's AI slop; sometimes it feels like it's actually reading my mind and just preventing me from having to type...

replies(1): >>46197999 #
15. lukan ◴[] No.46197696{4}[source]
Aren't those all things humans also tend to prefer to read?

I like to read descriptive variable names, I just don't like to write them all the time.

16. freedomben ◴[] No.46197925[source]
I've tried it extensively, and have the same experience as you. AI is also incredibly stubborn when it wants to go down a path I reject. It constantly tries to do it anyway and will slip things in.

I've tried vibe coding and usually end up with something subtly or horribly broken, with excessive levels of complexity. Once it digs itself a hole, it's very difficult to extricate it even with explicit instruction.

17. jerf ◴[] No.46197999{3}[source]
I've settled in on this as well for most of my day-to-day coding. A lot of extremely fancy tab completion, using the agent only for manipulation tasks I can carefully define. I'm currently in a "write lots of code" mode which affects that, I think; in a maintenance mode I could see doing more agent prompting. It gives me a chance to catch things early and then put in a correct pattern for it to continue forward with. And honestly, for a lot of tasks it's not particularly slower than the "ask it to do something, correct its five errors, tweak the prompt" workflow.

I've had net-time-savings with bigger agentic tasks, but I still have to check it line-by-line when it is done, because it takes lazy shortcuts and sometimes just outright gets things wrong.

Big productivity boost, it takes out the worst of my job, but I still can't trust it at much above the micro scale.

I wish I could give a system prompt for the tab complete; there's a couple of things it does over and over that I'm sure I could prompt away but there's no way to feed that in that I know of.

18. junkaccount ◴[] No.46198024[source]
Can you prove it in a blog post and share it here, showing that you write better code snippets than AI? If you ask "what kind of codebase", you should be able to use some codebase from GitHub to prove it.

19. moomoo11 ◴[] No.46198073[source]
You need to realize when you’re being marketed to and filter out the nonsense.

Now I use agentic coding a lot with maybe 80-90% success rate.

I'm on greenfield projects (my startup), and maintaining strict .md files with architecture decisions and examples helps a lot.

I barely write code anymore, and mostly code review and maintain the documentation.

For existing pre-AI codebases I think it's near impossible, because I've never worked anywhere that maintained documentation. It was always a chore.

20. themafia ◴[] No.46198272[source]
> as long as actual complexity is low.

You can start there. Does it ever stay that way?

> I work on a large product with two decades of accumulated legacy

Survey says: No.

21. bojan ◴[] No.46198478[source]
> I work on a large product with two decades of accumulated legacy, maybe that's the problem.

I'm in a similar situation, and for the first time ever I'm actually considering if a rewrite to microservices would make sense, with a microservice being something small enough an AI could actually deal with - and maybe even build largely on its own.

replies(1): >>46198669 #
22. vanviegen ◴[] No.46198669[source]
If you're creating microservices that are small enough for a current-gen LLM to deal with well, that means you're creating way too many microservices. You'll be reminiscing about your two decades of accumulated legacy monolith with fondness.
23. wubrr ◴[] No.46199426[source]
I've generally had better luck when using it on new projects/repos. When working on a large existing repo it's very important to give it good context/links/pointers to how things currently work/how they should work in that repo.

Also - Claude (~the best coding agent currently, imo) will make mistakes, sometimes many of them - tell it to test the code it writes and make sure it's working. I've generally found it's pretty good at debugging/testing and fixing its own mistakes.

24. Yoric ◴[] No.46199448{3}[source]
But in the end, do you feel that it has saved you time?

I find hand-holding Claude a permanent source of frustration, except in the rare case that it helps me discover an error in the code.

replies(1): >>46201732 #
25. Yoric ◴[] No.46199455{3}[source]
I guess the benefit of monoliths in this context is that they (often) live in distinct repositories, which makes it easier for Claude to ingest them entirely, or at least not get lost looking at the wrong directory.
26. rprend ◴[] No.46200435[source]
<1 year old startup with a full-stack JavaScript monorepo. Hosted on a serverless platform with good devex, like Cloudflare Workers.

That’s the typical “claude code writes all my code” setup. That’s my setup.

This does require you to fit your problem to the solution. But when you do, the results are tremendous.

27. Havoc ◴[] No.46200439{3}[source]
> "Never mention Claude in a commit message." Of course those sometimes fail

It's hardcoded into the system prompt, which is why your CLAUDE.md approach fails. I ended up intercepting it via a proxy.

replies(1): >>46201875 #
28. seanmcdirmid ◴[] No.46201280[source]
Have you tried having the AI build up documentation on the code first, correcting it where its understanding is wrong, and then running code changes with the docs in the context? You can even separate it out for each module if you are daring. AI still takes a lot of hand-holding to be productive with, which means our jobs are safe for now, until they start learning about SWE principles somehow.
29. manmal ◴[] No.46201732{4}[source]
I've had a similar feeling before Opus 4.5. Now it suddenly clicks with me, and it has passed the shittiness threshold into the "often useful" area. I suspect that's because Apple is partnering with Anthropic and they will have improved Swift support.

E.g. it's great for refactoring now; it often updates the README along with renames without me asking. It's also really good at rebasing quickly, but only by cherry-picking inside a worktree. Churning out small components I don't want to add a new dependency for - those are usually good on the first try.

For implementing whole features, the space of possible solutions is way too big to always hit something that I'll be satisfied with. Once I have an idea of how to implement something in broad strokes, I can give it a very error-ridden first draft as a stream of thoughts, let it read all required files, and make an implementation plan. Usually that's not too far off, and doesn't take that long. Once that's done, Opus 4.5 is pretty good at implementing that plan. Still, I read every line if this will go to production.

30. manmal ◴[] No.46201875{4}[source]
Thanks for this idea!
31. bccdee ◴[] No.46202013{3}[source]
Conversely, poorly-structured microservices are just monoliths where most of the code is in other repositories.
32. mrtksn ◴[] No.46202288[source]
So far I've found that AI is very good at writing the code, as in translating English to computer code.

Instead of dealing with the intricacies of directly writing the code, I explain to the AI what we are trying to achieve next and what approach I prefer. This way I am still on top of it: I am able to understand the quality of the code it generated, and I'm the one who integrates everything.

So far I've found the tools that are supposed to be able to edit the whole codebase at once to be useless. I instantly lose perspective when the AI IDE fiddles with multiple code blocks and does some magic. The chatbot interface is superior for me, as the control stays with me and I still follow the code writing step by step.

33. randomtoast ◴[] No.46203004{3}[source]
One problem is that the idea of being "well-structured" has gone overboard at some point over the past 20 years in many companies. As a result, many companies now operate highly convoluted monolithic systems that are extremely difficult to replace.

In contrast, a poorly designed microservice can be replaced much more easily. You can identify the worst-performing and most problematic microservices and replace them selectively.

replies(1): >>46203600 #
34. eloisant ◴[] No.46203469[source]
It's also good on massive codebases that include a lot of "good practices" examples.

Let's say you want to add new functionality, for example plugging into the shared user service that already exists in another service in the same monorepo; the AI will be really good at identifying an example and applying it to your service.

35. divan ◴[] No.46203506[source]
I start new projects "AI-first": starting with docs and refining them on the go, with multiple CLAUDE.md files in different folders (to give the right context where it's needed). This alone increases the chances of it getting tasks right tenfold. Plus I almost always verify all the produced code myself.

Ironically, this would be the best workflow with humans too.

36. tuhgdetzhh ◴[] No.46203600{4}[source]
> One problem is that the idea of being "well-structured" has gone overboard at some point over the past 20 years

That's exactly my experience. While a well-structured monolith is a good idea in theory, and I'm sure such examples exist in practice, that has never been the case in any of my jobs. Friends working at other companies report similar experiences.

37. theshrike79 ◴[] No.46203878[source]
"Dumping into ChatGPT" is by far the worst way to work with LLMs, then it lacks the greater context of the project and will just give you the statistical average output.

Using an agentic system that can at least read the other bits of code is more efficient than copypasting snippets to a web page.

replies(2): >>46204568 #>>46205350 #
38. theshrike79 ◴[] No.46203911[source]
Way too many agent prompt files are just fan fiction or D&D character background documents that have no actual effect on what the agent does =)
39. HWR_14 ◴[] No.46204514{3}[source]
Why would it be bad to mention Claude in a commit message?
replies(1): >>46208321 #
40. ◴[] No.46204568{3}[source]
41. bob1029 ◴[] No.46205350{3}[source]
> then it lacks the greater context of the project

This is the point. I don't want it thinking about my entire project. I want it looking at a very specific problem each time.

replies(1): >>46210382 #
42. daliusd ◴[] No.46207763[source]
I use AI successfully in two projects:

* My 5-year-old project: a monorepo with a backend, 2 front-ends and 2 libraries

* A 10+ year-old company project: about 20 assorted packages in a monorepo

In both cases I successfully give Claude Code or OpenCode instructions either at package level or monorepo level. Usually I prefer package level.

E.g. just now I gave an instruction in my personal project: "Invoice styles in /app/settings/invoice should be localized". It figured out that the unlocalized strings come from a library package, added the strings to the code and messages files (adding missing translations), but it did not clean up the hardcoded strings from the library. Since I know the code, I added an extra prompt, "Maybe INVOICE_STYLE_CONFIGS can be cleaned up in such case", and it cleaned up what I expected, then ran tests and linting.

43. manmal ◴[] No.46208321{4}[source]
Just because Claude ran the commit command doesn't mean it wrote the code. That's just a nasty marketing hack from Anthropic.
44. theshrike79 ◴[] No.46210382{4}[source]
But why?

Most code is about patterns, specific code styles and reusing existing libraries. Without context none of that can be applied to the solution.

If you put a programmer in a room and give them a piece of paper with a function and say OPTIMISE THAT! - is it going to be their best work?