Coding with LLMs in the summer of 2025 – an update

1. theodorewiles ◴[20 Jul 25 13:32 UTC] No.44625055[source]▶

My question about all the “can’t work with big codebases” complaints is: what would a codebase designed for an LLM look like? Would it be made of many, many small functions that can be composed together?

replies(5): >>44625070 #>>44625105 #>>44625128 #>>44625450 #>>44625922 #

2. antirez ◴[20 Jul 25 13:33 UTC] No.44625070[source]▶

>>44625055 (TP) #

I believe it’s the same as for humans: different files implementing different parts of the system with good interfaces and sensible boundaries.
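A minimal sketch of what "good interfaces and sensible boundaries" might look like in Python, using `typing.Protocol`. The names `Storage`, `InMemoryStorage`, and `rename_user` are hypothetical and only illustrate the idea; in a real project each piece would probably live in its own file.

```python
from typing import Dict, Optional, Protocol


class Storage(Protocol):
    """Hypothetical boundary: all that callers need to know about persistence."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class InMemoryStorage:
    """One implementation; it satisfies Storage structurally, without inheriting."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


def rename_user(storage: Storage, user_id: str, new_name: str) -> bool:
    """Business logic that depends only on the interface, not on its internals."""
    if storage.get(user_id) is None:
        return False  # unknown user: nothing to rename
    storage.put(user_id, new_name)
    return True
```

A reader (human or LLM) working on `rename_user` only has to load the few lines of `Storage`, not the implementation behind it.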

replies(2): >>44625329 #>>44626082 #

3. Hasnep ◴[20 Jul 25 13:39 UTC] No.44625105[source]▶

>>44625055 (TP) #

And my question to that is how would that be different from a codebase designed for humans?

replies(1): >>44625856 #

4. Keyframe ◴[20 Jul 25 13:43 UTC] No.44625128[source]▶

>>44625055 (TP) #

like a microservice architecture? overall architecture to get the context and then dive into a micro one?

5. dkdcio ◴[20 Jul 25 14:07 UTC] No.44625329[source]▶

>>44625070 #

this is a common pattern I see -- if your codebase is confusing for LLMs, it's probably confusing for people too

replies(1): >>44626026 #

6. exitb ◴[20 Jul 25 14:19 UTC] No.44625450[source]▶

>>44625055 (TP) #

And on top of that - can you steer an LLM to create this kind of code? In my experience the models don’t really have a “taste” for detecting complexity creep and reengineering for simplicity, in the same way an experienced human does.

replies(1): >>44626436 #

7. __MatrixMan__ ◴[20 Jul 25 15:04 UTC] No.44625856[source]▶

>>44625105 #

I think it means finer toplevel granularity re: what's runnable/testable at a given moment. I've been exploring this for my own projects and although it's not a silver bullet, I think there's something to it.

----

Several codebases I've known have provided a three-stage pipeline: unit tests, integration tests, and e2e tests. Each of these batches of tests depends on the creation of one of three environments, and the code being tested is what ends up in those environments. If you're interested in a particular failing test, you can use the associated environment and just iterate on the failing test.

For humans with a bit of tribal knowledge about the project, humans who have already solved the get-my-dev-environment-set-up problem in a more or less uniform way, this works ok. Humans are better at retaining context over weeks and months, whereas you have to spin up a new session with an LLM every few hours or so. So we've created environments for ourselves that we ignore most of the time, but that are too complex to be bite sized for an agent that comes on the scene as a blank slate every few hours. There are too few steps from blank-slate to production, and each of them is too large.

But if successively more complex environments can be built on each other in arbitrarily many steps, then we could achieve finer granularity. As a nix user, my mental model for this is function composition where the inputs and outputs are environments, but an analogous model would be layers in a Dockerfile where you test each layer before building the one on top of it.

Instead of maybe three steps, there are eight or ten. The goal would be to have both whatever code builds the environment, and whatever code tests it, paired up into bite-sized chunks so that a failure in the pipeline points you to a specific stage, which is more specific than "the unit tests are failing". Ideally test coverage and implementation complexity get distributed uniformly across those stages.
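The idea of pairing each environment-building step with its own tests can be sketched in plain Python. This is only an illustration, not the commenter's nix setup: `Env` is a dict standing in for a real environment, and the `Stage` class, `run_pipeline` function, and stage names are all assumptions made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

Env = dict  # stand-in for an environment (e.g. a devshell or an image layer)


@dataclass
class Stage:
    name: str
    build: Callable[[Env], Env]   # extends the environment from the previous stage
    check: Callable[[Env], bool]  # the tests that belong to this stage only


def run_pipeline(stages: list[Stage]) -> tuple[Env, Optional[str]]:
    """Build and check stages in order; stop at the first failing stage.

    Returns the environment built so far and the failing stage's name
    (or None if every stage passed).
    """
    env: Env = {}
    for stage in stages:
        env = stage.build(env)
        if not stage.check(env):
            return env, stage.name
    return env, None


# Hypothetical stages; the "schema" stage is broken on purpose.
stages = [
    Stage("toolchain", lambda e: {**e, "python": "3.12"}, lambda e: "python" in e),
    Stage("deps", lambda e: {**e, "deps": ["requests"]}, lambda e: bool(e.get("deps"))),
    Stage("schema", lambda e: {**e, "db": None}, lambda e: e.get("db") is not None),
    Stage("app", lambda e: {**e, "app": "built"}, lambda e: e.get("app") == "built"),
]

env, failed = run_pipeline(stages)
print(failed)  # schema
```

Because later stages never run after a failure, the report is "stage `schema` is broken" rather than a wall of downstream errors.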

Keeping the scope of the stages small maximizes the amount of your codebase that the LLM can ignore while it works. I have a flake output and nix devshell corresponding to each stage in the pipeline and I'm using pytest to mark tests based on which stage they should run in. So I run the agent from the devshell that corresponds with whichever stage is relevant at the moment, and I introduce it to only the tests and code that are relevant to that stage (the assumption being that all previous stages are known to be in good shape). Most of the time, it doesn't need to know that it's working on stage 5 of 9, so it "feels" like a smaller codebase than it actually is.
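Marking tests by stage with pytest could look roughly like the sketch below. `pytest.mark.<name>`, the `pytest_configure` hook, and `config.addinivalue_line("markers", ...)` are standard pytest features; the stage names, the test bodies, and the split into two files are assumptions for illustration only.

```python
import sys

import pytest

# --- conftest.py (hypothetical layout) ---
STAGES = ["toolchain", "deps", "schema", "app"]  # illustrative stage names


def pytest_configure(config):
    """Register one marker per stage so that --strict-markers accepts them."""
    for name in STAGES:
        config.addinivalue_line(
            "markers", f"{name}: tests that belong to the {name} stage"
        )


# --- test_stages.py (hypothetical layout) ---
@pytest.mark.toolchain
def test_interpreter_is_recent_enough():
    assert sys.version_info >= (3, 8)


@pytest.mark.schema
def test_schema_has_users_table():
    tables = {"users", "orders"}  # stand-in for a real schema lookup
    assert "users" in tables
```

With this in place, `pytest -m schema` selects only the tests marked for the `schema` stage, so an agent started in that stage's devshell sees just that slice of the suite.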

If evidence emerges that I've engaged the LLM at the wrong stage, I abandon the session and start over at the right level (now 6 of 9 or somesuch).

8. victorbjorklund ◴[20 Jul 25 15:11 UTC] No.44625922[source]▶

>>44625055 (TP) #

I have found it beneficial to create more libraries. For example, if I build a large integration with an API (basically a whole API client), in the past I would have kept it in the same repo, but now I make it a standalone library.

9. physicles ◴[20 Jul 25 15:22 UTC] No.44626026{3}[source]▶

>>44625329 #

This fact is one of the most pleasant surprises I’ve had during this AI wave. Finally, a concrete reason to care about your docs and your code quality.

replies(1): >>44634960 #

10. afro88 ◴[20 Jul 25 15:26 UTC] No.44626082[source]▶

>>44625070 #

Good documentation helps a lot too.

You can use an LLM to help document a codebase, but it's still an arduous task because you do need to review and fix up the generated docs. It will make mistakes, sometimes glaring and sometimes subtle. And you want your documentation to provide accuracy rather than double down on or even introduce misunderstanding.

11. lubujackson ◴[20 Jul 25 15:58 UTC] No.44626436[source]▶

>>44625450 #

I am vibe coding a complex app. You can certainly keep things clean, but the trick is to enforce a rigid structure. This does add a veneer of complexity but simplifies "implement this new module" or "add this feature across all relevant files".

12. aitchnyu ◴[21 Jul 25 13:41 UTC] No.44634960{4}[source]▶

>>44626026 #

"What helps the human helps the AI" in https://blog.nilenso.com/blog/2025/05/29/ai-assisted-coding/

In future I'll go "In the name of our new darling bot, let us unit test and refactor this complicated thing".