
416 points | floverfelt | 1 comment
daviding No.45056856
I get a lot of productivity out of LLMs so far, which for me is a good sign in itself. I get more done in less time, and not just by using them as autocomplete. There's a nagging doubt that there's some debt to pay one day if I give them too loose a leash, but LLMs are hardly alone in that problem.

One thing I've done with some success is to use a Test Driven Development methodology with Claude Sonnet (or, recently, GPT-5): moving the feature forward in discrete steps, tests first, inside the red/green loop. I don't see a lot written or discussed about that approach so far, but reading Martin's article made me realize that the people most proficient with TDD aren't really in the Venn diagram intersection with those wanting to throw themselves wholeheartedly into agent coding with LLMs. The 'super clippy' autocomplete is not the interesting way to use them; it's with multiple agents and prompt techniques at different abstraction levels that you can really cook with gas. Many TDD experts take great pride in the art of code, communicating like a human and holding the abstractions in their head, so we might not get good guidance from the same set of people who helped us before. I think there's a green field of 'how to write software' lessons with these tools coming up, with many cautionary tales and lessons being learnt right now.
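
To make that concrete, here's the shape of one red/green slice I mean, sketched in TypeScript with vitest (slugify and its spec are just hypothetical stand-ins, not from a real project):

    import { describe, expect, test } from 'vitest';
    import { slugify } from './slugify'; // doesn't exist yet, so the suite starts red

    // Red: the next slice of the feature is pinned down as tests first.
    describe('slugify', () => {
      test('lowercases and hyphenates words', () => {
        expect(slugify('Hello World')).toBe('hello-world');
      });
      test('strips characters that are not URL-safe', () => {
        expect(slugify('C# in 10 Minutes!')).toBe('c-in-10-minutes');
      });
    });

and then the agent is asked for the minimal slugify.ts that turns it green:

    // slugify.ts, written only after the tests exist
    export function slugify(input: string): string {
      return input
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-') // collapse runs of non-alphanumerics
        .replace(/^-|-$/g, '');      // trim leading/trailing hyphens
    }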

edit: heh, just saw this now, there you go - https://news.ycombinator.com/item?id=45055439

replies(1): >>45056943 #
tra3 No.45056943
It feels like the TDD/LLM connection is implied: "and also generate tests". Though it's not canonical TDD, of course. I wonder if it'll turn the tide towards tech that's easier to test automatically, like maybe SSR instead of React.
replies(2): >>45057027 #>>45057482 #
daviding No.45057027
Yep, it's great for generating tests; so much of that is boilerplate that having it churned out feels like real value. As a super lazy developer I'm happy to have all that mechanical 'stuff' spat out for me. Test code feels like less of a burden when it's produced as part of the process: there's no guilt in deleting it all when what you want to do changes. That in itself is nice. Plus, of course, MCP tooling (Playwright etc.) for integration tests is great.
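
For the integration side, the generated tests are just ordinary Playwright specs, something like this (the staging URL and the heading are placeholder assumptions, not from any real app):

    import { test, expect } from '@playwright/test';

    // Sketch of a generated integration test; the URL and the asserted
    // heading are hypothetical placeholders.
    test('home page renders the main heading', async ({ page }) => {
      await page.goto('https://staging.example.com/');
      await expect(page.getByRole('heading', { level: 1 })).toBeVisible();
    });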

But like you said, I meant TDD more as 'test first': a sort of 'prompt-as-spec' that produces the test/spec code first, which you then iterate on. The code design itself comes out different, because it's prompted to be testable. So rather than 'prompt -> code', there's an in-between stage: prompt the tests initially, then evolve, keeping the agent in the game of writing only testable code and automating the 'gate' of a passing suite before expanding anything. 'prompt -> spec -> code', repeated until shipped.
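
The 'gate' can be as dumb as a script that refuses to move on while the suite is red. A minimal sketch using Node's built-in child_process (the npm test command and the loop policy are my assumptions, not a prescribed setup):

    import { execSync } from 'node:child_process';

    // Gate: run the whole suite after each agent edit; only a zero exit
    // code (all green) lets the loop advance to the next spec item.
    function suiteIsGreen(): boolean {
      try {
        execSync('npm test', { stdio: 'inherit' }); // assumed test command
        return true;
      } catch {
        return false; // red: keep iterating on the current spec item
      }
    }

    // The loop itself: prompt -> spec(test) -> code, per feature slice.
    // 1. agent turns the next prompt item into a failing test (red)
    // 2. agent makes the minimal code change; re-check suiteIsGreen()
    // 3. only expand scope once the gate passes (green), then repeat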

replies(1): >>45057450 #
girvo No.45057450
The only thing I dislike is what it chooses to test when asked to just "generate tests for X": it often builds those "straitjacket for your code" style tests which aren't actually useful for catching bugs; they just act as "any change now makes this red".

As a simple example: a "buildUrl" style function that picked one host for prod and a different host for staging (based on an "environment" argument) had that argument "tested" by comparing the function's entire return string exactly, encoding all the extra functionality into the assertion (functionality that was tested earlier anyway).

A better output would be to check startsWith(prodHost) or similar, which is what I changed it to, but I'm still trying to work out how to get coding agents to do that on the first or second attempt.
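
Roughly, with a hypothetical buildUrl reconstructed from memory (the hosts and query string are made up):

    import { expect, test } from 'vitest';

    // Hypothetical stand-in for the function under discussion.
    function buildUrl(env: 'prod' | 'staging', path: string): string {
      const host = env === 'prod'
        ? 'https://api.example.com'
        : 'https://staging.example.com';
      return `${host}${path}?v=2`; // unrelated details folded into the string
    }

    // Straitjacket style: pins the entire return value, so any change to
    // the path or query turns it red even when the env routing is fine.
    test('brittle: prod url', () => {
      expect(buildUrl('prod', '/users')).toBe('https://api.example.com/users?v=2');
    });

    // Focused: asserts only the behavior this test is actually about.
    test('prod env picks the prod host', () => {
      expect(buildUrl('prod', '/users').startsWith('https://api.example.com')).toBe(true);
    });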

But that's also not surprising: people write those kinds of too-narrow, not-useful tests all the time; the codebase I work on is littered with them!