
219 points | crazylogger | 2 comments
xianshou (No.42728570):
One trend I've noticed, framed as a logical deduction:

1. Coding assistants based on o1 and Sonnet are pretty great at coding with <50k tokens of context, but degrade rapidly beyond that.

2. Coding agents do massively better when they have a test-driven reward signal (a minimal sketch of such a loop follows this comment).

3. If a problem can be framed in a way that a coding agent can solve it, that speeds up development at least 10x from the base case of human + assistant.

4. From (1)-(3), if you can get all the necessary context into 50k tokens and measure progress via tests, you can speed up development by 10x.

5. Therefore all new development should be microservices written from scratch and interacting via cleanly defined APIs.

Sure enough, I see HN projects evolving in that direction.
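
A minimal sketch of the test-driven loop from point (2), under the constraint in point (4): one small service, its tests as the only reward signal, and the model call stubbed out. The names here (propose_patch, run_tests, MAX_ITERATIONS) are illustrative, not from the thread; it assumes Python and pytest.

    # Hypothetical harness: the test suite is the agent's reward signal.
    import subprocess
    import sys
    from pathlib import Path

    MAX_ITERATIONS = 5  # illustrative retry budget per task


    def run_tests(test_dir: Path) -> tuple[bool, str]:
        """Run pytest on the service's tests; (passed, output) is the reward signal."""
        result = subprocess.run(
            [sys.executable, "-m", "pytest", "-q", str(test_dir)],
            capture_output=True,
            text=True,
        )
        return result.returncode == 0, result.stdout + result.stderr


    def propose_patch(task: str, context: str, feedback: str) -> str:
        """Stand-in for the model call.

        A real harness would send the task, the <50k-token context for this
        one service, and the previous test output, and get back new source.
        """
        raise NotImplementedError("wire up a model client here")


    def solve(task: str, source_file: Path, test_dir: Path) -> bool:
        context = source_file.read_text()
        feedback = ""
        for _ in range(MAX_ITERATIONS):
            source_file.write_text(propose_patch(task, context, feedback))
            passed, feedback = run_tests(test_dir)
            if passed:
                return True  # passing tests is the only success criterion
        return False

The design point is that everything the agent needs (the service source, the task, and the failing test output) stays well under the 50k-token budget, and "done" is defined entirely by the tests.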

sdesol (No.42729788):
> you can speed up development by 10x.

If you know what you are doing, then yes. If you are a domain expert and can articulate your thoughts clearly in a prompt, you will most likely see a boost—perhaps two to three times—but ten times is unlikely. And if you don't fully understand the problem, you may experience a negative effect.

throwup238 (No.42730079):
I think it also depends on how much yak-shaving is involved in the domain, regardless of expertise, whether that's something simple like remembering the right bash incantation or something more complex like learning enough Terraform and its providers to spin up cloud infrastructure.

Some projects just have a lot of stuff to do around the edges, and LLMs excel at that.