Test-driven development with an LLM for fun and profit

(blog.yfzhou.fyi)

219 points crazylogger | 1 comments | 16 Jan 25 15:30 UTC


xianshou ◴[16 Jan 25 17:59 UTC] No.42728570[source]▶

One trend I've noticed, framed as a logical deduction:

1. Coding assistants based on o1 and Sonnet are pretty great at coding with <50k context, but degrade rapidly beyond that.

2. Coding agents do massively better when they have a test-driven reward signal.

3. If a problem can be framed in a way that a coding agent can solve, that speeds up development at least 10x from the base case of human + assistant.

4. From (1)-(3), if you can get all the necessary context into 50k tokens and measure progress via tests, you can speed up development by 10x.

5. Therefore all new development should be microservices written from scratch and interacting via cleanly defined APIs.

Sure enough, I see HN projects evolving in that direction.
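To make points 1, 2 and 4 concrete, here is a minimal sketch of a test-gated loop with a context budget. Everything in it is hypothetical: `propose` stands in for whatever coding agent is used, the 4-characters-per-token estimate is a rough assumption, and `fake_propose` is a toy that exists only so the example runs without a model.

```python
from typing import Callable, Iterable, Optional

# Threshold taken from point 1; a rule of thumb, not a hard limit.
MAX_CONTEXT_TOKENS = 50_000


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return len(text) // 4


def fits_in_context(files: Iterable[str], budget: int = MAX_CONTEXT_TOKENS) -> bool:
    """Check whether all the source text needed for a task fits the budget."""
    return sum(estimate_tokens(f) for f in files) <= budget


def solve_with_tests(
    propose: Callable[[str], str],
    run_tests: Callable[[str], str],
    max_attempts: int = 5,
) -> Optional[str]:
    """Ask `propose` for code until `run_tests` reports no failure.

    `propose` receives the last failure message (empty on the first attempt)
    and returns candidate source. `run_tests` returns an empty string on
    success, otherwise a failure message that is fed back to `propose`.
    """
    feedback = ""
    for _ in range(max_attempts):
        candidate = propose(feedback)
        feedback = run_tests(candidate)
        if not feedback:
            return candidate
    return None


def fake_propose(feedback: str) -> str:
    """Toy stand-in for an agent: wrong first, corrected once it sees feedback."""
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def run_tests(source: str) -> str:
    """Run the candidate in a scratch namespace and report the first failure."""
    namespace: dict = {}
    exec(source, namespace)
    if namespace["add"](2, 3) != 5:
        return "add(2, 3) should be 5"
    return ""


result = solve_with_tests(fake_propose, run_tests)
```

The test output is the only signal the loop feeds back, which is the "test-driven reward signal" idea in miniature; `fits_in_context` is the check you would run before handing a service to the agent at all.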

replies(12): >>42729039 #>>42729413 #>>42729713 #>>42729788 #>>42730016 #>>42730842 #>>42731468 #>>42733881 #>>42735489 #>>42736464 #>>42740025 #>>42747244 #

swatcoder ◴[16 Jan 25 19:51 UTC] No.42730016[source]▶

>>42728570 #

> 3. If a problem can be framed in a way that a coding agent can solve...

This reminds me of the South Park underwear gnomes. You picked a tool and set an expectation, then just kind of hand wave over the hard part in the middle, as though framing problems "in a way coding agents can solve" is itself a well-understood or bounded problem.

Does it sometimes take 50x effort to understand a problem and the agent well enough to get that done? Are there classes of problems where it can't be done? Are either of those concerns something you can recognize before they impact you? At commercial quality, is it an accessible skill for inexperienced people or do you need a mastery of coding, the problem domain, or the coding agent to be able to rely on it? Can teams recruit people who can reliably achieve any of this? How expensive is that talent? etc

replies(3): >>42731292 #>>42731937 #>>42745422 #

emptiestplace ◴[16 Jan 25 21:42 UTC] No.42731292[source]▶

>>42730016 #

We've had failed projects since long before LLMs. I think there is a tendency for people to gloss over this (3.) regardless, but working with an LLM it tends to become obvious much more quickly, without investing tens/hundreds of person-hours. I know it's not perfect, but I find a lot of the things people complain about would've been a problem either way - especially when people think they are going to go from 'hello world' to SaaS-billionaire in an hour.

I think mastery of the problem domain is still important, and until we have effectively infinite context windows (that work perfectly), you will need to understand how and when to refactor to maximize quality and relevance of data in context.

replies(1): >>42731453 #

dingnuts ◴[16 Jan 25 21:59 UTC] No.42731453[source]▶

>>42731292 #

well according to xianshou's profile they work in finance so it makes sense to me that they would gloss over the hard part of programming when describing how AI is going to improve it

replies(1): >>42731836 #

ziddoap ◴[16 Jan 25 22:43 UTC] No.42731836[source]▶

>>42731453 #

Working in one domain does not preclude knowledge of others. I work in cybersec but spent my first working decade in construction estimation for institutional builds. I can talk confidently about firewalls or the hospital you want to build.

No need to make assumptions based on a one-line Hacker News profile.
