Test-driven development with an LLM for fun and profit

(blog.yfzhou.fyi)

219 points crazylogger | 1 comments | 16 Jan 25 15:30 UTC

agentultra ◴[16 Jan 25 19:16 UTC] No.42729609[source]▶

This is not a good idea.

If you want better tests with more cases exercising your code: write property based tests.
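For readers who have not seen one, here is a minimal sketch of a property-based test. It is hand-rolled with only the standard library so it runs anywhere; in practice a library such as Hypothesis (Python) or QuickCheck (Haskell) would generate and shrink the inputs. The run-length codec and the names `rle_encode`, `rle_decode`, and `check_round_trip` are hypothetical, chosen purely for illustration.

```python
import random


def rle_encode(s: str) -> list[tuple[str, int]]:
    """Run-length encode a string into (char, count) pairs."""
    out: list[tuple[str, int]] = []
    for ch in s:
        if out and out[-1][0] == ch:
            # Same character as the current run: extend it.
            out[-1] = (ch, out[-1][1] + 1)
        else:
            # New character: start a new run.
            out.append((ch, 1))
    return out


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Invert rle_encode."""
    return "".join(ch * n for ch, n in pairs)


def check_round_trip(trials: int = 500, seed: int = 0) -> int:
    """Check two properties on many random inputs instead of a few examples."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        # A tiny alphabet makes repeated characters (the interesting case) likely.
        s = "".join(rng.choice("ab") for _ in range(rng.randint(0, 20)))
        encoded = rle_encode(s)
        # Property 1: decoding an encoding returns the original string.
        assert rle_decode(encoded) == s
        # Property 2: adjacent runs never share a character.
        assert all(a[0] != b[0] for a, b in zip(encoded, encoded[1:]))
    return trials


if __name__ == "__main__":
    print(f"{check_round_trip()} random cases passed")
```

The two properties state what the code must do for every input, which is closer to a specification than a list of input/output pairs.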

Tests form an executable, informal specification of what your software is supposed to do. They should absolutely be written by hand, by a human, for other humans to use and understand. Natural language is not precise enough for even informal specifications of software modules, let alone software systems.

If using LLMs to help you write the code is your jam, I can't stop you, but at least write the tests. They're more important.

As an aside, I understand how this antipathy towards TDD develops. People write unit tests after writing the implementation because they see them as boilerplate that mirrors what the code under test already does. They're missing the point of what makes a good test useful and sufficient. I would not expect generating more tests of this nature to improve software much.

Edit: added some wording for clarity.

replies(2): >>42730693 #>>42736189 #

ozten ◴[16 Jan 25 20:47 UTC] No.42730693[source]▶

>>42729609 #

I got massive productivity gains from having an LLM fill out my test suite.

It is like autocomplete and macros... "Based on these two unit tests, fill out the suite considering b, c, and d. Add any critical corner case tests I have missed or suggest them if they don't fit well."

It is on the human to look at the generated tests to ensure they a) are comprehensive, b) are useful, and c) communicate clearly.

replies(2): >>42731269 #>>42754141 #

1. agentultra ◴[19 Jan 25 06:00 UTC] No.42754141[source]▶

>>42730693 #

See, I’m arguing for writing fewer, better tests.

I realize that it’s the norm to rely heavily on unit tests. Hundreds or thousands of examples of inputs and outputs. We still find errors in programs. “Examples prove the presence of an error, not the absence of errors,” as Dijkstra (or was it Hoare? I can’t remember) would say. So I understand how one could view having an LLM generate tests as a win for productivity in that case.
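To make the point about examples concrete, here is a small sketch under stated assumptions: `largest` is a hypothetical function with a deliberately planted bug, and the helper names are invented for illustration. The hand-picked examples all pass, while a randomized check of the property "the result is an element of the list and no element exceeds it" turns up a failing input.

```python
import random


def largest(xs: list[int]) -> int:
    """Return the largest element of a non-empty list (deliberately buggy)."""
    best = 0  # planted bug: should start from xs[0], not 0
    for x in xs:
        if x > best:
            best = x
    return best


def examples_pass() -> bool:
    """Typical hand-picked examples; none of them exposes the bug."""
    return (
        largest([1, 2, 3]) == 3
        and largest([5]) == 5
        and largest([2, 9, 4]) == 9
    )


def find_counterexample(trials: int = 1000, seed: int = 0):
    """Search random inputs for one that violates the property."""
    rng = random.Random(seed)  # fixed seed keeps the search reproducible
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 5))]
        result = largest(xs)
        # Property: the result is in the list and nothing in the list is larger.
        if result not in xs or any(x > result for x in xs):
            return xs
    return None


if __name__ == "__main__":
    print("examples pass:", examples_pass())
    print("counterexample:", find_counterexample())
```

Any list containing only negative numbers breaks the property, which is exactly the kind of case that adding more examples in the same style tends to miss.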

But such test suites don’t add much. And generating 20 more tests won’t tell me much more about the code. It will actually make the test suite harder to read and understand.
