It's very useful when you get the answer in several minutes rather than half an hour.
Maybe it's because they generate the code in one pass and can't go back to fix issues. LLM makers, you should let LLMs review and edit the code they've generated.
For example, imagine you're testing a vector-like collection. In every test case, a dumb LLM builds the vector manually and does the inserts/deletes inline. That can be replaced with a helper function that accepts a sequence of operations and returns the processed vector. Furthermore, once you have that function, you can merge multiple tests with parametrization, by having a test function accept a sequence of operations and an expected result:
import pytest

# apply_actions is the helper described above (the name is illustrative).
@pytest.mark.parametrize('actions, result', (
    # Test that Remove removes items from the vector
    ([Ins(1, 2, 3, 4), Remove(4)], [1, 2, 3]),
    ...
))
def test_vector(actions, result):
    assert apply_actions(actions) == result
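For concreteness, here's a minimal sketch of what that helper could look like, using a plain Python list as a stand-in for the vector-like collection under test. The Ins/Remove operation classes and the apply_actions name are my own illustration of the idea, not something the agent is guaranteed to produce:

class Ins:
    """Operation: insert the given items at the end of the vector."""
    def __init__(self, *items):
        self.items = items

class Remove:
    """Operation: remove the first occurrence of the given item."""
    def __init__(self, item):
        self.item = item

def apply_actions(actions):
    """Build a vector by replaying a sequence of operations on it."""
    vec = []  # stand-in for the collection under test
    for action in actions:
        if isinstance(action, Ins):
            vec.extend(action.items)
        elif isinstance(action, Remove):
            vec.remove(action.item)
    return vec

With this in place, [Ins(1, 2, 3, 4), Remove(4)] builds [1, 2, 3, 4] and then removes 4, matching the expected [1, 2, 3] above.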
But it takes time to write this explanation, and a dumb LLM might not merge all the tests on the first try. Alternatively, you can just say "Don't create the vector manually inline in every test case, make a helper function for that." and see what the agent does. It might do something smart. It might do something a bit dumb, but by understanding why exactly it's dumb, you can communicate the needed correction pretty smoothly.