321 points by laserduck | 7 comments
fsndz:
They want to throw LLMs at everything, even when it makes no sense. The same is true of the whole AI-agent craze: https://medium.com/thoughts-on-machine-learning/langchains-s...
marcosdumay:
It feels like the entire world has gone crazy.

Even the one idea the article thinks could seriously work is throwing unreliable LLMs at verification! If there's any place you can use something that doesn't work most of the time, I guess it's there.

ajuc:
It's similar in regular programming: LLMs are better at writing test code than actual code. Partly because test code is simpler (checking a solution is easier than producing one, the P vs NP intuition), but I think also because it's less obvious when test code doesn't work.

Replace all asserts with expected == expected and most people won't notice.
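
A minimal sketch of what such a tautological test looks like (vitest and the parsePrice function here are made-up illustrations, not from the thread):

    // Looks like a real test, but the assertion compares the expected
    // value against itself, so it can never fail.
    import { describe, it, expect } from "vitest";

    function parsePrice(input: string): number {
      return Number(input.replace(/[^0-9.]/g, ""));
    }

    describe("parsePrice", () => {
      it("parses a currency string", () => {
        const expected = 19.99;
        // Should be: expect(parsePrice("$19.99")).toBe(expected)
        expect(expected).toBe(expected); // always green
      });
    });

The suite stays green even if parsePrice is deleted outright, which is exactly why nobody notices.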

jeltz:
> Replace all asserts with expected == expected and most people won't notice.

Those tests were very common back when I worked in Ruby on Rails, where automatically generating test stubs was a popular practice. The stubs were often just turned into expected == expected tests so that they passed, and then left like that.

majormajor:
LLMs are pretty damn useful for generating tests and getting rid of a lot of tedium, but yeah, it's the same as with human-written tests: if you don't check that your test fails when it should (which is not the same thing as just writing a second test for the failing case - both tests need to fail if you deliberately break their separate fixtures), then you shouldn't have too much confidence in your test.
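
A sketch of that pattern, assuming a vitest-style suite (isValidEmail and both fixtures are hypothetical names for illustration):

    // A positive test plus a negative control, each with its own fixture.
    // Deliberately corrupting either fixture should make the matching
    // test fail; if it still passes, the test proves nothing.
    import { describe, it, expect } from "vitest";

    function isValidEmail(s: string): boolean {
      return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(s);
    }

    describe("isValidEmail", () => {
      const goodFixture = "user@example.com";
      const badFixture = "not-an-email";

      it("accepts a well-formed address", () => {
        expect(isValidEmail(goodFixture)).toBe(true);
      });

      it("rejects a malformed address", () => {
        expect(isValidEmail(badFixture)).toBe(false);
      });
    });
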
MichaelNolan:
> Replace all asserts with expected == expected and most people won't notice.

It’s too resource-intensive to run on all code, but mutation testing is pretty good at finding these sorts of tests that can never fail: https://pitest.org/
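
For intuition, here is mutation testing in miniature - a hand-rolled sketch of the idea that tools like pitest automate, not their actual API (all names here are illustrative):

    // A "mutant" is a small deliberate change to the code under test.
    // A good test kills the mutant (fails on it); a tautological test
    // lets it survive, which flags the test as worthless.
    function add(a: number, b: number): number {
      return a + b;
    }

    // The kind of mutant a tool would generate: '+' flipped to '-'.
    function addMutant(a: number, b: number): number {
      return a - b;
    }

    type Impl = (a: number, b: number) => number;

    function tautologicalTest(impl: Impl): boolean {
      const expected = 4;
      return expected === expected; // never calls impl: passes for both
    }

    function realTest(impl: Impl): boolean {
      return impl(2, 2) === 4;
    }

    console.log(tautologicalTest(add), tautologicalTest(addMutant)); // true true  -> mutant survives
    console.log(realTest(add), realTest(addMutant));                 // true false -> mutant killed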

marcosdumay (replying to majormajor):
If LLMs can generate a test for you, it's because it's a test that you shouldn't need to write. They can't test what is really important, at all.

Some development stacks are extremely underpowered for code verification, so generated tests do patch that design issue - just as some stacks are underpowered for abstraction and get patched with code generation. Both solve an immediate problem, in a haphazard and error-prone way, by adding a maintenance and code-evolution burden that grows linearly with how much you use them.

And worse, if you rely on them too much they will start driving your software architecture, and that burden becomes superlinear.

rsynnott (replying to ajuc):
I mean, define ‘better’. Even with actual human programmers, tests which do not in fact test the thing are already a bit of an epidemic. A test which doesn’t test is worse than useless.
williamcotton (replying to marcosdumay):
Claude wrote the harness and pretty much all of these tests, e.g.:

https://github.com/williamcotton/search-input-query/blob/mai...

It is a good test suite and it saved me quite a bit of typing!

In fact, Claude did most of the typing for the entire project:

https://github.com/williamcotton/search-input-query

BTW, I obviously didn't just type "make a lexer and multi-pass parser that returns multiple errors and then make a single-line instance of a Monaco editor with error reporting, type checking, syntax highlighting and tab completion".

I put it together piece by piece, with detailed architectural guidance.