
566 points PaulHoule | 1 comment
true_blue ◴[] No.44491118[source]
I tried the playground and got a strange response. I asked for a regex pattern, and the model gave itself a little game-plan, then it wrote the pattern and started to write tests for it. But it never stopped writing tests. It kept writing tests of increasing size until, I guess, it hit a context limit and the answer was cancelled. Also, for each test it wrote, it added a comment about whether the test should pass or fail, but after about the 30th test it started getting those wrong too, saying a test should fail when it would actually pass if the pattern were correct. And after about the 120th test, the tests stopped making sense entirely; they were just nonsense characters until the answer got cut off.

The pattern it made was also wrong, but I think the first issue is more interesting.
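
For a concrete sense of the failure mode, here is a hypothetical reconstruction (the actual prompt, pattern, and tests aren't shown in this thread): a plausible regex task with the kind of pass/fail-annotated tests described, where the annotations drift from correct to wrong.

    import re

    # Hypothetical stand-in for the requested pattern: match ISO-8601 dates.
    PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")

    # Early tests: the pass/fail comments match reality.
    assert PATTERN.match("2024-01-31")      # should pass (correct: it passes)
    assert not PATTERN.match("2024/01/31")  # should fail (correct: it fails)

    # Around the ~30th test, per the comment above, the labels detach from
    # the code: a case the pattern accepts gets labeled "should fail".
    assert PATTERN.match("1999-12-01")      # model's label: should fail (wrong)

    # By the ~120th test, the output reportedly degraded into nonsense
    # tokens before being cut off at the context limit.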

replies(5): >>44491301 #>>44493417 #>>44493628 #>>44497569 #>>44503983 #
1. beders ◴[] No.44493417[source]
I think that's a prime example of how token prediction simply isn't good enough for correctness. It never will be. LLMs are not designed to reason about code.