But I also fail catastrophically once a reasoning problem exceeds modest complexity.
replies(4):
I don’t find all these claims that models are somehow worse than humans in such areas convincing. Yes, they’re worse in some respects. But when you’re talking about things related to failures and accuracy, they’re mostly superhuman.
For example, how many humans can write hundreds of lines of code (in seconds, mind you) and regularly have no syntax errors or bugs?
Ez, just use codegen.
Also, the second part (not having bugs) is unlikely to be true for LLM-generated code, whereas traditional codegen will actually generate code with pretty much no bugs.