Though on the other hand, figuring out which manipulations are effective does teach us something. And since I think most problems boil down to pattern matching, creating a true, easily testable AGI test may be tough.
https://chatgpt.com/share/2fde1db5-00cf-404d-9ae5-192aa5ac90...
GPT-4 created a plan very similar to the article's, i.e. it also suggested using Python to pre-process the data, and it also suggested program synthesis. So I'd say it's already 90% there.
> "Execute the synthesized program on the test inputs."
> "Verify the outputs against the expected results. If the results are incorrect, iteratively refine the hypotheses and rules."
So people saying that it's ad hoc are wrong. LLMs know how to solve these tasks; they are just not very good at coding, and iterative-refinement tooling is in its infancy.
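The loop those two quoted steps describe — synthesize a program, execute it, verify against expected outputs, refine — is essentially generate-and-test search. Here's a toy sketch of that loop on a made-up ARC-style grid task; the candidate transformations and the task itself are hypothetical illustrations, not what GPT-4 or the article actually uses (in a real system an LLM would propose the candidates):

```python
# Toy sketch of the synthesize-execute-verify-refine loop on a grid task.
# Candidate "programs" are hand-written here; an LLM would generate them.

def flip_h(grid):
    # Mirror each row left-to-right.
    return [row[::-1] for row in grid]

def transpose(grid):
    # Swap rows and columns.
    return [list(col) for col in zip(*grid)]

def rotate_90(grid):
    # Rotate the grid 90 degrees clockwise.
    return [list(col) for col in zip(*grid[::-1])]

CANDIDATES = [flip_h, transpose, rotate_90]

def synthesize(train_pairs):
    """Return the first candidate consistent with all training pairs,
    or None — the 'refine' step, where new hypotheses would be requested."""
    for program in CANDIDATES:
        if all(program(inp) == out for inp, out in train_pairs):
            return program
    return None

# Hypothetical task: the hidden rule is "rotate 90 degrees clockwise".
train = [([[1, 2], [0, 0]], [[0, 1], [0, 2]])]
program = synthesize(train)

# Execute the synthesized program on a test input and verify.
test_input = [[1, 2], [3, 4]]
print(program(test_input))  # -> [[3, 1], [4, 2]]
```

If no candidate survives verification, the loop goes back to hypothesis generation — which is exactly the part current tooling handles worst.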