(redwoodresearch.substack.com)

394 points tomduncalf | 1 comments | 17 Jun 24 21:51 UTC

asperous ◴[17 Jun 24 23:19 UTC] No.40712326[source]▶

Having tons of people employ human ingenuity to manipulate existing LLMs into passing this one benchmark kind of defeats the purpose of testing for "AGI". The author points this out as it's more of a pattern matching test.

Though on the other hand, figuring out which manipulations are effective does teach us something. And I think most problems boil down to pattern matching, so creating a true, easily testable AGI test may be tough.

replies(5): >>40712503 #>>40712555 #>>40712632 #>>40713120 #>>40713156 #

1. sheeshkebab ◴[17 Jun 24 23:59 UTC] No.40712632[source]▶

>>40712326 #

Show me a test and I’ll show you a neural network that passes it… used to be a saying.

Getting 50% (SoTA) on Arc-AGI with GPT-4o