579 points paulpauper | 1 comments
InkCanon ◴[] No.43604503[source]
The biggest story in AI was released a few weeks ago but was given little attention: on the recent USAMO, SOTA models scored on average 5% (IIRC, it was some abysmal number). This is despite them supposedly having gotten 50%, 60% etc performance on IMO questions. This massively suggests AI models simply remember the past results, instead of actually solving these questions. I'm incredibly surprised no one mentions this, but it's ridiculous that these companies never tell us what (if any) efforts have been made to remove test data (IMO, ICPC, etc) from train data.
replies(18): >>43604865 #>>43604962 #>>43605147 #>>43605224 #>>43605451 #>>43606419 #>>43607255 #>>43607532 #>>43607825 #>>43608628 #>>43609068 #>>43609232 #>>43610244 #>>43610557 #>>43610890 #>>43612243 #>>43646840 #>>43658014 #
hyperbovine ◴[] No.43612243[source]
Is that really so surprising given what we know about how these models actually work? I feel vindicated on behalf of myself and all the other commenters who have been mercilessly downvoted over the past three years for pointing out the obvious fact that next token prediction != reasoning.
replies(1): >>43612270 #
aoeusnth1 ◴[] No.43612270[source]
2.5 pro scores 25%.

It’s just a much harder math benchmark which will fall by the end of next year just like all the others. You won’t be vindicated.

replies(1): >>43612302 #
hyperbovine ◴[] No.43612302[source]
Bold claim! Let's see what that 25% is. I guarantee it is the portion of the exam which is trivially answerable if you have a stored database of all previous math exams ever written to consult.
replies(1): >>43612821 #
aoeusnth1 ◴[] No.43612821[source]
There is 0% of the exam which is trivially answerable.

The entire point of USAMO problems is that they demand novel insight and rigorous, original proofs. They are intentionally designed not to be variations of things you can just look up. You have to reason your way through, step by logical step.

Getting 25% (~11 points) is exceptionally difficult. That often means fully solving one problem and maybe getting solid partial credit on another. The median score is often in the single digits.

replies(1): >>43614564 #
1. hyperbovine ◴[] No.43614564[source]
> There is 0% of the exam which is trivially answerable.

That's true, but of course, not what I claimed.

The claim is that, given the ability to memorize every mathematical result that has ever been published (in print or online), it is not so difficult to get 25% correct on an exam by pattern matching.

Note that this skill is, by definition, completely out of the reach of any human being, but that possessing it does not imply creativity or the ability to "think".