579 points paulpauper | 2 comments
InkCanon No.43604503
The biggest story in AI broke a few weeks ago but got little attention: on the recent USAMO, SOTA models scored around 5% on average (IIRC; it was some abysmal number). This is despite them supposedly scoring 50%, 60%, etc. on IMO questions. That strongly suggests these models simply memorize past results instead of actually solving the problems. I'm incredibly surprised no one mentions this, and it's ridiculous that these companies never tell us what efforts (if any) have been made to remove test data (IMO, ICPC, etc.) from training data.
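(For the unfamiliar: "removing test data from training data" is usually called decontamination, i.e. scanning the training corpus for copies of benchmark problems before training. Below is a minimal sketch of one common approach, word-level n-gram overlap; the function names, threshold, and example strings are hypothetical illustrations, not any lab's actual pipeline.)

    # Hypothetical decontamination sketch: flag a training document if it
    # shares any 8-word n-gram with a benchmark problem statement.
    def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
        # Word-level n-grams, case-insensitive.
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def is_contaminated(doc: str, problems: list[str], n: int = 8) -> bool:
        # True if the document overlaps any benchmark problem.
        doc_grams = ngrams(doc, n)
        return any(doc_grams & ngrams(p, n) for p in problems)

    # Made-up example: a scraped page that quotes a problem verbatim gets
    # dropped; an unrelated page survives.
    problem = ("Let ABC be a triangle with incenter I and circumcenter O "
               "such that angle A equals sixty degrees")
    corpus = ["blog post: " + problem + " here is a solution sketch",
              "an unrelated page about cooking"]
    clean = [d for d in corpus if not is_contaminated(d, [problem])]
    # clean now contains only the unrelated page

Even a check like this only catches near-verbatim copies; paraphrases and translations slip through, which is part of why independent re-testing on fresh problems (like the new USAMO) matters.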
replies(18): >>43604865 #>>43604962 #>>43605147 #>>43605224 #>>43605451 #>>43606419 #>>43607255 #>>43607532 #>>43607825 #>>43608628 #>>43609068 #>>43609232 #>>43610244 #>>43610557 #>>43610890 #>>43612243 #>>43646840 #>>43658014 #
1. geuis No.43607825
Query: Could you explain the terminology to people who don't follow this that closely?
replies(1): >>43607932 #
2. BlanketLogic No.43607932
Not the OP but

USAMO: USA Mathematical Olympiad

IMO: International Mathematical Olympiad

SOTA: state of the art

OP is probably referring to this paper: https://arxiv.org/pdf/2503.21934v1. It explains how rigorous testing revealed abysmal performance from LLMs on USAMO problems (results at odds with how these models are hyped).