Recent AI model progress feels mostly like bullshit

(www.lesswrong.com)

579 points paulpauper | 5 comments | 06 Apr 25 18:01 UTC

InkCanon ◴[06 Apr 25 20:03 UTC] No.43604503[source]▶

>>43603453 (OP) #

The biggest story in AI was released a few weeks ago but was given little attention: on the recent USAMO, SOTA models scored on average 5% (IIRC, it was some abysmal number). This is despite them supposedly having gotten 50%, 60%, etc. performance on IMO questions. This strongly suggests AI models simply remember past results instead of actually solving these questions. I'm incredibly surprised no one mentions this, but it's ridiculous that these companies never tell us what (if any) efforts have been made to remove test data (IMO, ICPC, etc.) from training data.

replies(18): >>43604865 #>>43604962 #>>43605147 #>>43605224 #>>43605451 #>>43606419 #>>43607255 #>>43607532 #>>43607825 #>>43608628 #>>43609068 #>>43609232 #>>43610244 #>>43610557 #>>43610890 #>>43612243 #>>43646840 #>>43658014 #

TrackerFF ◴[07 Apr 25 08:11 UTC] No.43609068[source]▶

What would the average human score be?

I.e. if you randomly sampled N humans to take those tests.

replies(1): >>43609102 #

1. sanxiyn ◴[07 Apr 25 08:17 UTC] No.43609102[source]▶

The average human score on USAMO (let alone IMO) is zero, of course. Source: I won medals at the Korean Mathematical Olympiad.

replies(3): >>43609193 #>>43610920 #>>43612359 #

2. vintermann ◴[07 Apr 25 08:33 UTC] No.43609193[source]▶

>>43609102 (TP) #

Average, hmmm?

3. lordgrenville ◴[07 Apr 25 13:10 UTC] No.43610920[source]▶

>>43609102 (TP) #

I am hesitant to correct a math Olympian, but don't you mean the median?

replies(1): >>43621718 #

4. hyperbovine ◴[07 Apr 25 15:04 UTC] No.43612359[source]▶

>>43609102 (TP) #

This is a disappointing answer from an MO alum. Pick a quantile, any quantile...

5. nhinck3 ◴[08 Apr 25 13:44 UTC] No.43621718[source]▶

Average is fine.