Recent AI model progress feels mostly like bullshit

(www.lesswrong.com)

579 points paulpauper | 1 comments | 06 Apr 25 18:01 UTC | HN request time: 0.24s | source

Show context

InkCanon ◴[06 Apr 25 20:03 UTC] No.43604503[source]▶

The biggest story in AI was released a few weeks ago but was given little attention: on the recent USAMO, SOTA models scored on average 5% (IIRC, it was some abysmal number). This is despite them supposedly having gotten 50%, 60% etc performance on IMO questions. This massively suggests AI models simply remember the past results, instead of actually solving these questions. I'm incredibly surprised no one mentions this, but it's ridiculous that these companies never tell us what (if any) efforts have been made to remove test data (IMO, ICPC, etc) from train data.

replies(18): >>43604865 #>>43604962 #>>43605147 #>>43605224 #>>43605451 #>>43606419 #>>43607255 #>>43607532 #>>43607825 #>>43608628 #>>43609068 #>>43609232 #>>43610244 #>>43610557 #>>43610890 #>>43612243 #>>43646840 #>>43658014 #

AstroBen ◴[06 Apr 25 21:45 UTC] No.43605224[source]▶

>>43604503 #

This seems fairly obvious at this point. If they were actually reasoning at all they'd be capable (even if not good) of complex games like chess

Instead they're barely able to eek out wins against a bot that plays completely random moves: https://maxim-saplin.github.io/llm_chess/

replies(4): >>43605990 #>>43606017 #>>43606243 #>>43609237 #

gilleain ◴[07 Apr 25 00:26 UTC] No.43606243[source]▶

>>43605224 #

Just in case it wasn't a typo, and you happen not to know ... that word is probably "eke" - meaning gaining (increasing, enlarging from wiktionary) - rather than "eek" which is what mice do :)

replies(2): >>43606466 #>>43607167 #

1. AstroBen ◴[07 Apr 25 02:58 UTC] No.43607167[source]▶

>>43606243 #

hah you're right on the spelling but wrong on my meaning. That's probably the first time I've typed it. I don't think LLMs are quite at the level of mice reasoning yet!

https://dictionary.cambridge.org/us/dictionary/english/eke-o... to obtain or win something only with difficulty or great effort

↑