
579 points paulpauper | 2 comments
InkCanon No.43604503
The biggest story in AI was released a few weeks ago but was given little attention: on the recent USAMO, SOTA models scored on average 5% (IIRC, it was some abysmal number). This is despite them supposedly having gotten 50%, 60%, etc. on IMO questions. This strongly suggests AI models simply remember past results instead of actually solving these questions. I'm incredibly surprised no one mentions this, but it's ridiculous that these companies never tell us what (if any) efforts have been made to remove test data (IMO, ICPC, etc.) from training data.
AstroBen No.43605224
This seems fairly obvious at this point. If they were actually reasoning at all, they'd be capable of playing (even if not well) complex games like chess.

Instead, they're barely able to eke out wins against a bot that plays completely random moves: https://maxim-saplin.github.io/llm_chess/

1. kylebyte No.43605990
Every day I am more convinced that LLM hype is the equivalent of someone seeing a stage magician levitate a table across the stage and assuming this means hovercars must only be a few years away.
2. Terr_ No.43606479
I believe there's widespread confusion between a fictional character that is described as an AI assistant and the actual algorithm building the play-story from which humans imagine that character. It's an illusion actively promoted by companies seeking investment and hype.

AcmeAssistant is "helpful" and "clever" in the same way that Vampire Count Dracula is "brooding" and "immortal".