
579 points | paulpauper | 1 comment
InkCanon ◴[] No.43604503[source]
The biggest story in AI was released a few weeks ago but got little attention: on the recent USAMO, SOTA models scored about 5% on average (IIRC; it was some abysmal number). This is despite them supposedly having achieved 50%, 60%, etc. on IMO questions. This strongly suggests that AI models simply remember past results instead of actually solving these questions. I'm incredibly surprised no one mentions this, and it's ridiculous that these companies never tell us what efforts (if any) have been made to remove test data (IMO, ICPC, etc.) from the training data.
billforsternz ◴[] No.43607255[source]
I asked Google "how many golf balls can fit in a Boeing 737 cabin" last week. The "AI" answer helpfully broke the solution into 4 stages:
1) A Boeing 737 cabin is about 3000 cubic metres [wrong, about 4x2x40 ~ 300 cubic metres]
2) A golf ball is about 0.000004 cubic metres [wrong, it's about 40cc = 0.00004 cubic metres]
3) 3000 / 0.000004 = 750,000 [wrong, it's 750,000,000]
4) We have to make an adjustment because seats etc. take up room, and we can't pack perfectly. So perhaps 1,500,000 to 2,000,000 golf balls final answer [wrong, you should have been reducing the number!]

So 1), 2) and 3) were out by 1, 1 and 3 orders of magnitude respectively (the errors partially cancelled out), and 4) was nonsensical.
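For anyone who wants to check the arithmetic, here is a rough Python sketch of the corrected estimate. The cabin dimensions (4 x 2 x 40 m) and the 40cc golf ball are the figures from the comment above; the packing fraction (about 0.64, the usual figure for randomly packed spheres) and the share of the cabin not taken up by seats etc. (0.75) are assumptions added purely for illustration, as are the function names.

```python
import math

def orders_off(claimed: float, actual: float) -> int:
    """Whole orders of magnitude between a claimed and an actual value."""
    return round(abs(math.log10(claimed / actual)))

def golf_ball_estimate(packing_fraction: float = 0.64,
                       usable_fraction: float = 0.75) -> float:
    """Fermi estimate of golf balls in a 737 cabin.

    packing_fraction: assumed random close packing of spheres (~0.64).
    usable_fraction: hypothetical share of the cabin not taken by seats etc.
    """
    cabin_volume_m3 = 4 * 2 * 40      # rough cabin box from the comment
    ball_volume_m3 = 40e-6            # about 40 cc per golf ball
    naive = cabin_volume_m3 / ball_volume_m3   # roughly 8 million
    # Both adjustments must REDUCE the naive count, never increase it.
    return naive * packing_fraction * usable_fraction

# How far off each step of the "AI" answer was, in orders of magnitude
step1 = orders_off(3000, 300)                 # cabin volume
step2 = orders_off(0.00004, 0.000004)         # golf ball volume
step3 = orders_off(750_000_000, 750_000)      # the division itself

print(step1, step2, step3)
print(f"{golf_ball_estimate():,.0f} golf balls (very roughly)")
```

With these assumptions the answer lands in the low millions, so the final range the "AI" gave was the right order of magnitude only because its errors partly cancelled.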

This little experiment made me skeptical about the state of the art of AI. I have seen much AI output which is extraordinary; it's funny how one serious fail can impact my point of view so dramatically.

aezart ◴[] No.43608930[source]
> I have seen much AI output which is extraordinary; it's funny how one serious fail can impact my point of view so dramatically.

I feel the same way. It's like discovering for the first time that magicians aren't doing "real" magic, just sleight of hand and psychological tricks. From that point on, it's impossible to be convinced that a future trick is real magic, no matter how impressive it seems. You know it's fake even if you don't know how it works.

bambax ◴[] No.43609890[source]
I think there is a big divide here. Every adult on earth knows magic is "fake", but some can still be amazed and entertained by it, while others find it utterly boring because it's fake, and the only possible (mildly) interesting thing about it is to try to figure out what the trick is.

I'm in the second camp but find it kind of sad and often envy the people who can stay entertained even though they know better.

1. nucleogenesis ◴[] No.43611595[source]
Idk, I don’t think of it as fake - it’s creative fiction paired with sometimes highly skilled performance. I’ve learned a lot about how magic tricks work and I still love seeing performers do effects, because it takes so much talent to, say, hold and hide 10 coins in your hands while showing them as empty, or to shuffle a deck of cards 5x and have the audience cut it, only to pull 4 aces off the top.