←back to thread

555 points maheshrijal | 1 comments | | HN request time: 0.244s | source
Show context
_fat_santa ◴[] No.43708027[source]
So at this point OpenAI has 6 reasoning models, 4 flagship chat models, and 7 cost optimized models. So that's 17 models in total and that's not even counting their older models and more specialized ones. Compare this with Anthropic that has 7 models in total and 2 main ones that they promote.

This is just getting to be a bit much, seems like they are trying to cover for the fact that they haven't actually done much. All these models feel like they took the exact same base model, tweaked a few things and released it as an entirely new model rather than updating the existing ones. In fact based on some of the other comments here it sounds like these are just updates to their existing model, but they release them as new models to create more media buzz.

replies(22): >>43708044 #>>43708100 #>>43708150 #>>43708219 #>>43708340 #>>43708462 #>>43708605 #>>43708626 #>>43708645 #>>43708647 #>>43708800 #>>43708970 #>>43709059 #>>43709249 #>>43709317 #>>43709652 #>>43709926 #>>43710038 #>>43710114 #>>43710609 #>>43710652 #>>43713438 #
shmatt ◴[] No.43708462[source]
Im old enough to remember the mystery and hype before o*/o1/strawberry that was supposed to be essentially AGI. We had serious news outlets write about senior people at OpenAI quitting because o1 was SkyNet

Now we're up to o4, AGI is still not even in near site (depending on your definition, I know). And OpenAI is up to about 5000 employees. I'd think even before AGI a new model would be able to cover for at least 4500 of those employees being fired, is that not the case?

replies(8): >>43708694 #>>43708755 #>>43708824 #>>43709411 #>>43709774 #>>43710199 #>>43710213 #>>43710748 #
actsasbuffoon ◴[] No.43709411[source]
Meanwhile even the highest ranked models can’t do simple logic tasks. GothamChess on YouTube did some tests where he played against a bunch of the best models and every single one of them failed spectacularly.

They’d happily lose a queen to take a pawn. They failed to understand how pieces are even allowed to move, hallucinated the existence of new pieces, repeatedly declared checkmate when it wasn’t, etc.

I tried it last night with Gemini 2.5 Pro and it made it 6 turns before it started making illegal moves, and 8 turns before it got so confused about the state of the board before it refused to play with me any longer.

I was in the chess club in 3rd grade. One of the top ranked LLMs in the world is vastly dumber than I was in 3rd grade. But we’re going to pour hundreds of billions into this in the hope that it can end my career? Good luck with that, guys.

replies(4): >>43709556 #>>43710189 #>>43710252 #>>43716131 #
1. og_kalu ◴[] No.43710189[source]
LLMs can play chess fine.

The best model you can play with is decent for a human - https://github.com/adamkarvonen/chess_gpt_eval

SOTA models can't play it because these companies don't really care about it.