
555 points maheshrijal | 2 comments | source
_fat_santa ◴[] No.43708027[source]
So at this point OpenAI has 6 reasoning models, 4 flagship chat models, and 7 cost-optimized models. That's 17 models in total, and that's not even counting their older and more specialized ones. Compare this with Anthropic, which has 7 models in total and 2 main ones that it promotes.

This is getting to be a bit much; it seems like they are trying to cover for the fact that they haven't actually done much. All these models feel like they took the same base model, tweaked a few things, and released the result as an entirely new model rather than updating the existing ones. In fact, based on some of the other comments here, it sounds like these really are just updates to their existing models, released as new models to create more media buzz.

replies(22): >>43708044 #>>43708100 #>>43708150 #>>43708219 #>>43708340 #>>43708462 #>>43708605 #>>43708626 #>>43708645 #>>43708647 #>>43708800 #>>43708970 #>>43709059 #>>43709249 #>>43709317 #>>43709652 #>>43709926 #>>43710038 #>>43710114 #>>43710609 #>>43710652 #>>43713438 #
1. nopinsight ◴[] No.43709926[source]
Research by METR suggests that frontier LLMs can complete software tasks of exponentially increasing length (measured by the time they would take human engineers), with the time horizon doubling roughly every 7 months. o3 is above the trend line.

https://x.com/METR_Evals/status/1912594122176958939
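A minimal numeric sketch of that doubling claim (the 60-minute starting horizon and the function name are hypothetical illustrations, not METR's actual figures):

```python
# Hypothetical sketch of METR's claim: the human-time length of tasks
# frontier models can complete doubles roughly every 7 months.
# The 60-minute baseline below is an assumption for illustration only.

def horizon_minutes(months_elapsed, start_minutes=60.0, doubling_months=7.0):
    """Task length (in human-engineer minutes) after `months_elapsed` months."""
    return start_minutes * 2 ** (months_elapsed / doubling_months)

print(horizon_minutes(0))    # 60.0
print(horizon_minutes(7))    # 120.0  (one doubling)
print(horizon_minutes(14))   # 240.0  (two doublings)
```

Under this trend, the gap between successive model generations compounds rather than adding a fixed increment.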

---

The AlexNet paper, which kickstarted the deep learning era in 2012, was ahead of the 2nd-best ImageNet entry by 11 percentage points. Many published AI papers at the time advanced SOTA by just a couple of percentage points.

o3-high is about 9 percentage points ahead of o1-high on livebench.ai, and there are also quite a few testimonials about their differences.

Yes, AlexNet made major strides in other respects as well, but it's been just 7 months since o1-preview, the first publicly available reasoning model and a seminal advance beyond previous LLMs.

It seems some people have become desensitized to how rapidly things are moving in AI, despite its largely unprecedented pace of progress.

Ref:

- https://proceedings.neurips.cc/paper_files/paper/2012/file/c...

- https://livebench.ai/#/

replies(1): >>43710260 #
2. kadushka ◴[] No.43710260[source]
AlexNet improved the ImageNet error rate by 100*11/25 = 44% in relative terms.

From o1 to o3, the error rate went from 28% to 19%, so 100*9/28 ≈ 32%.

But these are meaningless comparisons because it’s typically harder to improve already good results.
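The arithmetic above can be checked directly. This sketch computes relative error-rate reduction, i.e. what fraction of the old error the new result eliminates, using the figures quoted in this thread (the function name is my own):

```python
# Relative error-rate reduction: the share of the previous error rate
# that the newer result eliminates. Figures are those quoted above.

def relative_reduction(old_error, new_error):
    """Percent of the old error rate eliminated by the new result."""
    return 100 * (old_error - new_error) / old_error

# AlexNet: an 11-point gain on a 25% error rate.
print(round(relative_reduction(25, 14)))  # 44

# o1 -> o3: a 9-point gain on a 28% error rate.
print(round(relative_reduction(28, 19)))  # 32
```

This also illustrates the closing point: the same absolute gain in percentage points yields a smaller relative reduction when the starting error rate is lower, and shrinking an already-small error is typically harder.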