←back to thread

321 points jhunter1016 | 4 comments | | HN request time: 0.204s | source
Show context
mikeryan ◴[] No.41878605[source]
While technical AI and LLMs are not something I’m well versed in. So as I sit on the sidelines and see the current proliferation of AI startups I’m starting to wonder where the moats are outside of access to raw computing power. Open AI seemed to have a massive lead in this space but that lead seems to be shrinking every day.
replies(10): >>41878784 #>>41878809 #>>41878843 #>>41880703 #>>41881606 #>>41882000 #>>41885618 #>>41886010 #>>41886133 #>>41887349 #
bboygravity ◴[] No.41886133[source]
The lead has been taken over by Xai already as far as I know. They seem to have 100k H100's up and running.

OpenAI is not really leading the LLM world anyway ever since Claude sonnet 3.5 came out.

replies(1): >>41887052 #
1. belter ◴[] No.41887052[source]
> OpenAI is not really leading the LLM world anyway ever since Claude sonnet 3.5 came out.

Something many times repeated. It just takes a few minutes with the different models to find out it's not true.

replies(3): >>41887315 #>>41887320 #>>41888317 #
2. csomar ◴[] No.41887315[source]
I'd say it's 50/50 at the moment. Half the times, sonnet gives a definitive and better answer.
3. daghamm ◴[] No.41887320[source]
I usually use them side by side, and very often (not always) Claude is better.
4. mnky9800n ◴[] No.41888317[source]
I think the main issue with these metrics, which you implicitly highlight, Is that they are not a one size fits all approach. In fact, they are often treated, at least casually, like they are some kind of model fit like an r squared value. Which is maybe a good description narrowly constrained to the task or set of tasks they are being evaluated on for the metric. But the complexity of the user experience combined with the poor sample rate that a person can individually experience leads to conclusions like these. And they are perfectly valid conclusions. If the model doesn’t work for you, why use it? But it also suggests that personal experience cannot be used to decide if the model performs in aggregate well or not. But this doesn’t matter to the individual user or problem space. Because they should of course use whatever works best for them.