Microsoft and OpenAI's close partnership shows signs of fraying

(www.nytimes.com)

326 points jhunter1016 | 4 comments | 18 Oct 24 11:11 UTC | HN request time: 0.276s | source

Show context

mikeryan ◴[18 Oct 24 12:04 UTC] No.41878605[source]▶

While technical AI and LLMs are not something I’m well versed in. So as I sit on the sidelines and see the current proliferation of AI startups I’m starting to wonder where the moats are outside of access to raw computing power. Open AI seemed to have a massive lead in this space but that lead seems to be shrinking every day.

replies(10): >>41878784 #>>41878809 #>>41878843 #>>41880703 #>>41881606 #>>41882000 #>>41885618 #>>41886010 #>>41886133 #>>41887349 #

bboygravity ◴[19 Oct 24 06:52 UTC] No.41886133[source]▶

>>41878605 #

The lead has been taken over by Xai already as far as I know. They seem to have 100k H100's up and running.

OpenAI is not really leading the LLM world anyway ever since Claude sonnet 3.5 came out.

replies(1): >>41887052 #

1. belter ◴[19 Oct 24 10:47 UTC] No.41887052[source]▶

>>41886133 #

> OpenAI is not really leading the LLM world anyway ever since Claude sonnet 3.5 came out.

Something many times repeated. It just takes a few minutes with the different models to find out it's not true.

replies(3): >>41887315 #>>41887320 #>>41888317 #

2. csomar ◴[19 Oct 24 11:55 UTC] No.41887315[source]▶

>>41887052 (TP) #

I'd say it's 50/50 at the moment. Half the times, sonnet gives a definitive and better answer.

3. daghamm ◴[19 Oct 24 11:57 UTC] No.41887320[source]▶

>>41887052 (TP) #

I usually use them side by side, and very often (not always) Claude is better.

4. mnky9800n ◴[19 Oct 24 15:20 UTC] No.41888317[source]▶

>>41887052 (TP) #

I think the main issue with these metrics, which you implicitly highlight, Is that they are not a one size fits all approach. In fact, they are often treated, at least casually, like they are some kind of model fit like an r squared value. Which is maybe a good description narrowly constrained to the task or set of tasks they are being evaluated on for the metric. But the complexity of the user experience combined with the poor sample rate that a person can individually experience leads to conclusions like these. And they are perfectly valid conclusions. If the model doesn’t work for you, why use it? But it also suggests that personal experience cannot be used to decide if the model performs in aggregate well or not. But this doesn’t matter to the individual user or problem space. Because they should of course use whatever works best for them.

↑