451 points imartin2k | 12 comments
einrealist ◴[] No.44478912[source]
The major AI gatekeepers, with their powerful models, are already experiencing capacity and scale issues. This won't change unless the underlying technology (LLMs) undergoes a fundamental shift. As more and more things become AI-enabled, how dependent will we be on these gatekeepers and their computing capacity? And how much will they charge us for prioritised access to these resources? And we haven't really gotten to the wearable devices stage yet.

Also, everyone who requires these sophisticated models now needs to send everything to the gatekeepers. You could argue that we already send a lot of data to public clouds. However, there was no economically viable way for cloud vendors to read, interpret, and reuse my data — my intellectual property and private information. With more and more companies forcing AI capabilities on us, it's often unclear who runs those models, who receives the data, and what really happens to it.

This aggregation of power and centralisation of data worries me as much as the shortcomings of LLMs. The technology is still not accurate enough. But we want it to be accurate because we are lazy. So I fear that we will end up with many things of diminished quality in favour of cheaper operating costs — time will tell.

replies(3): >>44478949 #>>44479025 #>>44479921 #
kgeist ◴[] No.44479025[source]
We've been running our own LLM server at the office for a month now, as an experiment (for privacy/infosec reasons), and a single RTX 5090 is enough to serve 50 people for occasional use. We run Qwen3 32B, which in some benchmarks is on par with GPT-4.1 mini or Gemini 2.5 Flash. The GPU handles two concurrent requests, each with a 32k context, at 60 tok/s. At first I was skeptical that a single GPU would be enough, but it turns out most people don't use LLMs 24/7.
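For anyone curious, here is a minimal sketch of that kind of setup using vLLM's Python API. The checkpoint name, quantization, and limits below are assumptions for illustration, not our exact config:

    # Minimal sketch: serve a quantized Qwen3 32B on a single GPU with vLLM.
    # Assumes a 4-bit AWQ build so the weights fit in 32 GB of VRAM.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="Qwen/Qwen3-32B-AWQ",   # hypothetical quantized checkpoint
        max_model_len=32768,          # 32k context per request
        max_num_seqs=2,               # at most two concurrent sequences
        gpu_memory_utilization=0.90,
    )

    params = SamplingParams(temperature=0.7, max_tokens=512)
    outputs = llm.generate(["Summarize our infosec policy in one line."], params)
    print(outputs[0].outputs[0].text)

In practice you'd run vLLM's OpenAI-compatible server on the same model so clients can share it, but the max_model_len/max_num_seqs knobs above are the kind of settings that bound the 32k context and two concurrent requests described here.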
replies(3): >>44479225 #>>44480111 #>>44480983 #
1. einrealist ◴[] No.44479225[source]
If those smaller models are sufficient for your use cases, go for it. But how much longer will companies release smaller models for free? They have invested so much and have to recoup that money. Much will depend on investor pressure and the financial environment (tax deductions, etc.).

Open Source endeavours will have a hard time mustering the resources to train competitive models. Maybe we will see larger cooperatives, like an Apache Software Foundation for ML?

replies(7): >>44479267 #>>44479356 #>>44479513 #>>44479541 #>>44479835 #>>44479940 #>>44480209 #
2. tankenmate ◴[] No.44479267[source]
"Maybe we will see larger cooperatives, like a Apache Software Foundation for ML?"

I suspect the Linux Foundation might be a more likely candidate, considering its backers and how many resources those backers have already provided to the LF. Whether that's aligned with the LF's goals ...

3. msgodel ◴[] No.44479356[source]
Even Google and Facebook are releasing distills of their models (Gemma 3 is very good, competitive with Qwen3 if not sometimes better).

There are a number of reasons to do this: you want local inference, you want attention from devs and potential users, etc.

Also, the smaller self-hostable models are where most of the improvement happens these days; eventually they'll catch up with where the big ones are today. At this point I honestly wouldn't worry too much about "gatekeepers."

4. ◴[] No.44479513[source]
5. Gigachad ◴[] No.44479541[source]
Seems like you don't have to train from scratch. You can distil a new model off an existing one just by buying API credits to copy the model.
replies(2): >>44479942 #>>44480855 #
6. DebtDeflation ◴[] No.44479835[source]
It's not just about smaller models. I recently bought a MacBook M4 Max with 128GB RAM. You can run surprisingly large models locally with unified memory (albeit somewhat slowly), and now AMD has brought that capability to the x86 world with Strix. But I agree that how long Google, Meta, Alibaba, etc. will continue to release open-weight models is a big question. It's obviously just a catch-up strategy aimed at the moats of OpenAI and Anthropic; once they catch up, the incentive disappears.
7. ben_w ◴[] No.44479940[source]
> Open Source endeavours will have a hard time mustering the resources to train competitive models.

Perhaps, but see also SETI@home and similar @home/BOINC projects.

8. einrealist ◴[] No.44479942[source]
Your "API credits" don't buy the model. You just buy some resource to use the model that is running somewhere else.
replies(2): >>44480744 #>>44480992 #
9. brookst ◴[] No.44480209[source]
Pricing for commodities does not allow for "recouping costs". All it takes is one company seeing models as a complementary good to its core product, worth losing money on, and nobody else can charge more.

I’d support an Apache for ML but I suspect it’s unnecessary. Look at all of the money companies spend developing Linux; it will likely be the same story.

10. Drakim ◴[] No.44480744{3}[source]
You don't understand what Gigachad is talking about. You can buy API credits to gain access to a model in the cloud, and then use that to train your own local model through a process called distilling.
11. hatefulmoron ◴[] No.44480855[source]
"Just" is doing a lot of heavy lifting there. It definitely helps with getting data but actually training your model would be very capital intensive, ignoring the cost of paying for those outputs you're training on.
12. threeducks ◴[] No.44480992{3}[source]
What the parent poster means is that you can use the API to generate many question/answer pairs on which you then train your own model. For a more detailed explanation of this and other related methods, I can recommend this paper: https://arxiv.org/pdf/2402.13116
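To make that concrete, here is a minimal sketch of the data-generation step using the openai Python client. The teacher model name, prompts, and output file are placeholder assumptions:

    # Minimal sketch: collect teacher outputs to distil into a student model.
    # Assumes an OpenAI-compatible endpoint; names and prompts are placeholders.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    questions = [
        "Explain TCP slow start in two sentences.",
        "What does a B-tree index speed up?",
    ]

    with open("distill_pairs.jsonl", "w") as f:
        for q in questions:
            resp = client.chat.completions.create(
                model="gpt-4.1-mini",  # the "teacher" model behind the API
                messages=[{"role": "user", "content": q}],
            )
            answer = resp.choices[0].message.content
            # Each line becomes one supervised example for fine-tuning the student.
            f.write(json.dumps({"prompt": q, "completion": answer}) + "\n")

The capital-intensive part hatefulmoron mentions is what comes after this: fine-tuning the student model on millions of such pairs.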