
451 points by imartin2k | 1 comment
einrealist ◴[] No.44478912[source]
The major AI gatekeepers, with their powerful models, are already experiencing capacity and scale issues. This won't change unless the underlying technology (LLMs) undergoes a fundamental shift. As more and more things become AI-enabled, how dependent will we be on these gatekeepers and their computing capacity? And how much will they charge us for prioritised access to these resources? And we haven't really gotten to the wearable devices stage yet.

Also, everyone who needs these sophisticated models now has to send everything to the gatekeepers. You could argue that we already send a lot of data to public clouds. However, there was previously no economically viable way for cloud vendors to read, interpret, and reuse my data (my intellectual property and private information). With more and more companies forcing AI capabilities on us, it's often unclear who runs those models, who receives the data, and what really happens to it.

This aggregation of power and centralisation of data worries me as much as the shortcomings of LLMs. The technology is still not accurate enough, but we want to believe it is, because we are lazy. So I fear we will end up with many things of diminished quality in favour of cheaper operating costs. Time will tell.

replies(3): >>44478949 #>>44479025 #>>44479921 #
kgeist ◴[] No.44479025[source]
We've been running our own LLM server at the office for a month now as an experiment (for privacy/infosec reasons), and a single RTX 5090 is enough to serve 50 people for occasional use. We run Qwen3 32B, which in some benchmarks is on par with GPT-4.1 mini or Gemini 2.5 Flash. The GPU handles 2 concurrent requests with 32k context each at 60 tok/s. At first I was skeptical a single GPU would be enough, but it turns out most people don't use LLMs 24/7.
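For reference, the comment doesn't say which serving stack they use; here is a minimal sketch of a comparable setup assuming vLLM and the AWQ-quantized Qwen3 checkpoint (both are assumptions, not details from the comment):

    # Hypothetical launch command, assuming vLLM as the server.
    # A 4-bit AWQ quant of Qwen3-32B fits in roughly 20 GB, leaving the
    # rest of a 32 GB RTX 5090 for two 32k-token KV caches.
    vllm serve Qwen/Qwen3-32B-AWQ \
        --max-model-len 32768 \
        --max-num-seqs 2 \
        --gpu-memory-utilization 0.92

vLLM then exposes an OpenAI-compatible API on port 8000 by default, so office clients can point any standard SDK at it.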
replies(3): >>44479225 #>>44480111 #>>44480983 #
einrealist ◴[] No.44479225[source]
If those smaller models are sufficient for your use cases, go for it. But for how much longer will companies release smaller models for free? They have invested so much, and they have to recoup that money. Much will depend on investor pressure and the financial environment (tax deductions, etc.).

Open Source endeavours will have a hard time mustering the resources to train competitive models. Maybe we will see larger cooperatives, like an Apache Software Foundation for ML?

replies(7): >>44479267 #>>44479356 #>>44479513 #>>44479541 #>>44479835 #>>44479940 #>>44480209 #
msgodel ◴[] No.44479356[source]
Even Google and Facebook are releasing distills of their models (Gemma 3 is very good, competitive with Qwen3 if not sometimes better).

There are a number of reasons to do this: you want local inference, you want attention from devs and potential users, etc.

Also, the smaller self-hostable models are where most of the improvement happens these days. Eventually they'll catch up with where the big ones are today. At this point I honestly wouldn't worry too much about "gatekeepers." An example of how low the barrier already is follows below.
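Running one of these distills locally is a one-liner today; a sketch assuming Ollama as the runner (any local inference server works just as well):

    # Download and chat with Google's Gemma 3 27B distill locally.
    # No account, API key, or gatekeeper involved.
    ollama run gemma3:27b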