321 points jhunter1016 | 4 comments
mikeryan ◴[] No.41878605[source]
The technical side of AI and LLMs is not something I'm well versed in, so as I sit on the sidelines and watch the current proliferation of AI startups, I'm starting to wonder where the moats are, outside of access to raw computing power. OpenAI seemed to have a massive lead in this space, but that lead seems to be shrinking every day.
replies(10): >>41878784 #>>41878809 #>>41878843 #>>41880703 #>>41881606 #>>41882000 #>>41885618 #>>41886010 #>>41886133 #>>41887349 #
1. InkCanon ◴[] No.41878843[source]
You hit the nail on the head. Companies are scrambling for an edge. Not a real edge, an edge to convince investors to keep giving them money. Perplexity is going all in on convincing VCs it can create a "data flywheel".
replies(1): >>41884196 #
2. disqard ◴[] No.41884196[source]
Perhaps I've missed something, but where will the infinite amounts of training data come from, for future improvements?

If these models are going to be trained on their own outputs (and those of other models), then it's not so much a "flywheel" as it is a Perpetual Motion Machine.

replies(2): >>41885368 #>>41886361 #
3. Tier3r ◴[] No.41885368[source]
Perplexity has a dubious idea based around harvesting user chats -> making the service better -> getting more user prompts. I am quite unconvinced that user prompts and stored chats will materially improve an LLM that is already trained on a trillion high-quality tokens.

The second idea being kicked around is that synthetic data will be a new fountain of youth for training data and will also fix the models' reasoning abilities.

4. LarsDu88 ◴[] No.41886361[source]
There's pretraining data, which is just raw text from the internet, but there's also supervised preference data sourced from users.

Right now the edge is in acquiring the latter, where OpenAI has a slight lead.