
321 points jhunter1016 | 1 comments
mikeryan ◴[] No.41878605[source]
Technical AI and LLMs are not something I’m well versed in, so as I sit on the sidelines and watch the current proliferation of AI startups, I’m starting to wonder where the moats are, outside of access to raw computing power. OpenAI seemed to have a massive lead in this space, but that lead seems to be shrinking every day.
replies(10): >>41878784 #>>41878809 #>>41878843 #>>41880703 #>>41881606 #>>41882000 #>>41885618 #>>41886010 #>>41886133 #>>41887349 #
weberer ◴[] No.41878784[source]
Obtaining high-quality training data is the biggest moat right now.
replies(2): >>41882699 #>>41883992 #
segasaturn ◴[] No.41882699[source]
Where are they going to get that data? Everything on the open web after 2023 is polluted with low-quality AI slop that poisons the data sets. My prediction: aggressive dragnet surveillance of users. As in, Google recording your phone calls on Android, Windows sending screen recordings from Recall to OpenAI, Meta training off WhatsApp messages... It sounds dystopian, but the Line Must Go Up.
replies(3): >>41883095 #>>41883850 #>>41885531 #
lfmunoz4 ◴[] No.41885531[source]
I would think most quality data is books, news articles, and scientific journals, not the crap people are texting each other.

These companies will never admit it, but AI is built on the back of piracy archives: the easiest and cheapest way to get massive amounts of quality data.

replies(2): >>41885758 #>>41887361 #
1. mcmcmc ◴[] No.41885758[source]
That entirely depends on what quality you’re going for. If the goal is to simulate passably human conversation, texts and DMs are probably more desirable.