(www.nytimes.com)

326 points by jhunter1016 | 1 comment | 18 Oct 24 11:11 UTC

mikeryan ◴[18 Oct 24 12:04 UTC] No.41878605

Technical AI and LLMs are not something I’m well versed in, so as I sit on the sidelines and watch the current proliferation of AI startups, I’m starting to wonder where the moats are outside of access to raw computing power. OpenAI seemed to have a massive lead in this space, but that lead seems to be shrinking every day.

weberer ◴[18 Oct 24 12:28 UTC] No.41878784

>>41878605 #

Obtaining high quality training data is the biggest moat right now.

segasaturn ◴[18 Oct 24 19:32 UTC] No.41882699

>>41878784 #

Where are they going to get that data? Everything on the open web after 2023 is polluted with low-quality AI slop that poisons the datasets. My prediction: aggressive dragnet surveillance of users. As in, Google recording your phone calls on Android, Windows sending screen recordings from Recall to OpenAI, Meta training off WhatsApp messages... It sounds dystopian, but the Line Must Go Up.

lfmunoz4 ◴[19 Oct 24 03:53 UTC] No.41885531

>>41882699 #

I would think most high-quality data is books, news articles, and scientific journals, not the crap people text each other.

These companies will never admit it, but AI is built on the back of piracy archives: the easiest and cheapest way to get massive amounts of quality data.

mcmcmc ◴[19 Oct 24 05:04 UTC] No.41885758

>>41885531 #

That entirely depends on what kind of quality you’re going for. If the goal is to simulate passably human conversation, texts and DMs are probably more desirable.

Microsoft and OpenAI's close partnership shows signs of fraying