747 points by porridgeraisin | 2 comments
JCM9:
Not a surprise. All the major players have hit the limits of training on existing data: they're already training on essentially the whole internet, plus a bunch of content they allegedly stole (hence the various lawsuits). There haven't been any major breakthroughs in model architecture recently, so they're now in a battle for more data to train on. They need data, they want YOUR data, now, and they're gonna do increasingly shady things to get it.

cube00:
It's nice to see the newer models are suffering after being exposed to training on their own slop.

If they had done this in a more measured way, such as striking licensing deals with publishers, they might have been able to separate human content from AI content.

However, they couldn't wait to just take it all and be first, and now the well is poisoned for everyone.

theshackleford:
> It's nice to see the newer models are suffering after being exposed to training on their own slop.

I've seen zero evidence that anything of the sort is occurring, nor that, if it is, it's happening for the reason you claim. I'd be highly interested in research suggesting either is the case, however.

cube00 ◴[] No.45067192[source]
"AI models collapse when trained on recursively generated data"

https://news.ycombinator.com/item?id=41058194
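
To see the mechanism in miniature, here's a toy sketch (Gaussians standing in for language models; the sample size and generation count are arbitrary): each generation is fit only on samples produced by the previous one, and the fitted spread steadily collapses.

    import numpy as np

    # Toy version of the paper's setup: generation 0 is the "human" data
    # distribution; every later generation is fit ONLY on samples drawn
    # from the previous generation's fitted model.
    rng = np.random.default_rng(42)
    mu, sigma = 0.0, 1.0  # generation 0: N(0, 1)

    for gen in range(1, 51):
        samples = rng.normal(mu, sigma, size=20)   # previous gen's output
        mu, sigma = samples.mean(), samples.std()  # fit the next generation
        if gen % 10 == 0:
            print(f"gen {gen:2d}: mean={mu:+.3f}, std={sigma:.3f}")

    # The fitted std shrinks toward zero across generations: the tails of
    # the original distribution vanish first, which is the paper's
    # "model collapse" result in miniature.

Whether production models are already hitting this in the wild is a separate question, but that's the failure mode the paper demonstrates.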

theshackleford:
That's not what I asked for, as it's not relevant.

The claim was that the models are "suffering", right now, at this exact moment, because they have been recursively feeding on their own output.

I want evidence that the current models are "suffering" right now, and further evidence suggesting that this suffering is due to recursive data ingestion.

A year-old article with no relevance to today, speculating about the hypothetical effects of indiscriminately gorging on recursive data, is not evidence of either of the things I asked for.

cube00:
Did you mean the current models that are still stuck in 2023?

> what's the latest year of data you're trained on

> ChatGPT said: My training goes up to April 2023.

There's a reason they're not willing to update the training corpus even with GPT-5.
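
You can reproduce the exchange above through the API too. Rough sketch, assuming the official openai Python SDK and an OPENAI_API_KEY in the environment; the model name is illustrative, and keep in mind a model's self-reported cutoff isn't authoritative (OpenAI's per-model documentation is the real reference):

    # Minimal sketch: ask a model for its self-reported training cutoff.
    # The model name here is illustrative, not a claim about any
    # particular release.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "What's the latest year of data you're trained on?",
        }],
    )
    print(resp.choices[0].message.content)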

> A year-old article with no relevance to today

The current models were trained on data that's even older, so I guess you should disregard them too if you're choosing to judge things purely by age.