Given that current LLMs do not consistently output total garbage and can already be used fairly efficiently as judges, I highly doubt this will have any real impact on the capabilities of future models, even in theory. Once (a) models are capable enough to distinguish semi-plausible garbage from possibly relevant text and (b) companies are aware of the problem, I do not think data poisoning will be an issue at all.
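For what it's worth, the judge-based filtering I have in mind is roughly the sketch below. It is only illustrative: `judge` is a stand-in for whatever model or API a lab would actually call, and the prompt wording and YES/NO convention are made up for the example.

```python
# Minimal sketch of "LLM-as-judge" filtering of a training corpus.
# Assumption: `judge` is any callable that takes a prompt string and
# returns the model's text response (hypothetical, not a real API).

def looks_poisoned_or_garbage(document: str, judge) -> bool:
    """Ask a judge model whether a candidate training document reads as
    incoherent, spammy, or deliberately poisoned rather than relevant text."""
    prompt = (
        "You are reviewing a document for inclusion in a training corpus.\n"
        "Answer YES if it is incoherent, spammy, or looks deliberately "
        "poisoned; answer NO otherwise.\n\n"
        f"Document:\n{document[:4000]}\n\n"  # truncate long docs to keep the call cheap
        "Answer YES or NO:"
    )
    verdict = judge(prompt).strip().upper()
    return verdict.startswith("YES")


def filter_corpus(documents, judge):
    """Keep only the documents the judge does not flag."""
    return [d for d in documents if not looks_poisoned_or_garbage(d, judge)]
```

In practice you would batch the calls and only run the judge on documents that cheaper heuristics already flagged, but the basic idea is that simple.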