
56 points trott | 2 comments
throwaway48476 ◴[] No.40714342[source]
We're not even close to running out of human-generated data. It only seems that way because old data is so hard to find. There are tons of whole magazine scans sitting on obscure websites that aren't even indexed. Most of this is the fault of Google, which has been an atrocious steward of search. Why is it that I still can't do a full-text search of the Internet Archive dataset? Perpetual copyright on commercially de minimis works also plays a large role.

There's a monumental amount of quality data out there that's not indexed, not searchable, and abandoned and unused. We just need to value it enough to use it.

replies(1): >>40714649 #
stubish ◴[] No.40714649[source]
So much old, out-of-date, factually incorrect, racist, sexist and even illegal information. I think it is already clear that training systems on everything is not the way forward, and is about as reliable as the set of '80s encyclopedias my mother refuses to throw out. The current tech needs to be trained on good data to produce good results, as it can't reason and gauge reliability, or even pick up when its output is self-contradictory.
replies(3): >>40714750 #>>40716624 #>>40720349 #
1. glimshe ◴[] No.40716624[source]
The geniuses and stewards of our civilization of just a couple of decades ago were trained on this very data. We don't yet know what outcome we'll get by handing the world over to people trained on "new, up to date, factually correct, egalitarian and legal" data.
replies(1): >>40723207 #
2. stubish ◴[] No.40723207[source]
We hope they used their reason to maintain their knowledge over the years, or at least updated their poor fashion choices. Or maybe not, given how much effort is made to enforce moral opinions from biblical times.