
125 points akeck | 1 comments
fimdomeio No.33581875
I have a problem with this concept of "unethically sourced data". What does it really mean?

Isn't Google's search business built on "unethically sourced data"? They keep a pirated copy of every website they encounter and feed it to their algorithms.

Isn't human culture, by definition, built on "unethically sourced data" remixed by the human brain? For example, imagine you are starting a punk band. You will draw on all your background knowledge of what punk is, on the bands you like, and, if you're creative, maybe on an unexpected source of inspiration from outside the world of punk. How is that essentially different from how Stable Diffusion works?

Another good example is the song "Who Let the Dogs Out". There's an article / podcast at https://99percentinvisible.org/episode/whomst-among-us-let-t... where they try to trace its origin. At some point even the creators don't really know where their inspiration came from, but some of the sources are geographically close, which seems to point to a common origin. Some of the variations are quite different; others are pretty close.

Overall, I think this is just computers replacing some human capabilities, like machines in factories. You lose most of the poetics that come from a human doing something by hand, and you gain speed: doing x per second instead of y per month. If you need the symbolism and the poetics of art, you'll keep using a human. If you need to generate a thousand variations of an idea, you'll use Stable Diffusion.

OctopusLupid No.33597206
> Isn't google search business built on "unethical sourced data", they keep a pirated copy of every website they encounter and feed it to their algorithms.

My current opinion is yes. See Fedsearch and the recent controversy around it. Some people don't like their data being scraped or studied without their consent, even if it is technically publicly accessible.

I enable noindex by default on my Mastodon instance.

I'm currently doing a personal experiment where I don't use Google or any other crawler-based search engine. I rely heavily on links I get from other people, bookmarks, portals ("a webpage full of cool links"), and my browsing history.