
125 points by akeck | 2 comments
ta8645 ◴[] No.33580501[source]
Artists are no different from all the people who tried to destroy the cotton gin or the automated loom. We're all going to have to live in a world where these technologies exist, and find a way to live a fulfilling life regardless. Just as chess players today enjoy the game even though computers have surpassed our chess abilities.

It seems odd to complain that computers are using humans' artwork to inspire their own creations. Every human artist has done the exact same thing in their lifetime; it's unavoidable.

replies(10): >>33580588 #>>33580624 #>>33580644 #>>33580673 #>>33580687 #>>33580701 #>>33580722 #>>33580832 #>>33580867 #>>33582176 #
schroeding ◴[] No.33580867[source]
> Just as chess players today enjoy the game even though computers have surpassed our chess abilities.

The "product" that chess players produce is not replaceable by ML systems. The game itself, the "fight" of two minds (or one mind against the machine, in the past) is the "product". Watching two chess AIs play against each other can't replace that.

For artists, the product is their output, the art itself. An approximation of that art can now also be produced by an ML system, making artists an unnecessary cost factor[1] for e.g. simple illustrations.

They are not comparable, IMO. Chess players are not replaced by ML systems, artists will be.

> it's unavoidable.

It really isn't. It would of course be possible to simply outlaw the use of things like The Pile, which includes hundreds of gigabytes of random texts with unknown copyright status. The same goes for any training set that uses images scraped off the web, ignoring any copyright.

Yes, people would still do it, but it would have the same status that piracy has. You can't build a US multi-billion dollar company on piracy (for long), and you wouldn't be able to do so with ML systems that were trained on random stuff from the internet.

I don't think this, in such broad strokes, would be a good thing, to be clear. Such datasets are great for research! But I have a really hard time understanding this defeatism that there is "nothing we can do".

[1] from the perspective of some customers e.g. magazines or ad companies - I don't agree with this

replies(2): >>33580921 #>>33581903 #
jjcon ◴[] No.33580921[source]
There is so much art that is Creative Commons or public domain that I'm sure a worse 'pile' could be conjured up to start things out. Then, just as we have seen with other architectures, as they refine, their need for data can drop, and eventually we are back in the same place - maybe a few more years removed, but back in the same place nonetheless. That is my take, at least.

Personally, I don't think it is likely that copyright laws will change to protect against algorithmic usage (there is too much precedent in more general reuse cases and in what is considered transformative). Having said that, I also don't think this will be the death of artists by any stretch; some industries will need to change or evolve, but it will be just another tool in an artist's belt IMO.

replies(1): >>33582246 #
schroeding ◴[] No.33582246[source]
True, but make even one classification mistake (and people upload stuff they don't own, under the wrong license, all the time) and you have to retrain your whole system for each mistake, as people trickle in and want their stuff - wrongly classified as CC or public domain - removed from your dataset.

It would chill the whole ML space significantly for decades, IMO, as the only truly safe data would be synthetic or licensed. This can work for some applications (e.g. Microsoft used synthetic data for facial landmark recognition[1]), but it would kill DALL-E 2 et al.

[1] https://microsoft.github.io/DenseLandmarks/

replies(1): >>33586104 #
jjcon ◴[] No.33586104[source]
If I use Photoshop to recreate a copyrighted work, Adobe doesn't have to redistribute Photoshop or change it in any way. The originals are not being shipped in the models, but the models are capable of recreating copyrighted work. These are tools, just like Photoshop.
replies(1): >>33589050 #
heavyset_go ◴[] No.33589050[source]
Neural networks can and do encode data from their training sets in the model itself. That's the reason you can make some models reproduce things like the Getty watermark in the images they produce.
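As a toy illustration of that encoding (a made-up network, nothing like the production models being discussed): an over-parameterized net trained by plain gradient descent on a handful of examples ends up reproducing its training targets from the weights alone, with no dataset in sight.

```python
import numpy as np

# Toy sketch: a two-layer net with far more parameters than training
# examples memorizes 4 arbitrary input/target pairs. After training,
# the targets are recoverable from the weights alone - i.e. the
# training data is effectively encoded in the model.
rng = np.random.default_rng(0)

X = rng.normal(size=(4, 3))           # 4 training inputs
Y = rng.normal(size=(4, 2))           # 4 arbitrary targets to memorize

W1 = 0.5 * rng.normal(size=(3, 32))   # 32 hidden units >> 4 examples
W2 = 0.5 * rng.normal(size=(32, 2))

lr = 0.05
for _ in range(2000):
    H = np.tanh(X @ W1)               # forward pass
    P = H @ W2
    dP = (P - Y) / len(X)             # gradient of mean squared error
    dW2 = H.T @ dP                    # backprop: linear readout
    dH = dP @ W2.T
    dW1 = X.T @ (dH * (1 - H ** 2))   # backprop: through tanh
    W1 -= lr * dW1
    W2 -= lr * dW2

# Maximum reconstruction error over all memorized targets
err = np.max(np.abs(np.tanh(X @ W1) @ W2 - Y))
print(err)  # tiny: Y is reproduced purely from the weights
```

Scale that tendency up to billions of parameters and duplicated training images, and you get artifacts like the watermark reappearing.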
replies(1): >>33590023 #
jjcon ◴[] No.33590023[source]
Again, not directly, though, and that is all that matters - I can reproduce the Getty watermark in Photoshop, but that doesn't make Adobe liable. The fact that a tool is capable of copyright infringement does not shift the legal burden anywhere - it is totally beside the point. Technically, Photoshop's 'Content-Aware Fill' could fill in missing regions with copyrighted content purely by chance, but the burden is still on me if I publish that content, not on Adobe. Legally speaking, these are tools just like any other algorithm or machine out there; their sophistication and particular method is not particularly relevant (again, legally speaking).