Strictly speaking, I'm never going to think of model distillation as "stealing." Calling it that goes against the spirit of scientific research, and besides, every tech company has forever lost my permission to define what I think of as theft.
replies(3):
I think if OpenAI (or any other company) is paid for its compute time/access the same way anybody else would pay, then using content generated by other models is fair game, because it's an active/ongoing cost and not a passive one.
Whereas if someone trained on my dumb Tweets or HN posts, then so be it; it's a passive cost for me. I paid my time to say x thing for my own benefit (tribal monk-e social interaction), so I have already gotten the value out of it.