I have a really hard time thinking that Google, Microsoft, Meta, etc... would _not_ train on whatever people enter (willingly or not) into the system.
The silver lining is that what most people enter in a chat box is _utter crap_.
So, training on that would make the "Artificial Intelligence" system less and less intelligent - unless the devs find a way to automagically sort clever things from stupid things, in which case I want to buy _that_ product.
In the long run, LLM devs are going to have to either:
* refrain from getting high on their own supply, and find a way to tag AI-generated content
* or sort the bs from the truth, probably reinventing "trust in gatekeepers and favoring sources of truth with a track record" and copying social pressure, etc... until we have a "Pulitzer Prize" and "Academy Awards" for the most relevant AI sources, with a higher sticker price, to separate them from cheap slop.
That, or "2+2=7 because DeepChatGrokmini said so, and if you don't agree you're a terrorist, and if our AI math breaks your rocket it's your fault."