747 points by porridgeraisin | 1 comment
1. phtrivier No.45063388
Is there a summary of where the main LLM providers stand on training with user data?

I have a really hard time believing that Google, Microsoft, Meta, etc. would _not_ train on whatever people enter into the system (willingly or not).

The silver lining is that what most people enter in a chat box is _utter crap_.

So training on that would make the "Artificial Intelligence" system less and less intelligent, unless the devs find a way to automagically sort the clever things from the stupid ones, in which case I want to buy _that_ product.
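To make "sorting clever from stupid" concrete: in practice this is usually a quality classifier run over the corpus before pretraining (GPT-3 reportedly used a WebText-style filter of this shape, just with a far larger seed set). Here's a toy sketch; the seed labels and the keep threshold are entirely made up:

    # Toy quality filter: score each chat-log document and keep only
    # the high-scoring ones. Seed labels and threshold are invented.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    seed_texts = [
        "Here is a worked derivation of the gradient update...",  # keep
        "asdf lol do my homework for me",                         # drop
        "Step-by-step proof that the series converges",           # keep
        "2+2=7 trust me bro",                                     # drop
    ]
    seed_labels = [1, 0, 1, 0]  # 1 = clever enough to train on

    vec = TfidfVectorizer()
    clf = LogisticRegression().fit(vec.fit_transform(seed_texts), seed_labels)

    def filter_corpus(docs, threshold=0.8):
        """Keep only documents the classifier thinks are worth training on."""
        probs = clf.predict_proba(vec.transform(docs))[:, 1]
        return [d for d, p in zip(docs, probs) if p >= threshold]

The hard part isn't the classifier, it's the seed labels: whoever decides what counts as "clever" is quietly deciding what the model learns.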

In the long run, LLM devs are going to have to either:

* refrain from getting high on their own supply, and find a way to tag AI-generated content (a sketch of both options follows this list)

* or sort the BS from the truth, probably reinventing "trust in gatekeepers and favoring sources of truth with a track record", copying social pressure, etc., until we have a "Pulitzer Prize" and "Academy Awards" for the most reliable AI sources, at a higher sticker price to set them apart from cheap slop.
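For what those two options might look like in an ingestion pipeline, here's a sketch combining both halves: provenance tags to keep AI output out of the training mix, plus track-record weights when sampling what's left. The schema, the sources, and the trust scores are all invented for illustration, not any vendor's actual pipeline:

    import random
    from dataclasses import dataclass

    @dataclass
    class Document:
        text: str
        provenance: str  # "human", "ai", or "unknown"
        source: str      # e.g. a domain name

    # Hypothetical track-record scores: the "Pulitzer" weighting.
    TRUST = {"wire-service.example": 1.0, "content-farm.example": 0.05}

    def build_training_mix(docs: list[Document], k: int) -> list[str]:
        # Don't get high on your own supply: drop anything not
        # attested as human-written (untagged text counts as slop,
        # since unlabeled AI output is the common case on the web).
        human = [d for d in docs if d.provenance == "human"]
        # Then oversample sources with a track record.
        weights = [TRUST.get(d.source, 0.1) for d in human]
        picks = random.choices(human, weights=weights, k=k)
        return [d.text for d in picks]

Of course this only works if the provenance tags exist and are honest, which is exactly the part nobody has solved.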

That, or "2+2=7 because DeepChatGrokmini said so, and if you don't agree you're a terrorist, and if our AI math breaks your rocket it's your fault."