
439 points diggan | 9 comments
1. ratg13 No.45063367
I can understand training AIs on books, and even internet forums, but I can't see how training an AI on lots of dumb questions, probably with an excessive number of grammar and spelling errors, will somehow make it smarter.
replies(4): >>45063503 #>>45063715 #>>45064070 #>>45068370 #
2. dahsameer No.45063503
> and even internet forums

I would say internet forums also include a lot of dumb questions.

replies(2): >>45063666 #>>45066596 #
3. ratg13 No.45063666
Agreed, but people generally pause for a moment before saying stuff online.

In 'private', people are less ashamed of their ignorance, and also know they can say gibberish and the AI will figure it out.

4. nrclark No.45063715
Depends on how you’re using the data. There’s a pretty strong correctness signal in the user behavior.

Did they rephrase the question? Probably the first answer was wrong. Did the session end? Good chance the answer was acceptable. Did they ask follow-ups? What kind? Etc.
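
A rough sketch of how those behavioral cues might become weak labels, purely as an illustration. Everything here is an assumption: the `Turn` type, the `weak_labels` helper, the label names, and the 0.8 similarity cutoff are all hypothetical, and nothing suggests any vendor does it this way.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Turn:
    prompt: str
    answer: str


def is_rephrase(prev_prompt: str, next_prompt: str, threshold: float = 0.8) -> bool:
    """Guess that the user re-asked the same thing (threshold is an assumed cutoff)."""
    ratio = SequenceMatcher(None, prev_prompt.lower(), next_prompt.lower()).ratio()
    return ratio >= threshold


def weak_labels(session: list[Turn]) -> list[str]:
    """Assign one noisy label per turn based only on what the user did next."""
    labels = []
    for i, turn in enumerate(session):
        if i + 1 == len(session):
            # Session ended here: possibly satisfied, possibly gave up.
            labels.append("maybe_ok")
        elif is_rephrase(turn.prompt, session[i + 1].prompt):
            # Near-identical next prompt: the answer probably missed.
            labels.append("likely_wrong")
        else:
            # A different next prompt: treat it as a follow-up.
            labels.append("follow_up")
    return labels


session = [
    Turn("how do I reverse a list in python", "..."),
    Turn("how do i reverse a list in python?", "..."),
    Turn("now sort it descending", "..."),
]
print(weak_labels(session))
```

The labels are deliberately weak. A session ending only hints that the answer was acceptable, so anything built on this would need to treat it as a noisy signal.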

replies(2): >>45064137 #>>45064409 #
5. mrweasel No.45064070
They train AI on Reddit and Stack Overflow questions; I can't see it getting any worse.
6. dudefeliciano No.45064137
> Did the session end? Good chance the answer was acceptable.

Or that the user just ragequit

7. vb-8448 No.45064409
I usually run the same task 4 or 5 times (different sessions, similar prompts), and most of the time the result is useless or completely wrong. Sometimes I go back and pick the first result, other times none of them, other times a mix of them. I wonder how they can extract value from this.
8. timeon No.45066596
Like what?
9. victorbjorklund No.45068370
Doubt they feed everything in. They probably pick out a small subset of conversations for the training round.