283 points by Brajeshwar | 2 comments
iandanforth (No.45231600)
"Google said in a statement: “Quality raters are employed by our suppliers and are temporarily assigned to provide external feedback on our products. Their ratings are one of many aggregated data points that help us measure how well our systems are working, but do not directly impact our algorithms or models.” GlobalLogic declined to comment for this story." (emphasis mine)

How is this not a straight-up lie? For this to be true, they would have to throw away labeled training data.

replies(4): >>45231651, >>45231697, >>45231758, >>45232359
1. Gracana (No.45231651)
They probably don’t collect it at a scale large enough to do RLHF with it, but it’s still useful feedback for the people working on the projects/products, along the lines of the sketch below.
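
FWIW, "one of many aggregated data points" could be as little as a dashboard metric. A minimal sketch of that kind of aggregation, with every name hypothetical (this is not any actual Google pipeline):

```python
# Hypothetical sketch: quality-rater scores rolled up into an offline
# evaluation metric, without the ratings ever becoming training examples.
# All names (Rating, summarize) are illustrative only.
from dataclasses import dataclass
from statistics import mean

@dataclass
class Rating:
    model_version: str
    query: str
    score: int  # e.g. 1 (bad) .. 5 (good), assigned by a human rater

def summarize(ratings: list[Rating]) -> dict[str, float]:
    """Mean rater score per model version -- a dashboard number,
    not a gradient signal."""
    by_version: dict[str, list[int]] = {}
    for r in ratings:
        by_version.setdefault(r.model_version, []).append(r.score)
    return {v: mean(scores) for v, scores in by_version.items()}

ratings = [
    Rating("v1", "how tall is everest", 4),
    Rating("v2", "how tall is everest", 5),
    Rating("v1", "best pizza near me", 2),
    Rating("v2", "best pizza near me", 3),
]
print(summarize(ratings))  # {'v1': 3.0, 'v2': 4.0}
```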
replies(1): >>45231708
2. zozbot234 (No.45231708)
More recent models actually use "reinforcement learning from AI feedback" (RLAIF), where the task of assigning a reward is essentially fed back into the model itself. Human feedback is then used only to ground the training, on selected examples (potentially even entirely artificial ones) where the AI is most uncertain about what feedback to give.
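
Concretely, the routing might look something like this toy sketch, where the "judge" is a random stub standing in for a real reward model (all of it assumed, not any lab's actual pipeline): sample the judge several times per response, treat score variance as uncertainty, and escalate only the most uncertain items to human raters.

```python
# Hypothetical RLAIF-style sketch: an AI judge assigns rewards; only the
# examples where its scores are most inconsistent (highest variance) get
# sent to human raters to ground the training signal.
import random
from statistics import mean, pstdev

def ai_judge_once(response: str) -> float:
    """One noisy judgment. Toy stand-in for sampling a reward model."""
    base = min(1.0, len(response) / 60)       # fake quality signal
    sigma = 0.3 if "?" in response else 0.05  # pretend questions are ambiguous
    return base + random.gauss(0, sigma)

def judge(response: str, k: int = 16) -> tuple[float, float]:
    """Mean reward and uncertainty (std dev) across k judge samples."""
    samples = [ai_judge_once(response) for _ in range(k)]
    return mean(samples), pstdev(samples)

responses = [
    "Paris is the capital of France.",
    "The best programming language is obviously the one I like?",
    "Water boils at 100 C at sea level.",
    "Is this answer helpful?",
]

scored = [(r, *judge(r)) for r in responses]
scored.sort(key=lambda t: t[2], reverse=True)  # most uncertain first

humans_needed = scored[:2]  # escalate to human raters for grounding
auto_rewards = scored[2:]   # AI feedback used directly as the reward

for r, reward, unc in humans_needed:
    print(f"HUMAN REVIEW (reward~{reward:.2f}, unc={unc:.2f}): {r}")
for r, reward, unc in auto_rewards:
    print(f"AI reward {reward:.2f} (unc={unc:.2f}): {r}")
```

The point of the selection step is that human labels are expensive, so you spend them only where the AI judge disagrees with itself; everything else trains on the AI-assigned reward.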