←back to thread

688 points crescit_eundo | 1 comments | | HN request time: 0s | source
Show context
codeflo ◴[] No.42145710[source]
At this point, we have to assume anything that becomes a published benchmark is specifically targeted during training. That's not something specific to LLMs or OpenAI. Compiler companies have done the same thing for decades, specifically detecting common benchmark programs and inserting hand-crafted optimizations. Similarly, the shader compilers in GPU drivers have special cases for common games and benchmarks.
replies(3): >>42146244 #>>42146391 #>>42151266 #
darkerside ◴[] No.42146244[source]
VW got in a lot of trouble for this
replies(10): >>42146543 #>>42146550 #>>42146553 #>>42146556 #>>42146560 #>>42147093 #>>42147124 #>>42147353 #>>42147357 #>>42148300 #
sigmoid10 ◴[] No.42146560[source]
Apples and oranges. VW actually cheated on regulatory testing to bypass legal requirements. So to be comparable, the government would first need to pass laws where e.g. only compilers that pass a certain benchmark are allowed to be used for purchasable products and then the developers would need to manipulate behaviour during those benchmarks.
replies(3): >>42146749 #>>42147885 #>>42150309 #
1. rsynnott ◴[] No.42147885{3}[source]
There's a sliding scale of badness here. The emissions cheating (it wasn't just VW, incidentally; they were just the first uncovered. Fiat-Chrysler, Mercedes, GM and BMW were also caught doing it, with suspicions about others) was straight-up fraud.

It used to be common for graphics drivers to outright cheat on benchmarks (the actual image produced would not be the same as it would have been if a benchmark had not been detected); this was arguably, fraud.

It used to be common for mobile phone manufacturers to allow the SoC to operate in a thermal mode that was never available to real users when it detected a benchmark was being used. This is still, IMO, kinda fraud-y.

Optimisation for common benchmark cases where the thing still actually _works_, and where the optimisation is available to normal users where applicable, is less egregious, though, still, IMO, Not Great.