←back to thread

688 points crescit_eundo | 1 comments | | HN request time: 0.001s | source
Show context
codeflo ◴[] No.42145710[source]
At this point, we have to assume anything that becomes a published benchmark is specifically targeted during training. That's not something specific to LLMs or OpenAI. Compiler companies have done the same thing for decades, specifically detecting common benchmark programs and inserting hand-crafted optimizations. Similarly, the shader compilers in GPU drivers have special cases for common games and benchmarks.
replies(3): >>42146244 #>>42146391 #>>42151266 #
darkerside ◴[] No.42146244[source]
VW got in a lot of trouble for this
replies(10): >>42146543 #>>42146550 #>>42146553 #>>42146556 #>>42146560 #>>42147093 #>>42147124 #>>42147353 #>>42147357 #>>42148300 #
sigmoid10 ◴[] No.42146560[source]
Apples and oranges. VW actually cheated on regulatory testing to bypass legal requirements. So to be comparable, the government would first need to pass laws where e.g. only compilers that pass a certain benchmark are allowed to be used for purchasable products and then the developers would need to manipulate behaviour during those benchmarks.
replies(3): >>42146749 #>>42147885 #>>42150309 #
0xFF0123 ◴[] No.42146749[source]
The only difference is the legality. From an integrity point of view it's basically the same
replies(7): >>42146884 #>>42146984 #>>42147072 #>>42147078 #>>42147443 #>>42147742 #>>42147978 #
1. UniverseHacker ◴[] No.42146984[source]
I disagree- presumably if an algorithm or hardware is optimized for a certain class of problem it really is good at it and always will be- which is still useful if you are actually using it for that. It’s just “studying for the test”- something I would expect to happen even if it is a bit misleading.

VW cheated such that the low emissions were only active during the test- it’s not that it was optimized for low emissions under the conditions they test for, but that you could not get those low emissions under any conditions in the real world. That's "cheating on the test" not "studying for the test."