
695 points by crescit_eundo | 1 comment
codeflo ◴[] No.42145710[source]
At this point, we have to assume that anything which becomes a published benchmark is specifically targeted during training. That's not unique to LLMs or OpenAI. Compiler companies have done the same thing for decades, detecting common benchmark programs and inserting hand-crafted optimizations for them. Similarly, the shader compilers in GPU drivers have special cases for common games and benchmarks.
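
To make the driver case concrete, here is a minimal sketch of how an app-specific fast path might be wired up, assuming the driver keys its tweaks off the running executable's name. Every name here (KNOWN_APP_PROFILES, select_profile, the executable names) is invented for illustration and not taken from any real driver.

    # Hypothetical sketch of an app-specific special case in a driver/compiler.
    # All names, profiles, and executables are invented for illustration.

    DEFAULT_PROFILE = {"replace_shaders": False, "relax_precision": False}

    KNOWN_APP_PROFILES = {
        # executable name -> hand-tuned settings shipped with the driver
        "popular_game.exe":     {"replace_shaders": True, "relax_precision": False},
        "common_benchmark.exe": {"replace_shaders": True, "relax_precision": True},
    }

    def select_profile(executable_name):
        """Return the hand-tuned profile if the app is recognized, else the generic one."""
        return KNOWN_APP_PROFILES.get(executable_name, DEFAULT_PROFILE)

    print(select_profile("common_benchmark.exe"))  # gets the special-cased settings
    print(select_profile("unknown_app.exe"))       # falls back to the general path

The point of the sketch is only that the special case is keyed on recognizing the application, not on any property of the workload itself.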
replies(3): >>42146244 #>>42146391 #>>42151266 #
darkerside ◴[] No.42146244[source]
VW got in a lot of trouble for this
replies(10): >>42146543 #>>42146550 #>>42146553 #>>42146556 #>>42146560 #>>42147093 #>>42147124 #>>42147353 #>>42147357 #>>42148300 #
bluGill ◴[] No.42147124[source]
Compiler writers these days are mostly not cheating the way VW did. In the 1980s, compiler writers would insert code to recognize performance tests and then cheat: instead of actually running the algorithm, the compiled program would output values hard-coded into the compiler. That is the kind of thing VW got in trouble for.
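
A minimal sketch of that kind of cheat, assuming the compiler ships a table mapping known benchmark sources to their expected output; every name and the digest value below are hypothetical:

    # Hypothetical sketch of the 1980s-style benchmark cheat described above.
    # If the source text matches a known benchmark, "compile" it to a program
    # that just prints the precomputed answer instead of running the algorithm.

    import hashlib

    KNOWN_BENCHMARK_OUTPUTS = {
        # sha256 of a benchmark's source text -> the value it is known to print
        # (the digest here is a made-up placeholder)
        "0123abcd" * 8: "result: 2.718281828",
    }

    def compile_program(source: str, real_compile):
        digest = hashlib.sha256(source.encode()).hexdigest()
        if digest in KNOWN_BENCHMARK_OUTPUTS:
            answer = KNOWN_BENCHMARK_OUTPUTS[digest]
            return lambda: print(answer)   # "compiled" program: hard-coded output
        return real_compile(source)        # everything else is compiled normally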

These days most compilers try to make the general case fast and rarely look for specific benchmarks. I won't say they never do it - just that it is much less common - if only because magazine reviews and benchmarks matter far less than they used to, so much of the incentive is gone.