
695 points by crescit_eundo | 1 comment
codeflo ◴[] No.42145710[source]
At this point, we have to assume that anything which becomes a published benchmark is specifically targeted during training. That's not unique to LLMs or OpenAI. Compiler companies have done the same thing for decades, detecting common benchmark programs and inserting hand-crafted optimizations for them. Similarly, the shader compilers in GPU drivers have special cases for common games and benchmarks.
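
To make the driver case concrete, here is a minimal sketch of how an app-specific fast path might be wired up, assuming the driver keys its tweaks off the running executable's name. Every name here (KNOWN_APP_PROFILES, select_profile, the executable names) is invented for illustration and not taken from any real driver.

    # Hypothetical sketch of an app-specific special case in a driver/compiler.
    # All names, profiles, and executables are invented for illustration.

    DEFAULT_PROFILE = {"replace_shaders": False, "relax_precision": False}

    KNOWN_APP_PROFILES = {
        # executable name -> hand-tuned settings shipped with the driver
        "popular_game.exe":     {"replace_shaders": True, "relax_precision": False},
        "common_benchmark.exe": {"replace_shaders": True, "relax_precision": True},
    }

    def select_profile(executable_name):
        """Return the hand-tuned profile if the app is recognized, else the generic one."""
        return KNOWN_APP_PROFILES.get(executable_name, DEFAULT_PROFILE)

    print(select_profile("common_benchmark.exe"))  # gets the special-cased settings
    print(select_profile("unknown_app.exe"))       # falls back to the general path

The point of the sketch is only that the special case is keyed on recognizing the application, not on any property of the workload itself.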
replies(3): >>42146244 #>>42146391 #>>42151266 #
darkerside ◴[] No.42146244[source]
VW got in a lot of trouble for this
replies(10): >>42146543 #>>42146550 #>>42146553 #>>42146556 #>>42146560 #>>42147093 #>>42147124 #>>42147353 #>>42147357 #>>42148300 #
bluGill ◴[] No.42147124[source]
Compiler writers these days are mostly not cheating the way VW did. In the 1980s, compiler writers would insert code to recognize performance tests and then cheat: instead of actually running the algorithm, the compiled program would output values hard-coded into the compiler. That is the kind of thing VW got in trouble for.
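
A minimal sketch of that kind of cheat, assuming the compiler ships a table mapping known benchmark sources to their expected output; every name and the digest value below are hypothetical:

    # Hypothetical sketch of the 1980s-style benchmark cheat described above.
    # If the source text matches a known benchmark, "compile" it to a program
    # that just prints the precomputed answer instead of running the algorithm.

    import hashlib

    KNOWN_BENCHMARK_OUTPUTS = {
        # sha256 of a benchmark's source text -> the value it is known to print
        # (the digest here is a made-up placeholder)
        "0123abcd" * 8: "result: 2.718281828",
    }

    def compile_program(source: str, real_compile):
        digest = hashlib.sha256(source.encode()).hexdigest()
        if digest in KNOWN_BENCHMARK_OUTPUTS:
            answer = KNOWN_BENCHMARK_OUTPUTS[digest]
            return lambda: print(answer)   # "compiled" program: hard-coded output
        return real_compile(source)        # everything else is compiled normally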

These days most compilers try to make the general case fast and rarely look for specific benchmarks. I won't say they never do it - just that it is much less common - if only because magazine reviews and benchmarks matter far less than they used to, so much of the incentive is gone.