Something weird is happening with LLMs and chess

(dynomight.substack.com)

696 points crescit_eundo | 1 comments | 14 Nov 24 17:05 UTC

codeflo ◴[15 Nov 24 10:52 UTC] No.42145710[source]▶

At this point, we have to assume anything that becomes a published benchmark is specifically targeted during training. That's not something specific to LLMs or OpenAI. Compiler companies have done the same thing for decades, specifically detecting common benchmark programs and inserting hand-crafted optimizations. Similarly, the shader compilers in GPU drivers have special cases for common games and benchmarks.

replies(3): >>42146244 #>>42146391 #>>42151266 #

darkerside ◴[15 Nov 24 12:21 UTC] No.42146244[source]▶

>>42145710 #

VW got in a lot of trouble for this

replies(10): >>42146543 #>>42146550 #>>42146553 #>>42146556 #>>42146560 #>>42147093 #>>42147124 #>>42147353 #>>42147357 #>>42148300 #

1. close04 ◴[15 Nov 24 13:07 UTC] No.42146550[source]▶

>>42146244 #

Only because what VW did was illegal, was done on a very large scale, and could be linked to a lot of indirect deaths through the additional pollution.

Benchmark optimizations are slightly embarrassing at worst, and an "optimization for a specific use case" at best. There's no regulation against optimizing for a particular task; everyone does it all the time, and in some cases it's just not communicated transparently.

Phone manufacturers were caught "optimizing" for benchmarks again and again, removing power limits to boost scores. Hard to name an example without searching the net because it's at most a faux pas.
