Something weird is happening with LLMs and chess

(dynomight.substack.com)


swiftcoder ◴[15 Nov 24 07:57 UTC] No.42144784

I feel like the article neglects one obvious possibility: that OpenAI decided chess was a benchmark worth "winning", special-cased chess within gpt-3.5-turbo-instruct, and then neglected to add that special case to follow-up models because it wasn't generating sustained press coverage.


scott_w ◴[15 Nov 24 11:10 UTC] No.42145811

>>42144784

I suspect the same thing. Rather than LLMs “learning to play chess,” they “learnt” to recognise a chess game and hand over instructions to a chess engine. If that’s the case, I don’t feel impressed at all.


1. gamerDude ◴[15 Nov 24 12:43 UTC] No.42146383

>>42145811

This is exactly what I feel AI needs: a manager AI that hands things off to specialized, more deterministic algorithms/machines.
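A minimal sketch of the "manager hands off to specialists" pattern discussed above, in Python. Everything here is hypothetical: `manager`, `chess_engine`, `general_llm`, `ROUTES` and the `PGN_MOVE` detector are made-up names, the specialists are stubs, and nothing here claims to show how OpenAI or any real model actually handles chess.

```python
import re
from typing import Callable

# Hypothetical specialist: a stand-in for a real chess engine such as Stockfish.
# It ignores the game text and always returns the same move.
def chess_engine(game_text: str) -> str:
    return "e4"

# Hypothetical generalist: a stand-in for a general-purpose language model.
def general_llm(prompt: str) -> str:
    return f"[LLM answer to: {prompt!r}]"

# Crude detector: matches numbered PGN-style moves such as "1. e4" or "12. O-O".
PGN_MOVE = re.compile(r"\b\d+\.\s*(?:O-O(?:-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8])")

# Each route pairs a detector with the specialist it hands the prompt to.
ROUTES: list[tuple[Callable[[str], bool], Callable[[str], str]]] = [
    (lambda p: PGN_MOVE.search(p) is not None, chess_engine),
]

def manager(prompt: str) -> str:
    """Hand the prompt to the first matching specialist, else fall back to the LLM."""
    for detects, specialist in ROUTES:
        if detects(prompt):
            return specialist(prompt)
    return general_llm(prompt)

if __name__ == "__main__":
    print(manager("1. e4 e5 2. Nf3 Nc6 3."))         # routed to chess_engine
    print(manager("What is the capital of France?"))  # falls back to general_llm
```

With these stubs, `manager("1. e4 e5 2. Nf3")` returns `"e4"` from the chess stub, while an ordinary question falls through to the LLM stub. The regex detector is deliberately crude; a real system might use a learned classifier for the routing step instead.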


2. criley2 ◴[15 Nov 24 12:45 UTC] No.42146397

>>42146383 (TP)

Basically what Wolfram Alpha rolled out 15 years ago.

It was impressive then, too.


3. spiderfarmer ◴[15 Nov 24 14:35 UTC] No.42147292

>>42146383 (TP)

Multi-agent LLMs are already a thing.


4. nine_k ◴[15 Nov 24 17:09 UTC] No.42148751

>>42147292

Somehow they're not in the limelight, and lack a well-known open-source runner implementation (like llama.cpp).

Given their potential, they should be winning hands down; so where are they?

5. waffletower ◴[15 Nov 24 20:05 UTC] No.42150365

>>42146397

It is good to see other people buttressing Stephen Wolfram's ego. It is extraordinarily heavy work and Stephen can't handle it all by himself.

6. waffletower ◴[15 Nov 24 20:16 UTC] No.42150449

>>42146383 (TP)

While deterministic components may be a left-brain default, there is no reason such delegate services couldn't themselves be more specialized ANN models. It would most likely vastly improve performance if they were evaluated in the same memory space using tensor connectivity. In the specific case of chess, it is helpful to remember that AlphaZero utilizes ANNs as well.

7. bigiain ◴[15 Nov 24 22:48 UTC] No.42152158

>>42146383 (TP)

Next thing, the "manager AIs" start stack ranking the specialized "worker AIs".

And the worker AIs "evolve" to meet/exceed expectations only on tasks directly contributing to the KPIs the manager AIs measure, via the mechanism of discarding the "less fit to exceed KPIs".

And some of the worker AIs who're trained on the recent/polluted internet happen to spit out prompt injection attacks that work against the manager AIs' stack-ranking metrics, and they dominate over "less fit" worker AIs. (Congratulations, we've evolved AI cancer!) These manager AIs start performing spectacularly badly compared to other non-cancerous manager AIs, and die or get killed off by the VCs paying for their datacenters.

Competing manager AIs get training, perhaps on newer HN posts discussing this emergent behavior of worker AIs, and start to down-rank any exceptionally performing worker AIs. The overall trend towards mediocrity becomes inevitable.

Some greybeard writes some Perl and regexes that outcompete commercial manager AIs on pretty much every real-world task, while running on a 10-year-old laptop instead of a cluster of nuclear-powered AI datacenters all consuming a city's worth of fresh drinking water.

Nobody in powerful positions cares. Humanity dies.


8. MyFirstSass ◴[21 Nov 24 23:57 UTC] No.42209924

>>42152158

And the “comment of the year” award goes to...

Sorry for the filler but this is amazingly put and so true.

We’ll get so many unintended consequences that run opposite to any worthy goals once it’s AIs talking to AIs in a few years.
