←back to thread

688 points crescit_eundo | 7 comments | | HN request time: 0.951s | source | bottom
Show context
swiftcoder ◴[] No.42144784[source]
I feel like the article neglects one obvious possibility: that OpenAI decided that chess was a benchmark worth "winning", special-cases chess within gpt-3.5-turbo-instruct, and then neglected to add that special-case to follow-up models since it wasn't generating sustained press coverage.
replies(8): >>42145306 #>>42145352 #>>42145619 #>>42145811 #>>42145883 #>>42146777 #>>42148148 #>>42151081 #
scott_w ◴[] No.42145811[source]
I suspect the same thing. Rather than LLMs “learning to play chess,” they “learnt” to recognise a chess game and hand over instructions to a chess engine. If that’s the case, I don’t feel impressed at all.
replies(5): >>42146086 #>>42146152 #>>42146383 #>>42146415 #>>42156785 #
1. gamerDude ◴[] No.42146383[source]
This is exactly what I feel AI needs. A manager AI that then hands off things to specialized more deterministic algorithms/machines.
replies(4): >>42146397 #>>42147292 #>>42150449 #>>42152158 #
2. criley2 ◴[] No.42146397[source]
Basically what Wolfram Alpha rolled out 15 years ago.

It was impressive then, too.

replies(1): >>42150365 #
3. spiderfarmer ◴[] No.42147292[source]
Multi Agent LLM's are already a thing.
replies(1): >>42148751 #
4. nine_k ◴[] No.42148751[source]
Somehow they're not in the limelight, and lack a well-known open-source runner implementation (like llama.cpp).

Given the potential, they should be winning hands down; where's that?

5. waffletower ◴[] No.42150365[source]
It is good to see other people buttressing Stephen Wolfram's ego. It is extraordinarily heavy work and Stephen can't handle it all by himself.
6. waffletower ◴[] No.42150449[source]
While deterministic components may be a left-brain default, there is no reason that such delegate services couldn't be more specialized ANN models themselves. It would most likely vastly improve performance if they were evaluated in the same memory space using tensor connectivity. In the specific case of chess, it is helpful to remember that AlphaZero utilizes ANNs as well.
7. bigiain ◴[] No.42152158[source]
Next thing, the "manager AIs" start stack ranking the specialized "worker AIs".

And the worker AIs "evolve" to meet/exceed expectations only on tasks directly contributing to KPIs the manager AIs measure for - via the mechanism of discarding the "less fit to exceed KPIs".

And some of the worker AIs who're trained on recent/polluted internet happen to spit out prompt injection attacks that work against the manager AIs rank stacking metrics and dominate over "less fit" worker AIs. (Congratulations, we've evolved AI cancer!) These manager AIs start performing spectacularly badly compared to other non-cancerous manager AIs, and die or get killed off by the VC's paying for their datacenters.

Competing manager AIs get training, perhaps on on newer HN posts discussing this emergent behavior of worker AIs, and start to down rank any exceptionally performing worker AIs. The overall trends towards mediocrity becomes inevitable.

Some greybread writes some Perl and regexes that outcompete commercial manager AIs on pretty much every real world task, while running on a 10 year old laptop instead of a cluster of nuclear powered AI datacenters all consuming a city's worth of fresh drinking water.

Nobody in powerful positions care. Humanity dies.