
207 points by todsacerdoti | 5 comments
badsectoracula ◴[] No.46004007[source]
A related test I did around the beginning of the year: I came up with a simple stack-oriented language, asked an LLM to solve a simple problem (calculate the squared distance between two points whose coordinates are already on the stack), and had it figure out the details.

The part I found neat was that I used a local LLM (some quantized version of QwQ from around December or so, I think) with a thinking mode, so I was able to follow the thought process. Since it was running locally (and it wasn't a MoE model), it was slow enough for me to follow in real time, and I found it fun to watch the LLM try to understand the language.

Another interesting part is that the language description had a mistake, but the LLM managed to figure things out anyway.

Here is the transcript, including a simple C interpreter for the language and, at the end, a test of the code the LLM produced:

https://app.filen.io/#/d/28cb8e0d-627a-405f-b836-489e4682822...
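
For a rough idea of the flavor, here is a minimal sketch of my own, not the language or interpreter from the transcript: the opcodes, their semantics, and the squared-distance program below are all invented for illustration.

    /*
     * Tiny stack machine in C plus a squared-distance program for it.
     * NOTE: this is NOT the language from the linked transcript; the
     * opcodes and the program are assumptions made purely to show the
     * shape of the problem.
     */
    #include <stdio.h>

    enum op { PUSH, DUP, ROT, SUB, MUL, ADD };

    struct instr { enum op op; double arg; };

    static double stack[64];
    static int sp;

    static void   push(double v) { stack[sp++] = v; }
    static double pop(void)      { return stack[--sp]; }

    static void run(const struct instr *prog, int n) {
        for (int i = 0; i < n; i++) {
            double a, b, c;
            switch (prog[i].op) {
            case PUSH: push(prog[i].arg); break;
            case DUP:  a = pop(); push(a); push(a); break;
            case ROT:  /* ( a b c -- b c a ): third item to the top */
                c = pop(); b = pop(); a = pop();
                push(b); push(c); push(a); break;
            case SUB:  a = pop(); b = pop(); push(b - a); break;
            case MUL:  a = pop(); b = pop(); push(b * a); break;
            case ADD:  a = pop(); b = pop(); push(b + a); break;
            }
        }
    }

    int main(void) {
        /* The two points are already on the stack: x1 y1 x2 y2. */
        push(1.0); push(2.0); push(4.0); push(6.0);

        /* ROT SUB DUP MUL ROT ROT SUB DUP MUL ADD
           computes (x2 - x1)^2 + (y2 - y1)^2 */
        const struct instr prog[] = {
            {ROT, 0}, {SUB, 0}, {DUP, 0}, {MUL, 0},
            {ROT, 0}, {ROT, 0}, {SUB, 0}, {DUP, 0}, {MUL, 0}, {ADD, 0},
        };
        run(prog, (int)(sizeof prog / sizeof prog[0]));

        printf("squared distance = %g\n", pop()); /* 25 for (1,2)-(4,6) */
        return 0;
    }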

replies(2): >>46004536 #>>46007752 #
int_19h ◴[] No.46007752[source]
I often wonder how people can look at a log like this and still confidently state that this isn't reasoning.
replies(2): >>46008780 #>>46008824 #
garciasn ◴[] No.46008824[source]
Depends on the definition of reasoning:

1) think, understand, and form judgments by a process of logic.

-- LLMs do not think, nor do they understand; they also cannot form ‘judgments’ in any human-relatable way. They’re just providing results in the most statistically relevant way their training data permits.

2) find an answer to a problem by considering various possible solutions

-- LLMs can produce a result that may be an answer, after generating various candidate results that a human must verify as accurate, but they don’t do this in any human-relatable way either.

--

So, while LLMs continue to be amazing mimics, and thus APPEAR to be great at ‘reasoning’, they aren’t doing anything of the sort today.

replies(1): >>46008907 #
1. CamperBob2 ◴[] No.46008907[source]
Exposure to our language is sufficient to teach the model how to form human-relatable judgements. The ability to execute tool calls and evaluate the results takes care of the rest. It's reasoning.
replies(1): >>46009054 #
2. garciasn ◴[] No.46009054[source]
SELECT next_word, likelihood_stat FROM context ORDER BY 2 DESC LIMIT 1

is not reasoning; it just appears that way due to Clarke’s third law.

replies(2): >>46009226 #>>46009493 #
3. CamperBob2 ◴[] No.46009226[source]
(Shrug) You've already had to move your goalposts to the far corner of the parking garage down the street from the stadium. Argument from ignorance won't help.
4. int_19h ◴[] No.46009493[source]
Sure, at the end of the day it selects the most probable token - but it has to compute the token probabilities first, and that's the part where it's hard to see how it could possibly produce a meaningful log like this without some form of reasoning (and a world model to base that reasoning on).

So, no, this doesn't actually answer the question in a meaningful way.
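
To make that concrete: the SELECT-and-LIMIT step above amounts to something like the toy snippet below (the vocabulary and logit values are made up). That final selection is trivial; everything contentious happens in the forward pass that produces the logits in the first place.

    /*
     * Toy sketch of the final selection step: softmax over a handful of
     * made-up logits, then argmax. This is the "ORDER BY ... LIMIT 1"
     * part; computing the logits is where the real work happens.
     */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        const char *tokens[] = { "dup", "swap", "mul", "add" };  /* toy vocabulary */
        double logits[]      = { 2.1, 0.3, 3.7, 1.2 };           /* made-up values */
        int n = 4, best = 0;
        double sum = 0.0, probs[4];

        for (int i = 0; i < n; i++) sum += exp(logits[i]);
        for (int i = 0; i < n; i++) {
            probs[i] = exp(logits[i]) / sum;        /* softmax */
            if (probs[i] > probs[best]) best = i;   /* pick the most probable token */
        }
        printf("next token: %s (p = %.2f)\n", tokens[best], probs[best]);
        return 0;
    }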

replies(1): >>46009744 #
5. ◴[] No.46009744{3}[source]