
176 points | lnyan | 9 comments
1. mptest No.42176357
Has it been shown conclusively for o1? I'd love to read the paper. I recall that Apple paper arguing non-reasoning because fuzzed question data caused performance degradation - it caught a lot of traction, but IIRC o1's performance was pretty resilient compared to previous models. To be clear, I agree with your sentiment here; I just have yet to see definitive data showing that o1 is not fundamentally more resilient to the types of test we use to discern "reasoning" from "pattern matching".

I watched a professor's lecture on what the open-source LLM community thinks are the likely candidates for what is going on inside o1 [0], and I'm not convinced it's still simple pattern matching.

[0] https://youtu.be/6PEJ96k1kiw

2. SubiculumCode No.42176413
Can you provide an example or link?

I'm not so confident that humans reason in a fundamentally different way than pattern matching. Perhaps paradigms focused on predicting the next token are too limiting. Reasoning plausibly involves pattern-matching relevant schema representations, then executing along that schema. The ability to intuit that an existing schema applies to a given situation is a good measure of intelligence, IMO. It could even make a good LLM metric.
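
A very rough sketch of what such a metric could look like (everything here is made up for illustration: the schema library is hypothetical and embed() is a hashed bag-of-words stand-in for a real embedding model; a serious version would need human-labeled problems):

    # Toy sketch of a "schema applicability" metric: does a retrieval step pick
    # the schema a human would say applies to the problem?
    import numpy as np

    def embed(text):
        """Placeholder embedding: hashed bag-of-words, standing in for a real model."""
        vec = np.zeros(64)
        for word in text.lower().split():
            vec[hash(word) % 64] += 1.0
        return vec / (np.linalg.norm(vec) + 1e-9)

    # Hypothetical schema library: a name plus a description of when it applies.
    SCHEMAS = {
        "rate_problem": "quantities changing over time, speed distance time",
        "set_overlap": "counting elements shared between groups, inclusion exclusion",
        "proportion": "scaling a recipe or ratio up or down",
    }

    def best_schema(problem):
        """Pick the schema whose description is most similar to the problem."""
        p = embed(problem)
        return max(SCHEMAS, key=lambda name: float(p @ embed(SCHEMAS[name])))

    def schema_accuracy(labeled_problems):
        """Fraction of (problem, intended schema) pairs where the right schema is
        retrieved -- one crude way to score 'intuiting that a schema applies'."""
        hits = sum(best_schema(text) == gold for text, gold in labeled_problems)
        return hits / len(labeled_problems)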

replies(1): >>42176906 #
3. blixt No.42176666
I don't completely disagree, but I believe it's a bit fuzzier than that. From what I understand, the models learn a very compressed version of what they receive as input and produce as output. While that isn't sufficient to generalize, you could say they memorize some very high-dimensional functions that cause the expected text to be produced, and they can turn on and combine several of these functions (multiply by non-zero weights, sum, etc.). So on some level an LLM can kind of perform logic on the input, even when the pattern is slightly novel. But at the same time, no model has been shown to generalize fully the way a human would.
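
As a loose toy illustration of that "turn on and combine" idea (purely made up, not a claim about what any real model computes internally): two memorized feature functions, scaled, summed, and thresholded, already implement a small piece of logic over the input.

    # Toy illustration only: two "memorized" feature functions combined by
    # weighting and summing, then thresholded, behave like an AND over the input.
    def mentions_animal(text):
        return 1.0 if any(w in text.lower() for w in ("cat", "dog", "bird")) else 0.0

    def mentions_negation(text):
        return 1.0 if any(w in text.lower() for w in ("not", "never", "no")) else 0.0

    def negated_animal_detector(text):
        # multiply by non-zero weights, sum, compare to a threshold
        score = 1.0 * mentions_animal(text) + 1.0 * mentions_negation(text)
        return score > 1.5  # fires only when both component functions fire

    print(negated_animal_detector("I do not have a dog"))  # True
    print(negated_animal_detector("I have a dog"))         # False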

And let's also be fair: it would take a lot of effort for a human to generalize to a previously unseen pattern as well, so I always wonder just how useful it is to make such binary statements as "models don't reason" or "they're stochastic parrots". But maybe it's to counterbalance the claims that they are sentient, that AGI is here, etc.?

4. mdp2021 No.42176906
> humans reason in a fundamentally different way

After having formulated an idea, do you put it on your intellectual bench and re-examine it, purposefully, analytically? Well, that is more than plain pattern matching over intellectual keys - it is procedural.

And what about those intellectual keys or «schemas» - how are they generated? Through verification and consolidation that go beyond the original (pattern-matching) intuition.

replies(1): >>42178329 #
5. blovescoffee No.42177368
You’re going to PSA an opinion?
6. stevenhuang No.42178329{3}
> After having formulated an idea, do you put it on your intellectual bench and re-examine it, purposefully, analytically?

Can you show conclusively that LLMs can't do this or don't already do this to some degree?

replies(1): >>42178373 #
7. mdp2021 No.42178373{4}
Not "anatomically": only from the results.

I skimmed another relevant piece today: it seems we are not proceeding at an adequate pace with interpreting the internals, despite the "transparency" gained from the architecture...

replies(1): >>42178573 #
8. stevenhuang No.42178573{5}
Precisely. The architecture is transparent, but the latent representations within it and the operations performed by LLMs are not.

The extent to which LLM "reasoning" really is reasoning similar to a human's, or something of a strictly weaker class entirely, is a subject of active research.

Personally, I'm of the opinion that human reasoning is really just "pattern matching", but we're also still waiting for the cognitive scientists to give us an answer on that one.

replies(1): >>42181987 #
9. mdp2021 No.42181987{6}
> I'm of the opinion human reasoning is really just "pattern matching"

There is more than one interpretation of "pattern matching".

Of course it seems to be a fundamental component of generating ideas, but then those ideas are put - by intellectuals - on a bench and actively criticized. The two activities differ in important ways: first you look and go "they seem to be four", but then you count to be sure.

The second part is absolutely critical for a well-working reasoner.
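
For what it's worth, that two-step shape - a fast guess followed by an explicit check - is easy to write down. A toy sketch, nothing LLM-specific about it:

    # Toy generate-then-verify loop: a cheap "intuition" proposes candidates and an
    # explicit check accepts or rejects them ("first you look, then you count").
    def propose_factors(n):
        """Pattern-matching step: guess likely small factors first."""
        for candidate in (2, 3, 5, 7, 11, 13):
            yield candidate

    def verified_factor(n):
        """Procedural step: only keep a guess that actually checks out."""
        for guess in propose_factors(n):
            if n % guess == 0:  # the explicit verification
                return guess
        return None

    print(verified_factor(91))  # 7    -- earlier guesses fail the check
    print(verified_factor(97))  # None -- intuition alone produced no verified answer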