I had a problem where I used GPT-4o to help me with inventory management, something a 5th grade kid could handle, and it kept screwing up values for a list of ~50 components. I ended up spending more time trying to get it to properly parse the input audio (I read off the counts as I moved through inventory bins) than if I had just done it manually.
On the other hand, I have had good success with having it write simple programs and apps. So YMMV quite a lot more than with a regular person.
This generally means that for a task like the one you are doing, you need signposts in the data, like minute markers or something similar, that it can process serially.
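To make the signpost idea concrete, here is a minimal sketch, assuming the transcription step stamps the text with [MM:SS] markers (the marker format and function name are just illustrative). Each minute's text becomes its own small chunk that can be handed to the model one at a time instead of as one long recording:

```python
import re

MARKER = re.compile(r"\[(\d{2}):(\d{2})\]")  # assumed [MM:SS] signpost format

def chunk_by_minute(transcript: str) -> dict[int, list[str]]:
    """Group transcript text under the minute of the marker that precedes it.

    Text before the first marker is dropped; the point is just that each
    chunk is small enough for the model to handle on its own.
    """
    chunks: dict[int, list[str]] = {}
    matches = list(MARKER.finditer(transcript))
    for i, m in enumerate(matches):
        minute = int(m.group(1))
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(transcript)
        text = transcript[start:end].strip()
        if text:
            chunks.setdefault(minute, []).append(text)
    return chunks
```

You then prompt the model once per chunk rather than asking it to hold the whole inventory session in its head.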
The same limitation means there are operations that are VERY HARD for the model, like ranking/sorting: it has to attend to everything to find the next-biggest item, and so on. Current models struggle with this.
The point is that the ways in which a human fails are completely different from how LLMs fail, and they vary between people, whereas the failure modes for LLMs are all fairly identical regardless of the model. Ask an LLM to draw you a wine glass filled to the brim: it will keep insisting it has, even though it keeps drawing one half-filled; it will agree that the one it drew doesn't have the characteristics it says such a drawing would need, and then output the exact same drawing again. Most people would not fail at the task in that way.
I by no means have a 'maximal' position. I have said that they exceed the intelligence and ability of the vast majority of the human populace when it comes to their singular sense and action (ingesting language and outputting language). I fully stand by that, because it's true. I've not claimed that they exceed everyone's intelligence in every area. However, their ability to synthesize wildly different fields is well beyond most humans' ability. Yes, I do believe we've crossed the tipping point. As it is, tipping points like this are not noticeable except in retrospect.
> The point is that the ways in which a human fails are completely different from how LLMs fail, and they vary between people, whereas the failure modes for LLMs are all fairly identical
I disagree with the idea that human failure modes are different between people. I think this is the result of not thinking at a high enough level. Human failure modes are often very similar. Drama authors make a living off exploring human failure modes, and there's a reason why they say there are no new stories.
I agree that human and LLM failure modes are different, but that's to be expected.
> regardless of the model
As far as I'm aware, all LLMs in common use today use a variant of the transformer. Transformers have very different pitfalls compared to RNNs (RNNs are particularly bad at recall, for example).
> Ask an LLM to draw you a wine glass filled to the brim: it will keep insisting it has, even though it keeps drawing one half-filled; it will agree that the one it drew doesn't have the characteristics it says such a drawing would need, and then output the exact same drawing again. Most people would not fail at the task in that way.
Most people can't draw very well anyway, so this is just proving my point.
Comparison-based ranking/sorting is O(n log n) no matter what. Given that a transformer does a fixed amount of computation per forward pass before we 'force' it to output an answer, there must be an M such that beyond that length it cannot reliably sort a list. This MUST be the case, and it can only be solved by running the model some indeterminate number of times, but I don't believe we currently have any architecture to do that.
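Here is a rough sketch of the kind of outer loop I mean, using the OpenAI chat API purely as an example (the model name, prompt, and function names are illustrative). The sorting work happens in the Python loop that calls the model O(n) times, not inside any single forward pass:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def llm_pick_max(items: list[str]) -> str:
    """One bounded sub-task per call: ask the model only for the largest item."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": ("Reply with exactly one item from this list, the one with "
                        "the largest value, and nothing else: " + ", ".join(items)),
        }],
    )
    return resp.choices[0].message.content.strip()

def llm_selection_sort(items: list[str]) -> list[str]:
    """Sort by calling the model repeatedly; the loop, not the model, supplies the O(n log n)-ish work."""
    remaining, ordered = list(items), []
    while remaining:
        best = llm_pick_max(remaining)
        if best not in remaining:  # the model named something not in the list
            raise ValueError(f"model returned an unknown item: {best!r}")
        remaining.remove(best)
        ordered.append(best)
    return list(reversed(ordered))  # ascending order
```

The indeterminate number of runs lives in the `while` loop outside the model; nothing in the architecture itself decides how many passes are needed.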
Note that humans have the same limitation. If you give humans a time limit, there is a maximum number of things they will be able to sort reliably in that time.
And you're proving my point. The ways in which people would fail to draw the wine glass are different from how the LLM fails. The vast majority of people would fail to produce a photorealistic facsimile. But the vast majority of people would meet the requirement of drawing it filled to the brim. The LLMs absolutely succeed at the quality of the drawing but absolutely fail at meeting human specifications and expectations. Generously, you can say it's a different kind of intelligence. But saying it's more intelligent than humans requires you to use a drastically different axis akin to the one you'd use saying that computers are smarter than humans because they can add two numbers more quickly.
> But the vast majority of people would meet the requirement of drawing it filled to the brim.
But both are failures, right? It's just a cognitive bias that we don't expect artistic ability of most people.
> But saying it's more intelligent than humans requires you to use a drastically different axis
I'm not going to rehash this here, but as I said elsewhere in this thread, intelligences are different. There's no one metric, but for many common human tasks, the ability of the LLMs surpasses humans.
> saying that computers are smarter than humans because they can add two numbers more quickly.
This is where I disagree. Unlike a traditional program, both humans and LLMs can take unstructured input and instruction. Yes, they can both fail, and they fail differently (or succeed in different ways), but there is a wide gulf between the sort of structured computation a traditional program does and what an LLM does.
No, I'd say very different failures. The LLM is failing at reasoning and understanding, whereas people are failing at training. Humans can fix the training part by simply practicing the task repeatedly. LLMs can't fix the understanding part because it's a fundamental flaw in the design. It's like categorizing a chimp's inability to understand logical reasoning as "cognitive bias" - no, it's a much more structural problem.
> intelligences are different. There's no one metric, but for many common human tasks, the ability of the LLMs surpasses humans
There isn't one metric, and yes, LLMs surpass humans on various tasks. But we haven't been able to establish any evidence that the mechanism they operate by is intelligence. It's certainly the closest we've come to building something artificial that approximates it to a high degree in some cases. But there's still no indication that this isn't just a general-purpose ML algorithm, or that it has anything approaching human intelligence or sentience. It can mimic various human skills related to generative intelligence (writing and drawing), but it's less clear it can mimic anything else.
> This is where I disagree. Unlike a traditional program, both humans and LLMs can take unstructured input and instruction
That is true, but it's a huge claim and a big leap to then say that anything taking unstructured input and instruction is demonstrating intelligence, especially when it fails to execute the requested instructions correctly no matter how much correction you provide (as demonstrated by the wine glass problem and many other similar failure points).
There's reason to believe there is a difference, both from a power-consumption perspective and from the fact that transformers do not self-learn from additional input. Humans meld short-term and long-term learning, whereas things like ChatGPT bolt on "memories", which are just factoids stored in a RAG setup and not something the transformer learns as new training data.
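To make that last point concrete, here is roughly what a bolted-on memory looks like. This is a minimal sketch using the OpenAI SDK for illustration, not a claim about how ChatGPT actually implements it; the point is that the factoids live entirely outside the network and just get pasted into the prompt at inference time:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
memories: list[tuple[np.ndarray, str]] = []  # (embedding, factoid) pairs stored outside the model

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def remember(factoid: str) -> None:
    """'Learning' here is just appending to an external store; no weights change."""
    memories.append((embed(factoid), factoid))

def answer(question: str, k: int = 3) -> str:
    """Retrieve the k most similar factoids and paste them into the prompt."""
    q = embed(question)

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    top = sorted(memories, key=lambda m: cosine(q, m[0]), reverse=True)[:k]
    context = "\n".join(fact for _, fact in top)
    resp = client.chat.completions.create(
        model="gpt-4o",  # the transformer itself is identical before and after every "memory"
        messages=[{"role": "user",
                   "content": f"Known facts about the user:\n{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content
```

You can call `remember()` a hundred times and the model is exactly as it was; only the retrieval store and the prompt context grow, which is a very different thing from the model learning.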