
114 points cmcconomy | 2 comments
anon291 No.42174879
Can we all agree that these models far surpass human intelligence now? I mean, they process hours' worth of audio in less time than it would take a human to even listen. I think the singularity passed and we didn't even notice (which would be expected).
replies(11): >>42174949 #>>42174987 #>>42175002 #>>42175008 #>>42175019 #>>42175095 #>>42175118 #>>42175171 #>>42175223 #>>42175324 #>>42176838 #
Workaccount2 No.42175019
They process the audio but they stumble enough with recall that you cannot really trust it.

I had a problem where I used GPT-4o to help me with inventory management, something a 5th-grade kid could handle, and it kept screwing up values for a list of ~50 components. I ended up spending more time trying to get it to properly parse the input audio (I read off the counts as I moved through inventory bins) than if I had just done it manually.

On the other hand, I have had good success with having it write simple programs and apps. So YMMV quite a lot more than with a regular person.

replies(3): >>42175047 #>>42175104 #>>42175157 #
XenophileJKO No.42175157
Likely the issue is how you are asking the model to process things. The primary limitation is the amount of information (or really attention) they can keep in flight at any given moment.

This generally means that for a task like yours, you need signposts in the data, like minute markers or something, that the model can process serially.

This means there are operations that are VERY HARD for the model, like ranking/sorting. Sorting requires the model to attend to everything to find the next-biggest item, and that is very hard for the models currently.
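
As a minimal sketch of the signposting idea (the chunk size and marker format here are invented, and merging results happens in ordinary code, not in the model):

    # Split a transcript into minute-marked chunks so the model can
    # process them one at a time instead of attending to the whole
    # inventory at once. lines_per_minute is a made-up granularity.
    def add_minute_markers(transcript_lines, lines_per_minute=10):
        chunks = []
        for i in range(0, len(transcript_lines), lines_per_minute):
            minute = i // lines_per_minute
            body = "\n".join(transcript_lines[i:i + lines_per_minute])
            chunks.append(f"[minute {minute:02d}]\n{body}")
        return chunks

Each chunk then goes to the model in its own request, and the per-chunk answers get merged afterwards, so no single pass has to keep the whole list in attention.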

replies(1): >>42175959 #
anon291 No.42175959
> This means there are operations that are VERY HARD for the model, like ranking/sorting. Sorting requires the model to attend to everything to find the next-biggest item, and that is very hard for the models currently.

Ranking/sorting takes Ω(n log n) comparisons no matter what. Given that a transformer does a fixed, constant amount of computation per forward pass before we 'force' it to output an answer, there must be a length M beyond which it cannot reliably sort a list. This MUST be the case, and can only be worked around by running the model some indeterminate number of times, but I don't believe we currently have an architecture that does that.

Note that humans have the same limitation. If you give humans a time limit, there is a maximum number of things they will be able to sort reliably in that time.
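
To make the bound concrete, here is a back-of-the-envelope sketch (the compute budget C is an invented number, purely illustrative):

    import math

    # Largest n with n * log2(n) <= C, i.e. the longest list whose
    # comparison-sort work fits inside a fixed compute budget C.
    def max_sortable(C):
        n = 1
        while (n + 1) * math.log2(n + 1) <= C:
            n += 1
        return n

    print(max_sortable(1_000))      # about 140
    print(max_sortable(1_000_000))  # about 62,700

Past that M, a single pass can't do enough comparisons, and the only fix is looping the model, which is exactly the architecture we don't have.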

replies(1): >>42177723 #
christianqchung No.42177723
Transformers absolutely do not run in constant time by any reasonable definition, no matter what your point is.
replies(1): >>42177845 #
anon291 No.42177845
They absolutely do, given a fixed sequence size. All models have a maximum context length, so the computation per forward pass is bounded by a constant.
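
Rough arithmetic (the layer count, width, and max context below are invented): attention work grows roughly quadratically in sequence length L, but L is capped, so the whole forward pass has a fixed upper bound.

    # Approximate self-attention FLOPs per forward pass: the QK^T and
    # attention-times-V matmuls are each ~L*L*d_model per layer.
    # d_model, n_layers, and L_MAX are made-up illustrative values.
    def attn_flops(L, d_model=4096, n_layers=32):
        return n_layers * 2 * L * L * d_model

    L_MAX = 128_000
    print(f"{attn_flops(L_MAX):.2e}")  # ~4.3e15, a fixed constant bound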