vessenes No.44484424
Good take from Dwarkesh, and I love hearing his updates on where he's at. In brief: we need some sort of adaptive learning, and he doesn't see signs of it.

My guess is that frontier labs think long context is going to solve this: a high-quality 10M-token context window would be enough to freeze an agent at a great internal state and still get a lot done.

Right now, long-context models have highly variable quality across their windows.

But to reframe: will we have useful 10M-token context windows in two years? That seems very possible.
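(To make "freeze an agent at a great internal state" concrete, here is a minimal sketch; llm() is a hypothetical stub standing in for any long-context model API. The curated context is pinned as an immutable prefix and each new task is appended after it, so a provider could cache the prefix's KV state and pay per-call cost only for the short suffix.)

    # Sketch: a curated long context acting as frozen agent state.
    # llm() is a hypothetical stub standing in for any model API.
    def llm(prompt: str) -> str:
        return f"<answer derived from {len(prompt)} chars of prompt>"

    # Fixed once: distilled knowledge, tool docs, past lessons, etc.
    FROZEN_PREFIX = "...curated ~10M tokens..."

    def run_task(task: str) -> str:
        # Only the short suffix varies between calls.
        return llm(FROZEN_PREFIX + "\n\nTASK: " + task)

    print(run_task("triage today's failing builds"))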

replies(4): >>44484512, >>44485388, >>44486146, >>44487909
nicoburns No.44485388
How long is "long"? Real humans have context windows measured in decades of real-time multimodal input.
replies(2): >>44487895, >>44489678
vessenes No.44489678
I think there's a good clue here to what may work for frontier models: you definitely do not remember everything about a random day 15 years ago. By the same token, you almost certainly remember some things about a day much longer ago than that, if something significant happened. So you have some compression / lossy-memory mechanism at work that keeps you from being a tabula rasa about anything older than [your brain's memory capacity].
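(A toy illustration of that kind of lossy consolidation, with a hypothetical summarize() stub standing in for a real compressor: recent events stay verbatim, older ones collapse into coarse summaries, so total capacity stays bounded while significant details can still survive.)

    # Toy lossy memory: recent events verbatim, older ones compressed.
    from collections import deque

    def summarize(events: list[str]) -> str:
        # Hypothetical stub; in practice an LLM call keeping highlights.
        return f"<summary of {len(events)} older events>"

    class LossyMemory:
        def __init__(self, verbatim_cap: int = 100):
            self.recent: deque[str] = deque()   # full fidelity, bounded
            self.archive: list[str] = []        # lossy summaries
            self.cap = verbatim_cap

        def add(self, event: str) -> None:
            self.recent.append(event)
            if len(self.recent) > self.cap:
                # Fold the oldest half into one summary, the way a day
                # 15 years ago survives only as its highlights.
                old = [self.recent.popleft() for _ in range(self.cap // 2)]
                self.archive.append(summarize(old))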

Some architectures try to model this infinite but lossy horizon with functions that can be applied as a pass over the input context. So far, none of them seem to beat the good old attention head, though.
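(For a sense of what those pass-over-the-context functions look like: linear-attention-style models replace the softmax head with a fixed-size recurrent state, here a running outer-product sum, that lossily summarizes an arbitrarily long history. A minimal NumPy sketch, using a ReLU feature map purely for illustration:)

    # Linear-attention-style recurrence: O(1) state per step instead of
    # attending over the whole history; S is a lossy summary of it all.
    import numpy as np

    d = 8                                   # head dimension
    rng = np.random.default_rng(0)
    S = np.zeros((d, d))                    # fixed-size memory
    z = np.zeros(d)                         # running normalizer

    for t in range(100_000):                # arbitrarily long input
        k, v = rng.normal(size=d), rng.normal(size=d)
        phi_k = np.maximum(k, 0.0)          # positive feature map (ReLU)
        S += np.outer(phi_k, v)             # fold step t into the state
        z += phi_k

    q = rng.normal(size=d)
    phi_q = np.maximum(q, 0.0)
    out = (phi_q @ S) / (phi_q @ z + 1e-6)  # read-out approximates attention

The state never grows, which is exactly the infinite-but-lossy horizon; the catch, as noted above, is that this approximation still tends to lose to full softmax attention on quality.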