
566 points | PaulHoule | 1 comment
Alifatisk ◴[] No.44494543[source]
Love the UI in the playground, it reminds me of Qwen chat.

We have reached a point where the bottlenecks in genAI are not knowledge or accuracy, but the context window and speed.

Luckily, Google (and Meta?) has pushed the limits of the context window to about 1 million tokens, which is incredible. But I feel like today's options are still stuck at roughly a ~128k-token window per chat, and after that the model starts to forget.

Another issue is the time it takes for inference AND reasoning. dLLMs are an interesting approach to this. I know we have Groq's hardware as well.

I do wonder, can this be combined with Groq's hardware? Would the response be instant then?

How many tokens can each chat handle in the playground? I couldn't find much info about it.

Which model is it using for inference?

Also, is the training the same for dLLMs as for the standard autoregressive LLMs? Or are the weights and models completely different?

replies(4): >>44495048 #>>44495371 #>>44496876 #>>44497661 #
1. martinald ◴[] No.44495048[source]
I agree entirely with you. While Claude Code is amazing, it is also slow as hell and the context issue keeps coming up (usually at what feels like the worst possible time for me).

Most LLMs honestly feel like dialup (apart from this one!).

AFAIK, with traditional models context size is very memory-intensive (though I know there are a lot of efforts to optimize this). I believe memory usage grows with the square of context length, so even 10x-ing the context requires 100x the memory.
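
Rough back-of-the-envelope for that quadratic term (a sketch, assuming naive attention that materialises the full n x n score matrix; the layer/head counts are made up, FlashAttention-style kernels avoid storing this matrix, and the KV cache itself only grows linearly, so real deployments sit somewhere in between):

    # Estimate memory for the n x n attention score matrix across all
    # layers and heads, in fp16. Purely illustrative numbers.
    BYTES_FP16 = 2
    LAYERS, HEADS = 32, 32  # hypothetical model shape, not any specific model

    def naive_attention_scores_gb(context_len: int) -> float:
        """Bytes for the full n x n score matrix, summed over layers/heads."""
        return context_len ** 2 * HEADS * LAYERS * BYTES_FP16 / 1e9

    for n in (8_000, 128_000, 1_000_000):
        print(f"{n:>9,} tokens -> {naive_attention_scores_gb(n):,.0f} GB")

    # ~131 GB at 8k, ~33,554 GB at 128k, ~2,048,000 GB at 1M if nothing is
    # fused: 10x the context really is ~100x the memory for this term.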

(Image) diffusion does not grow like that; it is much more linear. But I have no idea (yet!) how text diffusion models behave, if someone wants to chip in :).