Show HN: oLLM – LLM Inference for large-context tasks on consumer GPUs (github.com)
3 points | anuarsh | 1 comment | 28 Aug 25 23:19 UTC
attogram | 29 Aug 25 00:43 UTC | No. 45058682
>>45058121 (OP)
"~20 min for the first token" might turn off some people. But it is totally worth it to get such a large context size on puny systems!
replies(1): >>45058870
anuarsh | 29 Aug 25 01:09 UTC | No. 45058870
>>45058682
Absolutely, there are tons of cases where an interactive experience is not required, but the ability to process a large context and get insights from it is.
replies(1): >>45061478
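[Editor's note: a minimal sketch of the non-interactive pattern described above, i.e. queue several long documents against a local model and collect the answers later, where a long time-to-first-token does not matter. It assumes a local OpenAI-compatible chat endpoint; the endpoint URL, model name, input folder, and question are placeholders, not oLLM's own API.]

import time
from pathlib import Path

import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder local server
MODEL = "local-model"                                    # placeholder model name
QUESTION = "Summarize the key findings in this document."


def ask(document: str) -> str:
    # One blocking request per document; no timeout because latency is irrelevant here.
    resp = requests.post(
        ENDPOINT,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": f"{QUESTION}\n\n{document}"}],
        },
        timeout=None,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    Path("out").mkdir(exist_ok=True)
    for path in sorted(Path("docs").glob("*.txt")):  # placeholder input folder
        started = time.monotonic()
        answer = ask(path.read_text())
        (Path("out") / f"{path.stem}.answer.txt").write_text(answer)
        print(f"{path.name}: done in {time.monotonic() - started:.0f}s")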
attogram | 29 Aug 25 08:09 UTC | No. 45061478
>>45058870
It would be interesting to see some benchmarks of this vs., for example, Ollama running locally with no timeout.
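[Editor's note: for the Ollama side of such a benchmark, a minimal sketch that measures time-to-first-token and total generation time against a local Ollama server with the HTTP timeout disabled, so a very long prefill does not abort the request. The model tag and prompt file below are placeholders; an equivalent harness would be needed for oLLM.]

import json
import time

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
MODEL = "llama3.1:8b"             # placeholder model tag
PROMPT_FILE = "long_context.txt"  # placeholder: large document used as context


def benchmark(prompt: str) -> tuple[float, float]:
    """Return (seconds to first token, total seconds) for one generation."""
    start = time.monotonic()
    first_token_at = None
    with requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": True},
        stream=True,
        timeout=None,  # "no timeout": wait as long as the prefill takes
    ) as resp:
        resp.raise_for_status()
        # Ollama streams newline-delimited JSON chunks; the first chunk with a
        # non-empty "response" field marks the first generated token.
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if first_token_at is None and chunk.get("response"):
                first_token_at = time.monotonic()
            if chunk.get("done"):
                break
    end = time.monotonic()
    return (first_token_at or end) - start, end - start


if __name__ == "__main__":
    with open(PROMPT_FILE) as f:
        prompt = f.read()
    ttft, total = benchmark(prompt)
    print(f"time to first token: {ttft:.1f}s, total: {total:.1f}s")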