Show HN: oLLM – LLM Inference for large-context tasks on consumer GPUs (github.com)
3 points | anuarsh | 2 comments | 28 Aug 25 23:19 UTC
1. Haeuserschlucht | 29 Aug 25 06:32 UTC | No.45060882
>>45058121 (OP)
20 minutes is a huge turnoff, unless you have it run overnight, e.g. having the AI check a legal paper for flaws so that you get the hint to exercise some self-care in the morning before presenting it.
replies(1): >>45067676
2. anuarsh | 29 Aug 25 18:24 UTC | No.45067676
>>45060882 (TP)
We are talking about 100k context here. 20k would be much faster, but you wouldn't need KVCache offloading for it.
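
For a rough sense of why 100k context forces KV cache offloading while 20k may not, here is a back-of-the-envelope sketch. The model dimensions are assumed (a Llama-style 8B model with grouped-query attention and an fp16 cache) and are illustrative only, not taken from oLLM itself:

    # Back-of-the-envelope KV cache sizing; not oLLM's actual code.
    # Assumed dimensions: 32 layers, 8 KV heads, head_dim 128, fp16 (2 bytes).

    def kv_cache_bytes(seq_len: int,
                       n_layers: int = 32,
                       n_kv_heads: int = 8,
                       head_dim: int = 128,
                       bytes_per_elem: int = 2) -> int:
        """Total cache size: 2 tensors (K and V) per layer, per token."""
        return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

    for ctx in (20_000, 100_000):
        gib = kv_cache_bytes(ctx) / 2**30
        print(f"{ctx:>7} tokens -> {gib:5.1f} GiB of fp16 KV cache")

    #  20,000 tokens ->  ~2.4 GiB (can plausibly sit in VRAM next to the weights)
    # 100,000 tokens -> ~12.2 GiB (exceeds most consumer GPUs)

Under these assumptions, a consumer card with 8-16 GB of VRAM already holding the model weights cannot keep a ~12 GiB cache resident, so it has to be offloaded off the GPU, which is where the long runtimes at large contexts come from.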