Show HN: oLLM – LLM Inference for large-context tasks on consumer GPUs (github.com)
3 points | anuarsh | 2 comments | 28 Aug 25 23:19 UTC
1. Haeuserschlucht | 29 Aug 25 06:32 UTC | No.45060882
>>45058121 (OP)
20 minutes is a huge turnoff, unless you have it run overnight, e.g. having the AI check a legal paper for flaws so that you get the hint to exercise some self-care in the morning before presenting it.
replies(1): >>45067676
2. anuarsh | 29 Aug 25 18:24 UTC | No.45067676
>>45060882 (TP)
We are talking about 100k context here. 20k would be much faster, but you wouldn't need KVCache offloading for it.
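
For a rough sense of why 100k context forces KV cache offloading while 20k may not, here is a back-of-the-envelope sketch. The model dimensions are assumed (a Llama-style 8B model with grouped-query attention and an fp16 cache) and are illustrative only, not taken from oLLM itself:

    # Back-of-the-envelope KV cache sizing; not oLLM's actual code.
    # Assumed dimensions: 32 layers, 8 KV heads, head_dim 128, fp16 (2 bytes).

    def kv_cache_bytes(seq_len: int,
                       n_layers: int = 32,
                       n_kv_heads: int = 8,
                       head_dim: int = 128,
                       bytes_per_elem: int = 2) -> int:
        """Total cache size: 2 tensors (K and V) per layer, per token."""
        return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

    for ctx in (20_000, 100_000):
        gib = kv_cache_bytes(ctx) / 2**30
        print(f"{ctx:>7} tokens -> {gib:5.1f} GiB of fp16 KV cache")

    #  20,000 tokens ->  ~2.4 GiB (can plausibly sit in VRAM next to the weights)
    # 100,000 tokens -> ~12.2 GiB (exceeds most consumer GPUs)

Under these assumptions, a consumer card with 8-16 GB of VRAM already holding the model weights cannot keep a ~12 GiB cache resident, so it has to be offloaded off the GPU, which is where the long runtimes at large contexts come from.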