Show HN: oLLM – LLM Inference for large-context tasks on consumer GPUs (github.com)
3 points | anuarsh | 1 comment | 28 Aug 25 23:19 UTC
attogram | 29 Aug 25 00:43 UTC | No. 45058682
>>45058121 (OP)
"~20 min for the first token" might turn off some people. But it is totally worth it to get such a large context size on puny systems!
replies(1): >>45058870
anuarsh | 29 Aug 25 01:09 UTC | No. 45058870
>>45058682
Absolutely, there are tons of cases where an interactive experience is not required, but the ability to process a large context and get insights from it is.
replies(1): >>45061478
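[Editor's note: a minimal sketch of the non-interactive pattern described above, i.e. queue several long documents against a local model and collect the answers later, where a long time-to-first-token does not matter. It assumes a local OpenAI-compatible chat endpoint; the endpoint URL, model name, input folder, and question are placeholders, not oLLM's own API.]

import time
from pathlib import Path

import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder local server
MODEL = "local-model"                                    # placeholder model name
QUESTION = "Summarize the key findings in this document."


def ask(document: str) -> str:
    # One blocking request per document; no timeout because latency is irrelevant here.
    resp = requests.post(
        ENDPOINT,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": f"{QUESTION}\n\n{document}"}],
        },
        timeout=None,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    Path("out").mkdir(exist_ok=True)
    for path in sorted(Path("docs").glob("*.txt")):  # placeholder input folder
        started = time.monotonic()
        answer = ask(path.read_text())
        (Path("out") / f"{path.stem}.answer.txt").write_text(answer)
        print(f"{path.name}: done in {time.monotonic() - started:.0f}s")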
attogram | 29 Aug 25 08:09 UTC | No. 45061478
>>45058870
It would be interesting to see some benchmarks of this vs., for example, Ollama running locally with no timeout.
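[Editor's note: for the Ollama side of such a benchmark, a minimal sketch that measures time-to-first-token and total generation time against a local Ollama server with the HTTP timeout disabled, so a very long prefill does not abort the request. The model tag and prompt file below are placeholders; an equivalent harness would be needed for oLLM.]

import json
import time

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
MODEL = "llama3.1:8b"             # placeholder model tag
PROMPT_FILE = "long_context.txt"  # placeholder: large document used as context


def benchmark(prompt: str) -> tuple[float, float]:
    """Return (seconds to first token, total seconds) for one generation."""
    start = time.monotonic()
    first_token_at = None
    with requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": True},
        stream=True,
        timeout=None,  # "no timeout": wait as long as the prefill takes
    ) as resp:
        resp.raise_for_status()
        # Ollama streams newline-delimited JSON chunks; the first chunk with a
        # non-empty "response" field marks the first generated token.
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if first_token_at is None and chunk.get("response"):
                first_token_at = time.monotonic()
            if chunk.get("done"):
                break
    end = time.monotonic()
    return (first_token_at or end) - start, end - start


if __name__ == "__main__":
    with open(PROMPT_FILE) as f:
        prompt = f.read()
    ttft, total = benchmark(prompt)
    print(f"time to first token: {ttft:.1f}s, total: {total:.1f}s")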