
602 points by emrah | 2 comments

emrah:
Available on ollama: https://ollama.com/library/gemma3
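
A minimal sketch of calling it through the ollama Python client, assuming "pip install ollama", a running local daemon, and the model tag "gemma3" from the library page above:

    # Assumes the ollama daemon is running and the model has been pulled
    # (e.g. "ollama pull gemma3"); the prompt is just an example.
    import ollama

    resp = ollama.chat(
        model="gemma3",
        messages=[{"role": "user", "content": "Summarize Gemma 3 in one line."}],
    )
    print(resp["message"]["content"])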

Der_Einzige:
How many times do I have to say this? Ollama, llama.cpp, and many other projects are slower than vLLM/SGLang. vLLM is a much superior inference engine and is fully supported by the only LLM frontend that matters (SillyTavern).

The community's obsession with Ollama has done huge damage to the field, since it's inefficient compared to vLLM. Many people could get far more tok/s than they realize if only they knew the right tools.
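
For a rough illustration of the throughput claim, a sketch of vLLM's offline batch API (the model id, batch size, and sampling settings are arbitrary; actual tok/s depends entirely on hardware):

    # Sketch of vLLM offline inference; continuous batching across many
    # concurrent prompts is where most of the throughput advantage comes from.
    import time
    from vllm import LLM, SamplingParams

    llm = LLM(model="google/gemma-3-4b-it")  # illustrative model id
    params = SamplingParams(temperature=0.7, max_tokens=256)
    prompts = ["Explain KV caching in one paragraph."] * 32  # batched requests

    start = time.time()
    outputs = llm.generate(prompts, params)
    elapsed = time.time() - start

    generated = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"{generated / elapsed:.1f} generated tok/s across the batch")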

oezi:
Why is SillyTavern the only LLM frontend that matters?

GordonS:
I tried SillyTavern a few weeks ago... wow, that is an "interesting" UI! I blundered around for a while but couldn't figure out how to do anything useful... and then installed LM Studio instead.

imtringued:
I personally thought the lorebook feature was quite neat, but I quickly gave up on it because I could never get it to trigger.

Whatever those keyword things are, they certainly don't seem to be doing any form of RAG.
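
For what it's worth, lorebook-style activation is typically a plain keyword scan over recent chat rather than semantic retrieval, so entries only fire on exact keyword matches. A toy sketch of that idea (the entry format and scan depth are invented for illustration, not SillyTavern's actual implementation):

    # Toy keyword-trigger lorebook: an entry fires only when one of its
    # keywords appears verbatim in the last few messages; no embeddings,
    # hence no RAG.
    from dataclasses import dataclass

    @dataclass
    class LoreEntry:
        keywords: list[str]
        content: str

    def triggered_entries(entries, recent_messages, scan_depth=4):
        window = " ".join(recent_messages[-scan_depth:]).lower()
        return [e for e in entries if any(k.lower() in window for k in e.keywords)]

    entries = [LoreEntry(["eldoria"], "Eldoria is a floating city ruled by mages.")]
    chat = ["Tell me about Eldoria's history."]
    for entry in triggered_entries(entries, chat):
        print(entry.content)  # matched entries get injected into the prompt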