Ollama appears to not properly credit llama.cpp: https://github.com/ollama/ollama/issues/3185 - this is a long-standing issue that hasn't been addressed.
This seems to have leaked into other projects where even when llama.cpp is being used directly, it's being credited to Ollama: https://github.com/ggml-org/llama.cpp/pull/12896
Ollama doesn't contribute upstream (that's fine, they're not obligated to), but it's a bit weird that one of the devs claimed to have and, uh, not really: https://www.reddit.com/r/LocalLLaMA/comments/1k4m3az/here_is... - that being said, they seem to maintain their own fork, so anyone could cherry-pick stuff if they wanted to: https://github.com/ollama/ollama/commits/main/llama/llama.cp...
It'd be like HandBrake pretending they implemented all the video processing themselves, when it depends on FFmpeg's libraries for all of that.
In the early days of Docker, we had the Docker vs. LXC debate. At the time, Docker was mostly a wrapper over LXC, and people were dismissing the great user-experience improvements Docker brought.
I agree, however, that the lack of acknowledgement of llama.cpp for a long time has been problematic. They acknowledge the project now.
You have to support Vulkan if you care about consumer hardware. Ollama devs clearly don't.
Ollama makes this trivial compared to llama.cpp, which for me adds a lot of value.
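For context on what "trivial" looks like in practice, here is a minimal sketch against Ollama's local REST API. It assumes the default port (11434) and that a model has already been pulled; the model name "llama3.2" is just an example:

```python
# Minimal sketch: one non-streaming request to a locally running Ollama server.
# Assumes the default port 11434 and that `ollama pull llama3.2` has been run.
import json
import urllib.request

def generate(prompt: str, model: str = "llama3.2") -> str:
    """Send a single generate request to the local Ollama API and return the text."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Why is the sky blue?"))
```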
The end result for users like me, though, is having to duplicate 30GB+ files just because I wanted to use the same weights in Ollama and in the rest of the ecosystem. So instead I use everything else, which largely works the same way, and skip Ollama.
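One workaround is to point the rest of the ecosystem at the blob Ollama already downloaded rather than copying it. The sketch below assumes the ~/.ollama/models layout (manifests plus content-addressed blobs); that layout is an internal detail, not a stable interface, and may change between Ollama versions:

```python
# Hedged sketch: reuse an Ollama-downloaded GGUF blob instead of keeping a
# second 30GB+ copy. The ~/.ollama/models layout, the manifest path, and the
# mediaType check are assumptions about Ollama internals and may change.
import json
from pathlib import Path

MODELS_DIR = Path.home() / ".ollama" / "models"   # or $OLLAMA_MODELS if set

def find_model_blob(name: str, tag: str = "latest") -> Path:
    """Locate the weight blob for a pulled model by reading its manifest."""
    manifest_path = (MODELS_DIR / "manifests" / "registry.ollama.ai"
                     / "library" / name / tag)
    manifest = json.loads(manifest_path.read_text())
    for layer in manifest["layers"]:
        if layer["mediaType"].endswith("image.model"):    # the weights layer
            digest = layer["digest"].replace(":", "-")     # sha256:x -> sha256-x
            return MODELS_DIR / "blobs" / digest
    raise FileNotFoundError(f"no model layer found for {name}:{tag}")

if __name__ == "__main__":
    blob = find_model_blob("llama3.2")
    # Symlink instead of copying so llama.cpp and friends can load the same file.
    link = Path("llama3.2.gguf")
    if not link.exists():
        link.symlink_to(blob)
    print(f"{link} -> {blob}")
```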
They are plainly going to capture the market, then switch to some "enterprise license" that lets them charge $, on the backs of other people's work.
Why shouldn't I go with llama.cpp, lmstudio, or ramalama (containers/RH)? At least I'll know what I'm getting with each one.
Ramalama actually contributes quite a bit back to llama.cpp/whisper.cpp (and probably more projects), while delivering a solution that works better for me.
https://github.com/ollama/ollama/pull/9650 https://github.com/ollama/ollama/pull/5059
ggml != llama.cpp, but llama.cpp and Ollama are both using ggml as a library.
“Some of the development is currently happening in the llama.cpp and whisper.cpp repos” --https://github.com/ggml-org/ggml