Most active commenters

refulgentis(7)
buyucu(6)
(5)
magicalhippo(5)
diggan(4)
tommica(3)
bradly(3)

Popular/hot comments

>>44002018 #
>>44002109 #
>>44007313 #

Ollama's new engine for multimodal models

(ollama.com)

1. ◴[16 May 25 03:18 UTC] No.44001528[source]▶

>>44001087 (OP) #

2. ics ◴[16 May 25 03:43 UTC] No.44001651[source]▶

>>44001087 (OP) #

I'll have to try this later but appreciate that the article gets straight to the point with practical examples and then the details.

3. newusertoday ◴[16 May 25 04:19 UTC] No.44001807[source]▶

>>44001087 (OP) #

why does ollama engine has to change to support new models? every time a new model comes ollama has to be upgraded.

replies(1): >>44001834 #

4. nkwaml ◴[16 May 25 04:28 UTC] No.44001834[source]▶

>>44001807 #

Because of things like this: https://github.com/ggml-org/llama.cpp/issues/12637

Where "supporting" a model doesn't mean what you think it means for cpp

Between that and the long saga with vision models having only partial support, with a CLI tool, and no llama-server support (they only fixed all that very recently) the fact of the matter is that ollama is moving faster and implementing what people want before lama.cpp now

And it will finally shut down all the people who kept copy pasting the same criticism of ollama "it's just a llama.cpp wrapper why are you not using cpp instead"

replies(2): >>44001901 #>>44002040 #

5. simonw ◴[16 May 25 04:38 UTC] No.44001886[source]▶

>>44001087 (OP) #

The timing on this is a little surprising given llama.cpp just finally got a (hopefully) stable vision feature merged into main: https://simonwillison.net/2025/May/10/llama-cpp-vision/

Presumably Ollama had been working on this for quite a while already - it sounds like they've broken their initial dependency on llama.cpp. Being in charge of their own destiny makes a lot of sense.

replies(1): >>44001924 #

6. Maxious ◴[16 May 25 04:42 UTC] No.44001901{3}[source]▶

>>44001834 #

There's also some interpersonal conflict in llama.cpp that's hampering other bug fixes https://github.com/ikawrakow/ik_llama.cpp/pull/400

replies(1): >>44007006 #

7. lolinder ◴[16 May 25 04:48 UTC] No.44001924[source]▶

>>44001886 #

Do you know what exactly the difference is with either of these projects adding multimodal support? Both have supported LLaVA for a long time. Did that require special casing that is no longer required?

I'd hoped to see this mentioned in TFA, but it kind of acts like multimodal is totally new to Ollama, which it isn't.

replies(2): >>44001952 #>>44002109 #

8. simonw ◴[16 May 25 04:54 UTC] No.44001952{3}[source]▶

>>44001924 #

There's a pretty clear explanation of the llama.cpp history here: https://github.com/ggml-org/llama.cpp/tree/master/tools/mtmd...

I don't fully understand Ollama's timeline and strategy yet.

9. tommica ◴[16 May 25 05:10 UTC] No.44002018[source]▶

>>44001087 (OP) #

Sidetangent: why is ollama frowned upon by some people? I've never really got any other explanation than "you should run llama.CPP yourself"

replies(10): >>44002029 #>>44002150 #>>44002166 #>>44002486 #>>44002513 #>>44002621 #>>44004218 #>>44005337 #>>44006200 #>>44012844 #

10. nicman23 ◴[16 May 25 05:13 UTC] No.44002029[source]▶

>>44002018 #

cpp was just faster and with more features that is all

replies(1): >>44002169 #

11. w8nC ◴[16 May 25 05:16 UTC] No.44002040{3}[source]▶

>>44001834 #

Now it’s just a wrapper around hosted APIs.

Went with my own wrapper around llama.cpp and stable-diffusion.cpp with optional prompting hosted if I don’t like the result so much, but it makes a good start for hosted to improve on.

Also obfuscates any requests sent to hosted, cause why feed them insight to my use case when I just want to double check algorithmic choices of local AI? The ground truth relationship func names and variable names imply is my little secret

replies(1): >>44002065 #

12. Patrick_Devine ◴[16 May 25 05:21 UTC] No.44002065{4}[source]▶

>>44002040 #

Wait, what hosted APIs is Ollama wrapping?

13. refulgentis ◴[16 May 25 05:31 UTC] No.44002109{3}[source]▶

>>44001924 #

It's a turducken of crap from everyone but ngxson and Hugging Face and llama.cpp in this situation.

llama.cpp did have multimodal, I've been maintaining an integration for many moons now. (Feb 2024? Original LLaVa through Gemma 3)

However, this was not for mere mortals. It was not documented and had gotten unwieldy, to say the least.

ngxson (HF employee) did a ton of work to get gemma3 support in, and had to do it in a separate binary. They dove in and landed a refactored backbone that is presumably more maintainable and on track to be in what I think of as the real Ollama, llama.cpp's server binary.

As you well note, Ollama is Ollamaing - I joked, once, that the median llama.cpp contribution from Ollama is a driveby GitHub comment asking when a feature will land in llama-server, so it can be copy-pasted into Ollama.

It's really sort of depressing to me because I'm just one dude, it really wasn't that hard to support it (it's one of a gajillion things I have to do, I'd estimate 2 SWE-weeks at 10 YOE, 1.5 SWE-days for every model release), and it's hard to get attention for detailed work in this space with how much everyone exaggerates and rushes to PR.

EDIT: Coming back after reading the blog post, and I'm 10x as frustrated. "Support thinking / reasoning; Tool calling with streaming responses" --- this is table stakes stuff that was possible eons ago.

I don't see any sign of them doing anything specific in any of the code they link, the whole thing reads like someone carefully worked with an LLM to present a maximalist technical-sounding version of the llama.cpp stuff and frame it as if they worked with these companies and built their own thing. (note the very careful wording on this, e.g. in the footer the companies are thanked for releasing the models)

I think it's great that they have a nice UX that helps people run llama.cpp locally without compiling, but it's hard for me to think of a project I've been more by turned off by in my 37 years on this rock.

replies(3): >>44002251 #>>44002410 #>>44002628 #

14. lhl ◴[16 May 25 05:40 UTC] No.44002150[source]▶

>>44002018 #

Here's some discussion here: https://www.reddit.com/r/LocalLLaMA/comments/1jzocoo/finally...

Ollama appears to not properly credit llama.cpp: https://github.com/ollama/ollama/issues/3185 - this is a long-standing issue that hasn't been addressed.

This seems to have leaked into other projects where even when llama.cpp is being used directly, it's being credited to Ollama: https://github.com/ggml-org/llama.cpp/pull/12896

Ollama doesn't contributed to upstream (that's fine, they're not obligated to), but it's a bit weird that one of the devs claimed to have and uh, not really: https://www.reddit.com/r/LocalLLaMA/comments/1k4m3az/here_is... - that being said they seem to maintain their own fork so anyone could cherry pick stuff it they wanted to: https://github.com/ollama/ollama/commits/main/llama/llama.cp...

replies(1): >>44004513 #

15. gavmor ◴[16 May 25 05:45 UTC] No.44002166[source]▶

>>44002018 #

Here's a recent thread on Ollama hate from r/localLLaMa: https://www.reddit.com/r/LocalLLaMA/comments/1kg20mu/so_why_...

replies(1): >>44006942 #

16. cwillu ◴[16 May 25 05:45 UTC] No.44002169{3}[source]▶

>>44002029 #

cpp is the thing doing all the heavy lifting, ollama is just a library wrapper.

It'd be like if handbrake tried to pretend that they implemented all the video processing work, when it's dependent on libffmpeg for all of that.

replies(1): >>44004229 #

17. rvz ◴[16 May 25 06:04 UTC] No.44002251{4}[source]▶

>>44002109 #

> As you well note, Ollama is Ollamaing - I joked, once, that their median llama.cpp contribution from Ollama is asking when a feature will land in llama-server so it can be copy-pasted into Ollama.

Other than being a nice wrapper around llama.cpp, are there any meaningful improvements that they came up with that landed in llama.cpp?

I guess in this case with the introduction of libmtmd (for multi-modal support in llama.cpp) Ollama waited and did a git pull and now multi-modal + better vision support was here and no proper credit was given.

Yes, they had vision support via LLaVa models but it wasn't that great.

replies(1): >>44002339 #

18. refulgentis ◴[16 May 25 06:24 UTC] No.44002339{5}[source]▶

>>44002251 #

There's been no noteworthy contributions, I'd honestly wouldn't be surprised to hear there's 0 contributions.

Well it's even sillier than that: I didn't realize that the timeline in the llama.cpp link was humble and matched my memory: it was the test binaries that changed. i.e. the API was refactored a bit and such but its not anything new under the sun. Also the llama.cpp they have has tool and thinking support. shrugs

The tooling was called llava but that's just because it was the first model -- multimodal models are/were consistently supported ~instantly, it was just your calls into llama.cpp needed to manage that,a nd they still do! - its just there's been some cleanup so there isn't one test binary for every model.

It's sillier than that in it wasn't even "multi-modal + better vision support was here" it was "oh we should do that fr if llama.cpp is"

On a more positive note, the big contributor I appreciate in that vein is Kobold contributed a ton of Vulkan work IIUC.

And another round of applause for ochafik: idk if this gentleman from Google is doing this in his spare time or fulltime for Google, but they have done an absolutely stunning amount of work to make tool calls and thinking systematically approachable, even building a header-only Jinja parser implementation and designing a way to systematize "blessed" overrides of the rushed silly templates that are inserted into models. Really important work IMHO, tool calls are what make AI automated and having open source being able to step up here significantly means you can have legit Sonnet-like agency in Gemma 3 12B, even Phi 4 3.8B to an extent.

replies(1): >>44039779 #

19. Patrick_Devine ◴[16 May 25 06:40 UTC] No.44002410{4}[source]▶

>>44002109 #

I worked on the text portion of gemma3 (as well as gemma2) for the Ollama engine, and worked directly with the Gemma team at Google on the implementation. I didn't base the implementation off of the llama.cpp implementation which was done in parallel. We did our implementation in golang, and llama.cpp did theirs in C++. There was no "copy-and-pasting" as you are implying, although I do think collaborating together on these new models would help us get them out the door faster. I am really appreciative of Georgi catching a few things we got wrong in our implementation.

20. speedgoose ◴[16 May 25 06:52 UTC] No.44002486[source]▶

>>44002018 #

To me, Ollama is a bit the Docker of LLMs. The user experience is inspired and the model file syntax is also inspired by the Dockerfile syntax. [0]

In the early days of Docker, we had the debate of Docker vs LXC. At the time, Docker was mostly a wrapper over LXC and people were dismissing the great user experience improvements of Docker.

I agree however that the lack of acknowledgement to llama.cpp for a long time has been problematic. They acknowledge the project now.

[0]: https://github.com/ollama/ollama/blob/main/docs/modelfile.md

21. octocop ◴[16 May 25 06:58 UTC] No.44002513[source]▶

>>44002018 #

For me it's because ollama is just a front-end for llama.cpp, but the ollama folks rarely acknowledge that.

22. buyucu ◴[16 May 25 07:20 UTC] No.44002621[source]▶

>>44002018 #

I abandoned Ollama because Ollama does not support Vulkan: https://news.ycombinator.com/item?id=42886680

You have to support Vulkan if you care about consumer hardware. Ollama devs clearly don't.

replies(1): >>44003156 #

23. nolist_policy ◴[16 May 25 07:21 UTC] No.44002628{4}[source]▶

>>44002109 #

For one Ollama supports interleaved sliding window attention for Gemma 3 while llama.cpp doesn't.[0] iSWA reduces kv cache size to 1/6.

Ollama is written in golang so of course they can not meaningfully contribute that back to llama.cpp.

[0] https://github.com/ggml-org/llama.cpp/issues/12637

replies(2): >>44002699 #>>44008326 #

24. noodletheworld ◴[16 May 25 07:33 UTC] No.44002699{5}[source]▶

>>44002628 #

What nonsense is this?

Where do you imagine ggml is from?

> The llama.cpp project is the main playground for developing new features for the ggml library

-> https://github.com/ollama/ollama/tree/27da2cddc514208f4e2353...

(Hint: If you think they only write go in ollama, look at the commit history of that folder)

replies(2): >>44002827 #>>44003146 #

25. nolist_policy ◴[16 May 25 07:53 UTC] No.44002827{6}[source]▶

>>44002699 #

llama.cpp clearly does not support iSWA: https://github.com/ggml-org/llama.cpp/issues/12637

Ollama does, please try it.

26. imtringued ◴[16 May 25 08:49 UTC] No.44003146{6}[source]▶

>>44002699 #

Dude, they literally announced that they stopped using llama.cpp and are now using ggml directly. Whatever gotcha you think there is, exists only in your head.

replies(1): >>44006115 #

27. hexmiles ◴[16 May 25 10:26 UTC] No.44003680[source]▶

>>44002611 #

If I understood it correctly: this time no, it is actually new engine builded by the ollama team indipendent from llama.cpp

replies(2): >>44003746 #>>44003749 #

28. buyucu ◴[16 May 25 10:37 UTC] No.44003743{4}[source]▶

>>44003156 #

why would I use a software that doesn't have the features I want, when a far better alternative like llama.cpp exists? ollama does not add any value.

replies(1): >>44003903 #

29. Havoc ◴[16 May 25 10:37 UTC] No.44003746{3}[source]▶

>>44003680 #

llama.cpp added support for vision 6 days ago.

See SimonW post here:

https://simonwillison.net/2025/May/10/llama-cpp-vision/

>If I understood it correctly

You understood it exactly like they wanted you to...

30. buyucu ◴[16 May 25 10:37 UTC] No.44003749{3}[source]▶

>>44003680 #

I doubt it. Llama.cpp just added support for the same models a few weeks ago. Folks at ollama just did a git pull.

replies(2): >>44004061 #>>44004120 #

31. magicalhippo ◴[16 May 25 11:07 UTC] No.44003903{5}[source]▶

>>44003743 #

I more often than not add multiple models to my WebUI chats to compare and contrast models.

Ollama makes this trivial compared to llama.cpp, and so for me adds a lot of value due to this.

replies(1): >>44005537 #

32. oezi ◴[16 May 25 11:10 UTC] No.44003925[source]▶

>>44001087 (OP) #

I wish multimodal would imply text, image and audio (+potentially video). If a model supports only image generation or image analysis, vision model seems the more appropriate term.

We should aim to distinguish multimodal modals such as Qwen2.5-Omni from Qwen2.5-VL.

In this sense: Ollama's new engine adds vision support.

replies(2): >>44006219 #>>44007313 #

33. magicalhippo ◴[16 May 25 11:28 UTC] No.44004061{4}[source]▶

>>44003749 #

It's open source, you could have checked. Seems indeed like the new engine cuts out llama.cpp, using GGML libary directly.

https://github.com/ollama/ollama/pull/7913

replies(1): >>44005517 #

34. ◴[16 May 25 11:35 UTC] No.44004120{4}[source]▶

>>44003749 #

35. diggan ◴[16 May 25 11:47 UTC] No.44004218[source]▶

>>44002018 #

Besides the "culture"/licensing/FOSS issue already mentioned, I just wanted to be able to reuse model weights across various applications, but Ollama decided to ship their own way of storing things on disk + with their own registry. I'm guessing it's because they want to eventually be able to monetize this somehow, maybe "private" weights hosted on their registry or something. I don't get why they thought splitting up files into "blobs" made sense for LLM weights, seems they wanted to reduce duplication (ala Docker) but instead it just makes things more complicated for no gains.

End result for users like me though, is to have to duplicate +30GB large files just because I wanted to use the weights in Ollama and the rest of the ecosystem. So instead I use everything else that largely just works the same way, and not Ollama.

replies(1): >>44004528 #

36. diggan ◴[16 May 25 11:48 UTC] No.44004229{4}[source]▶

>>44002169 #

> ollama is just a library wrapper.

Was.

This submission is literally about them moving away from being just a wrapper around llama.cpp :)

replies(1): >>44005522 #

37. tommica ◴[16 May 25 12:16 UTC] No.44004513{3}[source]▶

>>44002150 #

Thanks for the good explanation!

38. tommica ◴[16 May 25 12:17 UTC] No.44004528{3}[source]▶

>>44004218 #

That is an interesting perspective, did not know about that at all!

39. Koshima ◴[16 May 25 13:18 UTC] No.44005148[source]▶

>>44001087 (OP) #

The timing makes sense if you consider the broader trend in the LLM space. We're moving from just text to more integrated, multimodal experiences, and having a tightly controlled engine like this could be a game changer for developers building apps that require real-time, context-rich understanding.

40. bearjaws ◴[16 May 25 13:33 UTC] No.44005337[source]▶

>>44002018 #

Anyone who has been around for 10 years can smell the Embrace, Extend, Extinguish model 100 miles away.

They are plainly going to capture the market, and switch to some "enterprise license" that lets them charge $, on the backs of other peoples work.

41. bradly ◴[16 May 25 13:35 UTC] No.44005356[source]▶

>>44001087 (OP) #

The strength with Ollama for me was the ease of being able to run a simple Docker command and be up and running locally without any tinkering, but with image and video Docker is no longer an option as Docker does not use the GPU. I'm curious how Ollama plans to support their Docker integration going forward or if it is a less important part of the project that I'm giving it credit for.

replies(2): >>44005406 #>>44005445 #

42. IanCal ◴[16 May 25 13:39 UTC] No.44005406[source]▶

>>44005356 #

You can use a GPU with docker - at least on some platforms. There's more setup though, nvidia have some details to help https://docs.nvidia.com/datacenter/cloud-native/container-to...

replies(1): >>44005512 #

43. ◴[16 May 25 13:42 UTC] No.44005445[source]▶

>>44005356 #

44. bradly ◴[16 May 25 13:48 UTC] No.44005512{3}[source]▶

>>44005406 #

Thank you. I should have specified on MacOS. I ran into this recently trying to setup stable-diffusion-webui/InvokeAI/Foocus and finding it much more complicated to get working for me on my personal laptop than the llms.

replies(1): >>44006894 #

45. mark_l_watson ◴[16 May 25 13:49 UTC] No.44005513[source]▶

>>44001087 (OP) #

I have mostly used Ollama to run local models for close to a year, love it, but I have barely touched Llava, etc. multi modal support because all my personal use cases are text based.

Question: what are cool and useful multi modal projects have people here built using local models?

I am looking for personal project ideas.

46. buyucu ◴[16 May 25 13:49 UTC] No.44005517{5}[source]▶

>>44004061 #

seriously? who do you think develops ggml?

hint: it's llama.cpp

47. buyucu ◴[16 May 25 13:50 UTC] No.44005522{5}[source]▶

>>44004229 #

no they are not. the submission uses ggml, which is llama.cpp

replies(1): >>44006311 #

48. buyucu ◴[16 May 25 13:51 UTC] No.44005537{6}[source]▶

>>44003903 #

llama-swap does it better than ollama I think.

49. JKCalhoun ◴[16 May 25 14:40 UTC] No.44006071[source]▶

>>44001087 (OP) #

Does Ollama support the "user context" that higher level LLMs like ChatGPT have?

I'm not clear what they are called (or how implemented) — but perhaps 1) the initial prompt/context (that, for example, Grok has got in trouble with recently) and 2) the kind of saved context that allows ChatGPT to know things about your prompt-history so it can better answer future queries.

(My use of ollama has been pretty bare-bones and I have not seen anything covering these higher level features in -help.)

replies(1): >>44006140 #

50. noodletheworld ◴[16 May 25 14:43 UTC] No.44006115{7}[source]▶

>>44003146 #

I'm responding to this assertion:

> Ollama is written in golang so of course they can not meaningfully contribute that back to llama.cpp.

llama.cpp consumes GGML.

ollama consumes GGML.

If they contribute upstream changes, they are contributing to llama.cpp.

The assertions that they:

a) only write golang

b) cannot upstream changes

Are both, categorically, false.

You can argue what 'meaningfully' means if you like. You can also believe whatever you like.

However, both (a) and (b), are false. It is not a matter of dispute.

> Whatever gotcha you think there is, exists only in your head.

There is no 'gotcha'. You're projecting. My only point is that any claim that they are somehow not able to contribute upstream changes only indicates a lack of desire or competence, not a lack of the technical capacity to do so.

replies(1): >>44008349 #

51. lxgr ◴[16 May 25 14:45 UTC] No.44006140[source]▶

>>44006071 #

My understanding is that ollama is more of an "LLM backend", i.e. it provides a server process on your machine that answers requests relatively statelessly.

I believe it keeps the model loaded across sessions, and possibly keeps the KV cache warm for ongoing sessions (but I doubt it, based on the API shape; I don't see a "session" parameter), but that's about it. Nothing seems to be written to disk.

Features like ChatGPT's "memories" or cross-chat context require a persistence layer that's probably best suited for a "frontend". Ollama's API does support passing in requests with history, for example: https://github.com/ollama/ollama/blob/main/docs/api.md#chat-...

replies(1): >>44007064 #

52. wirybeige ◴[16 May 25 14:51 UTC] No.44006200[source]▶

>>44002018 #

They refuse to work with the community. There's also the open question of how they are going to monetize, given that they are a VC-backed company.

Why shouldn't I go with llama.cpp, lmstudio, or ramalama (containers/RH); I will at least know what I am getting with each one.

Ramalama actually contributes quite a bit back to llama.cpp/whipser.cpp (more projects probably), while delivering a solution that works better for me.

https://github.com/ollama/ollama/pull/9650 https://github.com/ollama/ollama/pull/5059

53. ◴[16 May 25 14:53 UTC] No.44006219[source]▶

>>44003925 #

54. diggan ◴[16 May 25 14:59 UTC] No.44006311{6}[source]▶

>>44005522 #

I think you misunderstand how these pieces fit together. llama.cpp is library that ships with a CLI+some other stuff, ggml is a library and Ollama has "runners" (like an "execution engine"). Previously, Ollama used llama.cpp (which uses ggml) as the only runner. Eventually, Ollama made their own runner (which also uses ggml) for new models (starting with gemma3 maybe?), still using llama.cpp for the rest (last time I checked at least).

ggml != llama.cpp, but llama.cpp and Ollama are both using ggml as a library.

replies(1): >>44007638 #

55. washadjeffmad ◴[16 May 25 15:52 UTC] No.44006894{4}[source]▶

>>44005512 #

Out of curiosity, before you attempted this, what was your impression of the fitness and performance of Macs for generative AI?

replies(1): >>44007364 #

56. kergonath ◴[16 May 25 15:55 UTC] No.44006942{3}[source]▶

>>44002166 #

r/localLLaMa is very useful, but also very susceptible to groupthink and more or less astroturfed hype trains and mood swings. This drama needs to be taken in context, there is a lot of emotion and not too much reason.

57. kergonath ◴[16 May 25 16:00 UTC] No.44007006{4}[source]▶

>>44001901 #

What the hell is going on there? It’s utterly bizarre to see devs discussing granting each other licences to work on the same code for an open source project. How on earth did they end up there?

replies(2): >>44008119 #>>44008366 #

58. codybontecou ◴[16 May 25 16:04 UTC] No.44007064{3}[source]▶

>>44006140 #

Is there more to memory than just an entry into the context/messages array passed to the LLM?

replies(1): >>44007377 #

59. prettyblocks ◴[16 May 25 16:28 UTC] No.44007313[source]▶

>>44003925 #

I'm very interested in working with video inputs, is it possible to do that with Qwen2.5-Omni and Ollama?

replies(3): >>44008675 #>>44009579 #>>44011015 #

60. bradly ◴[16 May 25 16:33 UTC] No.44007364{5}[source]▶

>>44006894 #

Before I attempted, I had no idea. I hadn't ran any AI models locally and I don't follow this stuff too closely, so I wasn't even sure if I could get something usable on my M1 MacBook Air. I went in fairly blind which is why the Ollama Docker installer was so appealing to me–I got to hold off fighting Python and Homebrew until I had a better sense of what the tool could provide.

After my attempt, I think chat is performant enough on my M1. Code gen was too slow for me. Image generation was 1-2 minutes for small pixel art sprites, which for my use case is fine to let churn for a while, but the image generation results were much worse than ChatGPT browser gives me out of the box. I do not know if poor image quality is due to machine constraints or me not understanding how to configure the checkpoint and models.

I would be interested to hear how an M3 or M4 Mini handles these things as those are fair affordable to pick up used.

61. lxgr ◴[16 May 25 16:34 UTC] No.44007377{4}[source]▶

>>44007064 #

There must be some heavy compression/filtering going on, as there's no chance GPT can hold everybody's entire ChatGPT conversation history in its context.

But practically, I believe that Ollama just doesn't have a concept of server-side persistent state at the moment to even do such a thing.

replies(1): >>44007576 #

62. codybontecou ◴[16 May 25 16:54 UTC] No.44007576{5}[source]▶

>>44007377 #

I _think_ the compression used is literally “Chat, compress this array of messages”. This is the technique used in Claude Plays Pokemon.

I’m sure there’s more to the prompt and what to do with this newly generated messages array, but the gist is there.

If this is the case, an Ollama implementation shouldn’t be too difficult.

63. yossi_peti ◴[16 May 25 16:55 UTC] No.44007582[source]▶

>>44001087 (OP) #

Their example "understanding and translating vertical Chinese spring couplets to English" has a lot of mistakes in it. I'm guessing the person writing the blog post to show off that example doesn't actually know Chinese.

What is actually written: Top: 家和国盛 Left: 和谐生活人人舒畅迎新春 Right: 平安社会家家欢乐辞旧岁

What Ollama saw: Top: 盛和家国 (correct characters but wrong order) Left: It reads "新春" (new spring) as 舒畅 (comfortable) Right: 家家欢欢乐乐辞旧岁 (duplicates characters and omits the first four)

replies(1): >>44007796 #

64. cwillu ◴[16 May 25 17:00 UTC] No.44007638{7}[source]▶

>>44006311 #

“The llama.cpp project is the main playground for developing new features for the ggml library” --https://github.com/ggml-org/llama.cpp

“Some of the development is currently happening in the llama.cpp and whisper.cpp repos” --https://github.com/ggml-org/ggml

replies(1): >>44010025 #

65. mchiang ◴[16 May 25 17:15 UTC] No.44007796[source]▶

>>44007582 #

I'm one of the maintainers who ran that example. I am Chinese.

The English translation, I thought was pretty spot on. We don't hide the mistakes of the models or fake the demos.

Overtime, of course I hope the models to improve much more

66. clpm4j ◴[16 May 25 17:17 UTC] No.44007817[source]▶

>>44001087 (OP) #

The whole '*llama' naming convention in the LLM world is more confusing to me than it probably should be. So many llamas running around out here.

replies(2): >>44007974 #>>44008834 #

67. mcbuilder ◴[16 May 25 17:35 UTC] No.44007974[source]▶

>>44007817 #

Unfortunately the speed of AI/ML is so crazy fast. I don't know a better way to keep track other than paying attention all the time. The field also loves memey names. A few years ago everyone was naming models after Sesame Street characters, there were the YOLO family of models. Conference papers are not immune, in fact they are greatest "offenders".

68. andy_xor_andrew ◴[16 May 25 17:37 UTC] No.44007996[source]▶

>>44001087 (OP) #

They are talking a lot about this new engine - I'd love to see details on how it's actually implemented. Given llama.cpp is a herculean feat, if you are going to claim to have some replacement for it, an example of how you did it would be good!

Based on this part:

> We set out to support a new engine that makes multimodal models first-class citizens, and getting Ollama’s partners to contribute more directly the community - the GGML tensor library.

And from clicking through a github link they had:

https://github.com/ollama/ollama/blob/main/model/models/gemm...

My takeaway is, the GGML library (the thing that is the backbone for llama.cpp) must expose some FFI (foreign function interface) that can be invoked from Go, so in the ollama Go code, they can write their own implementations of model behavior (like Gemma 3) that just calls into the GGML magic. I think I have that right? I would have expected a detail like that to be front and center in the blog post.

replies(1): >>44009766 #

69. Philpax ◴[16 May 25 17:51 UTC] No.44008119{5}[source]▶

>>44007006 #

There seems to be some bad blood between ikawrakow and ggerganov: https://github.com/ikawrakow/ik_llama.cpp/discussions/316

replies(1): >>44011036 #

70. refulgentis ◴[16 May 25 18:13 UTC] No.44008326{5}[source]▶

>>44002628 #

It's impossible to meaningfully contribute to the C library you call from Go because you're calling it from Go? :)

We can see the weakness of this argument given it is unlikely any front-end is written in C, and then noting it is unlikely ~0 people contribute to llama.cpp.

replies(1): >>44009809 #

71. refulgentis ◴[16 May 25 18:15 UTC] No.44008349{8}[source]▶

>>44006115 #

FWIW I don't know why you're being downvoted other than a standard from the bleachers "idk what's going on but this guy seems more negative!" -- cheers -- "a [specious argument that shades rather than illuminates] can travel halfway around the world before..."

72. prophesi ◴[16 May 25 18:17 UTC] No.44008366{5}[source]▶

>>44007006 #

My guess is that there's money involved. Maybe a spat between an ex-employee and their ex-employer?

73. oezi ◴[16 May 25 18:49 UTC] No.44008675{3}[source]▶

>>44007313 #

I have only tested Qwen2.5-Omni for audio and it was hit and miss for my use case of tagging audio.

74. ◴[16 May 25 19:08 UTC] No.44008834[source]▶

>>44007817 #

75. ac29 ◴[16 May 25 19:22 UTC] No.44008952[source]▶

>>44001087 (OP) #

I am amused that one of the handful examples they chose to use is wrong:

"The best way to get to Stanford University from the Ferry Building in San Francisco depends on your preferences and budget. Here are a few options:

1. *By Car*: Take US-101 South to CA-85 South, then continue on CA-101 South."

CA 85 is significantly farther down 101 than Palo Alto.

76. machinelearning ◴[16 May 25 20:35 UTC] No.44009579{3}[source]▶

>>44007313 #

What's a use case are you interested in re: video?

replies(1): >>44011938 #

77. Hugsun ◴[16 May 25 20:56 UTC] No.44009766[source]▶

>>44007996 #

Ollama are known for their lack of transparency, poor attribution and anti-user decisions.

I was surprised to see the amount of attribution in this post. They've been catching quite a bit of flack for this so they might be adjusting.

78. magicalhippo ◴[16 May 25 21:03 UTC] No.44009809{6}[source]▶

>>44008326 #

They can of course meaningfully contribute new C++ code to llama.cpp, which they then could later use downstream in Go.

What they cannot meaningfully do is write Go code that solves their problems and upstream those changes to llama.cpp.

The former requires they are comfortable writing C++, something perhaps not all Go devs are.

replies(1): >>44010943 #

79. diggan ◴[16 May 25 21:42 UTC] No.44010025{8}[source]▶

>>44007638 #

Yeah, those both makes sense. ggml was split from llama.cpp once they realized it could be useful elsewhere, so while llama.cpp is the "main playground", it's still used by others (including llama.cpp). Doesn't mean suddenly that llama.cpp is the same as ggml, not sure why you'd believe that.

80. refulgentis ◴[17 May 25 00:12 UTC] No.44010943{7}[source]▶

>>44009809 #

I'd love to be able to take this into account, step back, and say "Ah yes - there is non-zero probability they are materially incapable of contributing back to their dependency" - in practice, if you're comfortable writing SWA in Go, you're going to be comfortable writing it in C++, and they are writing C++ already.

(it's also worth looking at the code linked for the model-specific impls, this isn't exactly 1000s of lines of complicated code. To wit, while they're working with Georgi...why not offer to help land it in llama.cpp?)

replies(1): >>44011647 #

81. tough ◴[17 May 25 00:24 UTC] No.44011015{3}[source]▶

>>44007313 #

https://huggingface.co/blog/smolvlm

82. tough ◴[17 May 25 00:28 UTC] No.44011036{6}[source]▶

>>44008119 #

But he's talking about a MIT License!

WTF

83. magicalhippo ◴[17 May 25 02:44 UTC] No.44011647{8}[source]▶

>>44010943 #

Perhaps for SWA.

For the multimodal stuff it's a lot clear cut. Ollama used the image processing libraries from Go, while in llama.cpp they ended up rolling their own image processing routines.

replies(1): >>44011701 #

84. refulgentis ◴[17 May 25 02:57 UTC] No.44011701{9}[source]▶

>>44011647 #

Citation?

My groundbreaking implementation passes it RGB bytes, passes em through the image projector, and put the tokens in the prompt.

And I cannot imagine sure why the inference engine would be more concerned with it than that.

Is my implementation a groundbreaking achievement worth rendering llama.cpp a footnote, because I use Dart image-processing libraries?

replies(1): >>44011801 #

85. magicalhippo ◴[17 May 25 03:24 UTC] No.44011801{10}[source]▶

>>44011701 #

> Citation?

https://github.com/ollama/ollama/issues/7300#issuecomment-24...

https://github.com/ggml-org/llama.cpp/blob/3e0be1cacef290c99...

Anyway my point was just that it's not as easy as just pushing a patch upstream, like it is in many other projects. It would require a new or different implementation.

replies(1): >>44011995 #

86. prettyblocks ◴[17 May 25 03:57 UTC] No.44011938{4}[source]▶

>>44009579 #

I'm curious how effective these models would be at recognizing if the input video was ai generated or heavily manipulated. Also various things around face/object segmentation.

87. refulgentis ◴[17 May 25 04:13 UTC] No.44011995{11}[source]▶

>>44011801 #

I see, they can't figure out how to contribute a few lines of C++ because we have a link where someone says they can't figure out how to contribute C++ code only Go. :)

There's a couple things I want to impart: #1) empathy is important. One comment about one feature from maybe an ollama core team member doesn't mean people are rushing to waste their time and look mean calling them out for poor behavior. #2) half formed thought: something of what we might call the devil lives in a common behavior pattern that I have to resist myself: rushing in, with weak arguments, to excuse poor behavior. Sometimes I act as if litigating one instance of it, and finding a rationale for it in that instance, makes their behavior pattern reasonable.

Riffing, an analogy someone else made is particularly adept: ollama is to llama.cpp as handbrake is to ffmpeg. I cut my teeth on C++ via handbrake almost 2 decades ago, and we wouldn't be caught dead acting this way. At the very least for fear of embarrassment. What I didnt anticipate is that people will make contrarian arguments on your behalf no matter what you do.

88. jimjimwii ◴[17 May 25 08:13 UTC] No.44012844[source]▶

>>44002018 #

For me it's the R1 fiasco and their dishonesty. How anyone can continue to trust a project that brazenly mislead their users to such an extent just to cash in on the hype is beyond me.

89. ochafik ◴[20 May 25 10:02 UTC] No.44039779{6}[source]▶

>>44002339 #

Thanks for the kind words!

I've indeed done all that on my spare time (still under Google copyright), very happy to see this used and appreciated :-)

About to start a new job / unsure if I'll be able to contribute more, but it's been a lovely ride! (largely thanks to the other contributors and ggerganov@ himself!)

↑