
344 points | LorenDB | 1 comment
simonw No.44001886
The timing on this is a little surprising, given that llama.cpp just got a (hopefully) stable vision feature merged into main: https://simonwillison.net/2025/May/10/llama-cpp-vision/

Presumably Ollama had already been working on this for quite a while - it sounds like they've broken their initial dependency on llama.cpp. Being in charge of their own destiny makes a lot of sense.

replies(1): >>44001924
lolinder No.44001924
Do you know what exactly is different about either of these projects' new multimodal support? Both have supported LLaVA for a long time. Did that require special-casing that is no longer needed?

I'd hoped to see this addressed in TFA, but it kind of acts like multimodal is totally new to Ollama, which it isn't.

replies(2): >>44001952 >>44002109
1. simonw No.44001952
There's a pretty clear explanation of the llama.cpp history here: https://github.com/ggml-org/llama.cpp/tree/master/tools/mtmd...

I don't fully understand Ollama's timeline and strategy yet.