
344 points | LorenDB | 1 comment
simonw No.44001886
The timing on this is a little surprising, given that llama.cpp just got a (hopefully) stable vision feature merged into main: https://simonwillison.net/2025/May/10/llama-cpp-vision/

Presumably Ollama had already been working on this for quite a while - it sounds like they've broken their initial dependency on llama.cpp. Being in charge of their own destiny makes a lot of sense.

replies(1): >>44001924
lolinder No.44001924
Do you know what exactly is different about either of these projects' new multimodal support? Both have supported LLaVA for a long time. Did that require special-casing that is no longer needed?

I'd hoped to see this addressed in TFA, but it kind of acts like multimodal is totally new to Ollama, which it isn't.

replies(2): >>44001952 >>44002109
1. simonw No.44001952
There's a pretty clear explanation of the llama.cpp history here: https://github.com/ggml-org/llama.cpp/tree/master/tools/mtmd...

I don't fully understand Ollama's timeline and strategy yet.