
343 points LorenDB | 2 comments
1. andy_xor_andrew No.44007996
They're talking a lot about this new engine - I'd love to see details on how it's actually implemented. Given that llama.cpp is a herculean feat, if you're going to claim to have a replacement for it, an explanation of how you did it would be good!

Based on this part:

> We set out to support a new engine that makes multimodal models first-class citizens, and getting Ollama's partners to contribute more directly [to] the community - the GGML tensor library.

And from clicking through a github link they had:

https://github.com/ollama/ollama/blob/main/model/models/gemm...

My takeaway is that the GGML library (the thing that is the backbone of llama.cpp) must expose some FFI (foreign function interface) that can be invoked from Go, so in the Ollama Go code they can write their own implementations of model behavior (like Gemma 3) that just call into the GGML magic. I think I have that right? I would have expected a detail like that to be front and center in the blog post.
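
For the curious, the usual mechanism for this in Go is cgo. Here's a minimal sketch of what driving GGML from Go could look like - my own illustration, not Ollama's actual binding code; the include/link paths are placeholders and the exact GGML function names and headers have shifted between versions:

    package main

    /*
    #cgo CFLAGS: -I${SRCDIR}/ggml/include
    #cgo LDFLAGS: -L${SRCDIR}/ggml/build/src -lggml -lm
    #include "ggml.h"
    */
    import "C"

    import "fmt"

    func main() {
        // Carve out a 16 MB arena that ggml allocates tensors from.
        params := C.struct_ggml_init_params{
            mem_size:   16 * 1024 * 1024,
            mem_buffer: nil,
            no_alloc:   false,
        }
        ctx := C.ggml_init(params)
        defer C.ggml_free(ctx)

        // Two 1-D float32 tensors, each filled with a constant.
        a := C.ggml_new_tensor_1d(ctx, C.GGML_TYPE_F32, 4)
        b := C.ggml_new_tensor_1d(ctx, C.GGML_TYPE_F32, 4)
        C.ggml_set_f32(a, 1.5)
        C.ggml_set_f32(b, 2.0)

        // Declare the computation (sum = a + b), then run the graph.
        sum := C.ggml_add(ctx, a, b)
        graph := C.ggml_new_graph(ctx)
        C.ggml_build_forward_expand(graph, sum)
        C.ggml_graph_compute_with_ctx(ctx, graph, 1) // 1 thread

        fmt.Println(C.ggml_get_f32_1d(sum, 0)) // prints 3.5
    }

A real model implementation would presumably build a much bigger graph (attention, MLP blocks, etc.) the same way: declare tensors and ops from Go, then hand the whole graph to GGML to execute.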

2. Hugsun No.44009766
Ollama is known for its lack of transparency, poor attribution, and anti-user decisions.

I was surprised to see the amount of attribution in this post. They've been catching quite a bit of flak for this, so they might be adjusting.