The hope is to be able to get more multimodal models out soon. I'd like to see if we can get Pixtral and Qwen2.5-vl in relatively soon.
How does this address the security concern of filenames being detected and read when not wanted?
Is there any more specific info available about who (llama.cpp or Ollama) removed what, where? As far as I can see, the server is still part of llama.cpp.
And more generally: Is this the moment when Ollama and Llama part ways?
Still, it seems to understand what's in the images in general (cones and spheres and cubes), and the fact that it runs on my mac book at all is basically amazing.
Ran the 11B yesterday and it worked great.
I think that the faux 3d of clevr images is too much for the model, it's interesting because much smaller pre-transformer specialist models were very good at clevr.