(ollama.com)

353 points LorenDB | 1 comments | 16 May 25 01:43 UTC


oezi [16 May 25 11:10 UTC] No.44003925

I wish "multimodal" implied text, image, and audio (and potentially video). If a model supports only image generation or image analysis, "vision model" seems the more appropriate term.

We should aim to distinguish multimodal models such as Qwen2.5-Omni from vision models such as Qwen2.5-VL.

In this sense: Ollama's new engine adds vision support.


1. [16 May 25 14:53 UTC] No.44006219, in reply to No.44003925

Ollama's new engine for multimodal models