LLaVA-O1: Let Vision Language Models Reason Step-by-Step

Has anyone found a use for LLAVA yet?

LLAMA can be trusted to summarize and format information, and some of the other models can be OK coding assistances, but when I was showing Ollama off to a friend I struggled to think of anything useful other than a party trick of "yup that's what is in the picture".

Obviously it would be useful to blind people, but the hard part is using it for something where the person could just look at the picture. Possibly could be used on a security camera and combined with a basic keyword alert, but I imagine there's a lot of false positives and false negatives.