176 points lnyan | 1 comment
Larrikin No.42183635
Has anyone found a use for LLAVA yet?

LLAMA can be trusted to summarize and format information, and some of the other models can be OK coding assistants, but when I was showing Ollama off to a friend I struggled to think of anything useful beyond the party trick of "yup, that's what is in the picture".

Obviously it would be useful to blind people, but the hard part is finding a use for it when the person could just look at the picture. It could possibly be run on a security camera feed and combined with a basic keyword alert, but I imagine there would be a lot of false positives and false negatives.
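For what it's worth, the "basic keyword alert" half of that idea is tiny. Here is a minimal sketch, assuming the camera frame has already been captioned by some vision model; the captioning step is left out, and the watch list and function names are made up for illustration:

```python
import re

# Hypothetical watch list; these words are placeholders, not a recommendation.
ALERT_KEYWORDS = {"person", "package", "car"}


def matched_keywords(caption, keywords=ALERT_KEYWORDS):
    """Return the sorted alert keywords that appear as whole words in a caption."""
    # Lowercase and split into alphabetic words so "carpet" does not match "car".
    words = set(re.findall(r"[a-z]+", caption.lower()))
    return sorted(words & set(keywords))


def should_alert(caption, keywords=ALERT_KEYWORDS):
    """True if the caption mentions at least one alert keyword."""
    return bool(matched_keywords(caption, keywords))
```

It also shows where the misses would come from: a caption like "A man walks up to the door" never triggers a "person" alert, because plain word matching knows nothing about synonyms.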

replies(1): >>42183818 #
1. ac1spkrbox No.42183818
Multimodal models are useful for lots of things! They can accomplish a range of tasks, from zero-shot image classification to helping perform Retrieval-Augmented Generation on images. As with many generative models, I find the utility comes not necessarily from outperforming a human, but from scaling a task that a human wouldn't want to do (or won't do cheaply).
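To make "zero-shot image classification" concrete, here is a rough sketch of one common way it is done with embedding-style multimodal models: embed the image and each candidate label into the same vector space, then pick the label whose vector is closest. This is an assumption about the general technique, not a description of how LLaVA works internally, and the function names and vectors below are invented stand-ins for real model output:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def zero_shot_classify(image_vec, label_vecs):
    """Return the label whose embedding is most similar to the image embedding."""
    return max(label_vecs, key=lambda label: cosine(image_vec, label_vecs[label]))


# Made-up 3-dimensional embeddings; a real model would produce these.
label_vecs = {
    "cat": [1.0, 0.0, 0.2],
    "dog": [0.0, 1.0, 0.2],
    "car": [0.1, 0.1, 1.0],
}
image_vec = [0.9, 0.1, 0.3]

best_label = zero_shot_classify(image_vec, label_vecs)
```

The "zero-shot" part is that the label set is just text: adding a new category means adding one more label embedding, with no retraining, which is what makes it cheap to scale.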