(ollama.com)

182 points BUFU | 2 comments | 06 Nov 24 21:10 UTC


sgt101 [07 Nov 24 11:22 UTC] No.42075703

>>42069453 (OP)

I tested the small model with a few images from CLEVR. At first blush, I'm afraid it didn't do very well at all: it got object counts totally wrong and struggled to identify shapes and colours.

Still, it seems to understand what's in the images in general (cones, spheres and cubes), and the fact that it runs on my MacBook at all is basically amazing.


1. EdwardKrayer [07 Nov 24 17:37 UTC] No.42078898

>>42075703

My initial testing was with charts. I've been waiting for local vision models to become good enough to feed technical documents to, and the results so far look very good. Example:

https://i.imgur.com/1ETREP9.png


2. sgt101 [08 Nov 24 12:24 UTC] No.42086317

>>42078898 (TP)

I've tried some PPT images instead of CLEVR ones, and it does much better. It can count circles and triangles and distinguishes between them quite well. It can recognise the colours of the objects as well.

I think the faux-3D rendering of CLEVR images is too much for the model. That's interesting, because much smaller pre-transformer specialist models were very good at CLEVR.


Ollama 0.4 is released with support for Meta's Llama 3.2 Vision models locally