182 points BUFU | 2 comments
sgt101 ◴[] No.42075703[source]
I tested the small model with a few images from CLEVR. At first blush, I'm afraid it didn't do very well at all: it got object counts totally wrong and struggled to identify shapes and colours.

Still, it seems to understand what's in the images in general (cones and spheres and cubes), and the fact that it runs on my MacBook at all is basically amazing.

replies(1): >>42078898 #
1. EdwardKrayer ◴[] No.42078898[source]
My initial testing was with charts. I've been waiting for local vision models to be good enough to feed technical documents to, and the early results look very good. Example:

https://i.imgur.com/1ETREP9.png

replies(1): >>42086317 #
2. sgt101 ◴[] No.42086317[source]
I've tried with some PowerPoint images rather than CLEVR ones and it does much better. It can count circles and triangles and differentiates between them quite well. It can recognise the colours of the objects as well.

I think the faux 3D of CLEVR images is too much for the model. That's interesting, because much smaller pre-transformer specialist models were very good at CLEVR.