(github.com)

170 points EarlyOom | 2 comments | 20 Feb 25 01:54 UTC | HN request time: 0.438s | source

Show context

jauntywundrkind ◴[20 Feb 25 06:18 UTC] No.43111690[source]▶

I'd really like to play with Qwen2.5-VL at some point, perhaps for reading data-sheets for microchips. Nicely for some applications, it's also very good at reporting position of what it finds, which many ML tools are pretty mediocre at. https://qwenlm.github.io/blog/qwen2.5-vl/

Not really this application, but QvQ for visual reasoning is also impressive. https://qwenlm.github.io/blog/qvq-72b-preview/

Meta has used Qwen as the basis for their Apollo research. https://arxiv.org/abs/2412.10360

replies(1): >>43111708 #

1. fzysingularity ◴[20 Feb 25 06:21 UTC] No.43111708[source]▶

>>43111690 #

Is Qwen2.5-VL on Ollama? Could give it a try with a few of the schemas we have.

We’ve locally tested with Llama 3.2 11B Vision on Ollama: https://github.com/vlm-run/vlmrun-hub/blob/main/tests/benchm...

FWIW I think Ollama structured outputs API is quite buggy compared to the HF transformers variant.

replies(1): >>43141062 #

2. fzysingularity ◴[22 Feb 25 17:32 UTC] No.43141062[source]▶

>>43111708 (TP) #

Just ran them for Qwen2.5-VL: https://github.com/vlm-run/vlmrun-hub/blob/main/tests/benchm...

↑

Run structured extraction on documents/images locally with Ollama and Pydantic