(github.com)

990 points pierre | 2 comments | 20 Oct 25 06:26 UTC

1. singularity2001 [20 Oct 25 09:25 UTC] No.45641792

Instead of downloading a specific OCR model, how would one fare just downloading the currently best multi-modal foundation model? And what would that be at less than 30 GB?

replies(1): >>45648640 #

2. prats226 [20 Oct 25 20:08 UTC] No.45648640

>>45641792 (TP) #

Then you can just download a fine-tuned version of the same multi-modal foundation model that's trained on documents?

DeepSeek OCR