
DeepSeek OCR

(github.com)
990 points | pierre | 1 comment
piker ◴[] No.45640676[source]
This looks really cool for prototyping and playing around.

It seems to me, though, that if one is building a modern application that needs to get image segmentation and/or text recognition right, there are better APIs available than natural language. It seems like a lot of effort to build a production-scale CV application only to weigh it down with all of an LLM’s shortcomings. This is not a field I’m familiar with, but I would assume that this doesn’t produce state-of-the-art results; if it did, that would change the analysis.

replies(2): >>45640692 #>>45642239 #
randomNumber7 ◴[] No.45640692[source]
Imagine you build an image segmentation model for, e.g., a specific industrial application.

With this LLM approach you can at least create your training data from the raw images with natural language.

replies(1): >>45640704 #
1. piker ◴[] No.45640704[source]
That does make sense.