357 points | ingve | 1 comment
bob1029 No.43974521
When accommodating the general case, solving PDF-to-text is approximately equivalent to solving JPEG-to-text.

The only PDF parsing scenario I would consider putting my name on is scraping AcroForm field values from standardized documents.
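
To make the AcroForm case concrete, here is a rough sketch of why it is the tractable one: form fields are stored as key/value entries (/T for the field name, /V for its value) rather than as positioned glyphs. Everything below is an assumption for illustration: the function name `scrape_fields`, the sample bytes, and the restriction to uncompressed, non-nested dictionaries with simple literal strings. Real documents usually need a proper PDF parser, since fields can sit in compressed object streams or use hex or escaped strings.

```python
import re

# A flat dictionary: "<<" ... ">>" with no nested "<<" or ">>" inside.
DICT_RE = re.compile(rb"<<((?:(?!<<|>>).)*)>>", re.S)
# /T holds the partial field name, /V the field value.
# Only simple literal strings are handled (no escapes, no nested parentheses).
NAME_RE = re.compile(rb"/T\s*\(([^()\\]*)\)")
VALUE_RE = re.compile(rb"/V\s*\(([^()\\]*)\)")


def scrape_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Collect name -> value pairs from uncompressed field dictionaries."""
    fields = {}
    for match in DICT_RE.finditer(pdf_bytes):
        body = match.group(1)
        name = NAME_RE.search(body)
        value = VALUE_RE.search(body)
        # Skip dictionaries that are not filled-in fields.
        if name and value:
            fields[name.group(1).decode("latin-1")] = value.group(1).decode("latin-1")
    return fields


# Hypothetical hand-written fragment, not a complete PDF file.
SAMPLE = (
    b"1 0 obj\n<< /FT /Tx /T (full_name) /V (Ada Lovelace) >>\nendobj\n"
    b"2 0 obj\n<< /FT /Tx /T (tax_id) /V (12-3456789) >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page >>\nendobj\n"
)

if __name__ == "__main__":
    print(scrape_fields(SAMPLE))
```

The point of the sketch is only that the values are addressable by name, which is what makes standardized forms far more predictable than free-form page text.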

1. kapitalx No.43974604
This is approximately the approach we're also taking at https://doctly.ai. Add to that a "multiple experts" approach for analyzing the image (for our 'ultra' version), and we get really good results. We're also improving it constantly.