
357 points ingve | 2 comments
andrethegiant ◴[] No.43974436[source]
Cloudflare’s ai.toMarkdown() function, available in Workers AI, handles PDFs pretty easily. Judging from speed alone, it seems they’re parsing the actual content rather than shoving it through OCR or an LLM.

Shameless plug: I use this under the hood when you prefix any PDF URL with https://pure.md/ to convert to raw text.
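
For anyone who wants to script the prefix trick, here is a minimal sketch. It assumes the service accepts the full original URL (scheme included) appended directly after https://pure.md/ and returns UTF-8 text; the function names are hypothetical and nothing here comes from official pure.md documentation.

```python
from urllib.request import urlopen

# Prefix quoted in the comment above.
PURE_MD_PREFIX = "https://pure.md/"


def pure_md_url(pdf_url: str) -> str:
    """Build the prefixed URL (hypothetical helper, not an official API)."""
    if not pdf_url.startswith(("http://", "https://")):
        raise ValueError("expected an absolute http(s) URL")
    # Assumption: plain concatenation, with no extra escaping of the target URL.
    return PURE_MD_PREFIX + pdf_url


def fetch_text(pdf_url: str, timeout: float = 30.0) -> str:
    """Fetch the converted text; assumes the response body is UTF-8."""
    with urlopen(pure_md_url(pdf_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_text("https://example.com/paper.pdf") would request https://pure.md/https://example.com/paper.pdf. The service may well require headers or rate limiting that this sketch ignores.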

1. _boffin_ ◴[] No.43974535[source]
You’re aware that PDFs are containers that can hold content in various formats, and that this content can be layered on top of or interleaved with other content in unexpected, unspecified ways that aren’t “parsable,” right?

I would wager that they’re using OCR/LLM in their pipeline.

2. andrethegiant ◴[] No.43974640[source]
Could be. But the conversion is free, which leads me to believe LLMs are not involved.