(www.marginalia.nu)

357 points ingve | 1 comments | 13 May 25 15:01 UTC

xnx [13 May 25 15:44 UTC] No.43974208

Weird that there's no mention of LLMs in this article even though the article is very recent. LLMs haven't solved every OCR/document data extraction problem, but they've dramatically improved the situation.

replies(5): >>43974229 #>>43974325 #>>43974337 #>>43974562 #>>43975686 #

1. constantinum [13 May 25 17:50 UTC] No.43975686

>>43974208 #

True, but a few problems remain: hallucinations, and knowing when the output can be trusted (validation). More here: https://unstract.com/blog/why-llms-struggle-with-unstructure...

PDF to Text, a challenging problem