Ingesting PDFs and why Gemini 2.0 changes everything

1. cedws ◴[05 Feb 25 18:40 UTC] No.42953112[source]▶

90% accuracy +/- 10%? What could that be useful for? That’s awfully low.

replies(7): >>42953249 #>>42953392 #>>42953432 #>>42953490 #>>42953775 #>>42953853 #>>42954057 #

2. lvzw ◴[05 Feb 25 18:48 UTC] No.42953249[source]▶

> accuracy is measured with the Needleman-Wunsch algorithm

> Crucially, we’ve seen very few instances where specific numerical values are actually misread. This suggests that most of Gemini’s “errors” are superficial formatting choices rather than substantive inaccuracies. We attach examples of these failure cases below [1].

> Beyond table parsing, Gemini consistently delivers near-perfect accuracy across all other facets of PDF-to-markdown conversion.

That seems fairly useful to me, no? Maybe not for mission-critical applications, but for a lot of use cases this seems good enough. I'm excited to try these prompts on my own later.
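For anyone unfamiliar with it, Needleman-Wunsch is a global sequence alignment algorithm: it finds the best-scoring way to line up two sequences, rewarding matches and penalizing mismatches and gaps. Below is a minimal sketch of how such a score might be turned into an "accuracy" number. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`), the character-level comparison, and the `similarity` normalization are all assumptions for illustration, not the benchmark's actual configuration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score between sequences a and b.

    Scoring values are illustrative defaults, not taken from the benchmark.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j items of b. Row 0: everything in b aligned to gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: the first i items of a aligned to gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Hypothetical normalization of the alignment score to the range [0, 1]."""
    if not a and not b:
        return 1.0
    longest = max(len(a), len(b))
    return max(0.0, needleman_wunsch_score(a, b) / longest)


truth = "| Revenue | 1,200 |"
parsed = "| Revenue | 1200 |"  # same number, thousands separator dropped

# 17: 18 matching characters minus one gap for the dropped comma
print(needleman_wunsch_score(truth, parsed))
print(round(similarity(truth, parsed), 3))
```

Under these assumed settings, a single dropped thousands separator already pulls the score of this short row below 0.9, even though the number itself was read correctly. That is the kind of superficial formatting difference the quoted passage describes.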

3. schainks ◴[05 Feb 25 18:58 UTC] No.42953392[source]▶

>>42953112 (TP) #

This is "good enough" for banks to use when doing due diligence. You'd be surprised how much noise is in the system with the current state of the art: algorithms, web scrapers, and entire buildings of humans in places like India.

replies(2): >>42953442 #>>42953592 #

4. MattDaEskimo ◴[05 Feb 25 19:02 UTC] No.42953432[source]▶

>>42953112 (TP) #

Switching people from manual data entry to reviewing and approving the extracted data.

5. ai-christianson ◴[05 Feb 25 19:03 UTC] No.42953442[source]▶

>>42953392 #

It's certainly pretty useful for discovery and information filtering, i.e. searching for signal in the noise when you have a large dataset.

6. summerlight ◴[05 Feb 25 19:06 UTC] No.42953490[source]▶

>>42953112 (TP) #

I guess the 90% is on a benchmark, and benchmarks are typically designed to be challenging to parse.

7. jjtheblunt ◴[05 Feb 25 19:14 UTC] No.42953592[source]▶

>>42953392 #

Due diligence of this sort?

https://en.wikipedia.org/wiki/Know_your_customer

replies(1): >>43085822 #

8. serjester ◴[05 Feb 25 19:27 UTC] No.42953775[source]▶

>>42953112 (TP) #

Author here. Measuring accuracy in table parsing is surprisingly challenging. Subtle, almost imperceptible differences in how a table is parsed may not affect the reader's understanding but can significantly lower the benchmark score. For all practical purposes, I'd say it's near perfect (also keep in mind the benchmark is on very challenging tables).

9. raunakchowdhuri ◴[05 Feb 25 19:32 UTC] No.42953853[source]▶

>>42953112 (TP) #

I would encourage you to take a look at some of the real data here: https://huggingface.co/spaces/reducto/rd_table_bench

You'll find that most of the errors are structural issues with the table or an inability to parse some special characters. Tables can get crazy!

10. mattnewton ◴[05 Feb 25 19:46 UTC] No.42954057[source]▶

>>42953112 (TP) #

Having seen some of these tables, I would guess that's probably above a layperson's score. Some are very complicated or just misleadingly structured.

11. schainks ◴[18 Feb 25 03:35 UTC] No.43085822{3}[source]▶

>>42953592 #

No, I mean services like Bloomberg.

KYC is an API you can pay for now. Works pretty well for the price, IIRC over 10k/month or something.