    1303 points serjester | 11 comments
    1. cedws ◴[] No.42953112[source]
    90% accuracy +/- 10%? What could that be useful for? That’s awfully low.
    replies(7): >>42953249 #>>42953392 #>>42953432 #>>42953490 #>>42953775 #>>42953853 #>>42954057 #
    2. lvzw ◴[] No.42953249[source]
    > accuracy is measured with the Needleman-Wunsch algorithm

    > Crucially, we’ve seen very few instances where specific numerical values are actually misread. This suggests that most of Gemini’s “errors” are superficial formatting choices rather than substantive inaccuracies. We attach examples of these failure cases below [1].

    > Beyond table parsing, Gemini consistently delivers near-perfect accuracy across all other facets of PDF-to-markdown conversion.

    That seems fairly useful to me, no? Maybe not for mission-critical applications, but for a lot of use cases this seems to be good enough. I'm excited to try these prompts on my own later.
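
For anyone curious why formatting-only differences cost points under an alignment metric, here is a minimal sketch of Needleman-Wunsch scoring. The character-level comparison, the scoring weights (+1 match, -1 mismatch, -1 gap), and the `similarity` normalization are all assumptions chosen for illustration; the article does not say how the benchmark tokenizes or normalizes.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b (Needleman-Wunsch).

    The weights are illustrative defaults, not the benchmark's settings.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] with an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Hypothetical normalization: score over the longer length, floored at 0."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return max(0.0, needleman_wunsch_score(a, b) / longest)


truth = "| Revenue | 1,200 |"
parsed = "| Revenue | 1200 |"  # same number, thousands separator dropped
print(round(similarity(truth, parsed), 3))
```

Under these assumed weights, dropping a single thousands separator already pulls a short row well below 1.0 even though the value itself was read correctly, which is consistent with the point in the quoted passage about superficial formatting choices.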

    3. schainks ◴[] No.42953392[source]
    This is "good enough" for banks to use when doing due diligence. You'd be surprised how much noise is in the system with the current state of the art: algorithms/web scrapers and entire buildings of humans in places like India.
    replies(2): >>42953442 #>>42953592 #
    4. MattDaEskimo ◴[] No.42953432[source]
    Switching from manual data entry to approval
    5. ai-christianson ◴[] No.42953442[source]
    It's certainly pretty useful for discovery/information filtering purposes. I.e. searching for signal in the noise if you have a large dataset.
    6. summerlight ◴[] No.42953490[source]
    I guess the 90% is on a benchmark, which is typically tailored to be challenging to parse.
    7. jjtheblunt ◴[] No.42953592[source]
    due diligence of this sort?

    https://en.wikipedia.org/wiki/Know_your_customer

    replies(1): >>43085822 #
    8. serjester ◴[] No.42953775[source]
    Author here — measuring accuracy in table parsing is surprisingly challenging. Subtle, almost imperceptible differences in how a table is parsed may not affect the reader's understanding but can significantly impact benchmark performance. For all practical purposes, I'd say it's near perfect (also keep in mind the benchmark is on very challenging tables).
    9. raunakchowdhuri ◴[] No.42953853[source]
    would encourage you to take a look at some of the real data here! https://huggingface.co/spaces/reducto/rd_table_bench

    you'll find that most of the errors here are structural issues with the table or inability to parse some special characters. tables can get crazy!

    10. mattnewton ◴[] No.42954057[source]
    having seen some of these tables, I would guess that's probably above a layperson's score. Some are very complicated or just misleadingly structured.
    11. schainks ◴[] No.43085822{3}[source]
    No, I mean services like Bloomberg.

    KYC is an API you can pay for now. Works pretty well for the price, IIRC over 10k/month or something.