
1303 points by serjester | 1 comment
lazypenguin No.42953665
I work in fintech and we replaced an OCR vendor with Gemini for ingesting some PDFs. After trial and error with different models, Gemini won because it was so darn easy to use and it worked with minimal effort. One shouldn't underestimate the ease of use of a multi-modal model with a large context window. Ironically, this vendor is the best-known and most successful vendor for OCR'ing this specific type of PDF, but many of our requests failed over to their human-in-the-loop process. Despite it not being their specialization, switching to Gemini was a no-brainer after our testing. Processing time went from something like 12 minutes on average to 6 seconds on average, accuracy was about 96% of the vendor's, and the price was significantly cheaper. A lot of the 4% inaccuracies are things like handwritten "LLC" getting OCR'd as "IIC", which I'd say is somewhat "fair". We could probably improve our prompt to clean up this data even further. Our prompt is currently very simple: "OCR this PDF into this format as specified by this JSON schema", and it didn't require any fancy "prompt engineering" to contort out a result.

Gemini's developer experience was stupidly easy. Easy to add a file "part" to a prompt. Easy to focus on the main problem with the weirdly high context window. Multi-modal, so it handles a lot of issues for you (PDF as image vs. PDF with embedded text), etc. I can recommend it for the use case presented in this blog post (ignoring the bounding-boxes part)!
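The cleanup step the commenter hints at ("we probably could improve our prompt to clean up this data") can also live outside the model. Below is a minimal sketch of such a post-processing pass; the schema fields, the `clean_record` helper, and the IIC-to-LLC fixup rule are illustrative assumptions, not the poster's actual pipeline:

```python
import json
import re

# Hypothetical required fields mirroring the JSON schema sent in the prompt.
REQUIRED_FIELDS = {"company_name": str, "filing_date": str, "amount": float}

# Known OCR confusions from testing, e.g. handwritten "LLC" read as "IIC".
OCR_FIXUPS = [(re.compile(r"\bIIC\b"), "LLC")]

def clean_record(raw_json: str) -> dict:
    """Parse the model's JSON output, check required fields, apply fixups."""
    record = json.loads(raw_json)
    for field, typ in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), typ):
            raise ValueError(f"missing or mistyped field: {field}")
    for pattern, replacement in OCR_FIXUPS:
        record["company_name"] = pattern.sub(replacement, record["company_name"])
    return record

sample = '{"company_name": "Acme IIC", "filing_date": "2024-03-01", "amount": 1250.0}'
print(clean_record(sample)["company_name"])  # Acme LLC
```

Validating against the schema on the way out also catches the occasional response where the model drops or mistypes a field, which is cheaper than re-prompting for every record.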

makeitdouble No.42956937
> After trial and error with different models

As a mere occasional customer, I've been scanning 4 to 5 pages of the same document layout every week in Gemini for half a year, and every single week the results were slightly different.

Note that the docs are bilingual, which could affect the results, but what struck me was the lack of consistency: even with the same model, running it two or three times in a row gives different results.

That's fine for my usage, but it sounds like a nightmare if, every time Google tweaks their model, companies have to readjust their whole process to deal with the discrepancies.

And sticking with the same model for multiple years also sounds like a captive situation where you'd have to pay a premium for Google to keep it available for your use.

iandanforth No.42957436
At temperature zero, if you're using the same API/model, this really should not be the case. None of the big players update their APIs without some name / version change.
pigscantfly No.42958471
This isn't really true, unfortunately -- mixture-of-experts routing seems to suffer from batch non-determinism. No one has stated publicly exactly why, but you can easily replicate the behavior yourself or find bug reports and discussion with a bit of searching. The observed behavior of the major closed-weight LLM APIs is that a temperature of zero no longer corresponds to deterministic greedy sampling.
brookst No.42959211
If temperature is zero, and weights are weights, where is the non-deterministic behavior coming from?
TeMPOraL No.42959979
Here routing is probably dominant, but in general, unless I missed all the vendors ditching GPUs and switching to ASICs optimized for fixed-point math, floating-point addition is still non-associative, so results are non-deterministic with respect to the randomness introduced by parallelizing the calculations.
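The non-associativity is easy to demonstrate with ordinary IEEE 754 doubles; the values below are chosen so that a small term is absorbed by a large one depending on grouping:

```python
a, b, c = 1e16, -1e16, 1.0

# Grouping changes the result: 1.0 is below the rounding granularity of 1e16,
# so it vanishes whenever it is added to the large term first.
left = (a + b) + c   # (0.0) + 1.0  -> 1.0
right = a + (b + c)  # 1e16 + (-1e16, since -1e16 + 1.0 rounds back) -> 0.0

print(left, right)  # 1.0 0.0
```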
zelphirkalt No.42960610
Of course, which part of the calculation happens where should also be specifiable, and either be able to be made deterministic or have no effect on the result. The reduce step of a map-reduce process, merging results from various places, should likewise be able to give reproducible results, regardless of which results arrive first or from where.

Is our tooling too bad for this?

TeMPOraL No.42961075
> Is our tooling too bad for this?

Floating-point math is fundamentally too bad for this. We use it because it's fast, which usually more than compensates for the inaccuracies FP math introduces.

(One, dealing with FP errors is mostly a fixed cost - there's a branch of CS/mathematics specializing in it, producing formally proven recipes for computing specific things in ways that minimize, or at least give specific bounds on, the errors. That's work that can be done once and reused forever. Two, most programmers are oblivious to these issues anyway, and we've learned to live with the bugs :).)
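One classic example of such a reusable recipe is compensated summation. A sketch of the Neumaier variant (an improvement on Kahan's algorithm that also handles terms larger than the running total), compared against a naive left-to-right sum:

```python
import math

def neumaier_sum(xs):
    """Compensated summation: carry the rounding error of each addition."""
    total = 0.0
    comp = 0.0  # accumulated low-order bits lost to rounding
    for x in xs:
        t = total + x
        if abs(total) >= abs(x):
            comp += (total - t) + x  # low bits of x were lost
        else:
            comp += (x - t) + total  # low bits of total were lost
        total = t
    return total + comp

xs = [1e16, 1.0, -1e16, 1.0]
print(sum(xs))           # 1.0  - naive sum loses one of the 1.0 terms
print(neumaier_sum(xs))  # 2.0  - matches the correctly rounded result
print(math.fsum(xs))     # 2.0  - stdlib exact-summation reference
```

The extra additions roughly double the cost per element, which is exactly the speed-vs-accuracy trade the comment describes.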

When your parallel map-reduce is just doing matrix additions and multiplications, guaranteeing order of execution comes with serious overhead. For one, you need all partial results available together before reducing, so either the reduction step needs enough memory to store a copy of all the inputs, or it needs to block the units computing those inputs until all of them finish. If you drop the order guarantee, on the other hand, the reduction step just needs one fixed-size accumulator, and every parallel unit computing an input is free to go do something else as soon as it's done.
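The two strategies can be contrasted in a few lines; the "arrival" permutation below stands in for whichever GPU worker happens to finish first, and the values are chosen so the two orders visibly disagree:

```python
from functools import reduce

vals = [1e16, 1.0, -1e16, 1.0]

# Fixed-order reduce: buffer all partials, combine in a guaranteed order.
# Reproducible, but requires storage for every input.
ordered = reduce(lambda acc, x: acc + x, vals)  # always 1.0

# Free-order reduce: one accumulator, fold in results as workers finish.
# The same numbers in a different arrival order give a different total.
arrival = [1.0, 1.0, 1e16, -1e16]
unordered = reduce(lambda acc, x: acc + x, arrival)  # 2.0 this time

print(ordered, unordered)  # 1.0 2.0
```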

So the price you pay for deterministic order is either reduced throughput or an increase in on-chip memory, both of which translate to slower and more expensive hardware. The incentives strongly point towards not giving such guarantees when it can be avoided - keep in mind that GPUs were designed for videogames (and graphics in general), where floating-point inaccuracies only matter when they become noticeable to the user.