
1303 points serjester | 2 comments
lazypenguin No.42953665
I work in fintech and we replaced an OCR vendor with Gemini at work for ingesting some PDFs. After trial and error with different models, Gemini won because it was so darn easy to use and it worked with minimal effort. I think one shouldn't underestimate how much a multi-modal, large-context-window model buys you in terms of ease of use. Ironically, this vendor is the best-known and most successful vendor for OCR'ing this specific type of PDF, but many of our requests failed over to their human-in-the-loop process. Despite OCR not being its specialty, switching to Gemini was a no-brainer after our testing.

Processing time went from something like 12 minutes on average to 6 seconds on average, accuracy was about 96% of the vendor's, and the price was significantly cheaper. For the 4% of inaccuracies, a lot of them are things like handwritten "LLC" getting OCR'd as "IIC", which I would say is somewhat "fair". We probably could improve our prompt to clean up this data even further. Our prompt is currently very simple, essentially "OCR this PDF into this format as specified by this JSON schema", and it didn't require any fancy "prompt engineering" to contort out a result.

The Gemini developer experience was stupidly easy. Easy to add a file "part" to a prompt. Easy to focus on the main problem thanks to the weirdly large context window. Multi-modal, so it handles a lot of issues for you (image-only PDF vs. PDF with embedded text), etc. I can recommend it for the use case presented in this blog post (ignoring the bounding-boxes part)!
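[Editor's note: a minimal sketch of what this kind of call can look like with Google's google-genai Python SDK. The Invoice schema, filename, and model name are illustrative assumptions, not the commenter's actual setup, and exact method names may differ between SDK versions.]

```python
# Sketch only: assumes `pip install google-genai pydantic` and a GEMINI_API_KEY
# in the environment; the Invoice schema below is a made-up example.
from pydantic import BaseModel
from google import genai
from google.genai import types


class Invoice(BaseModel):  # hypothetical target schema
    vendor_name: str
    invoice_number: str
    total_amount: str


client = genai.Client()  # picks up the API key from the environment

with open("scan.pdf", "rb") as f:
    pdf_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        # The PDF is attached as a file "part"; the model accepts both
        # image-only scans and PDFs with an embedded text layer.
        types.Part.from_bytes(data=pdf_bytes, mime_type="application/pdf"),
        "OCR this PDF into the format specified by the response schema.",
    ],
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Invoice,  # constrains the output to the JSON schema
    ),
)
print(response.text)  # JSON string matching the Invoice schema
```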

replies(33): >>42953680 #>>42953745 #>>42953799 #>>42954088 #>>42954472 #>>42955083 #>>42955470 #>>42955520 #>>42955824 #>>42956650 #>>42956937 #>>42957231 #>>42957551 #>>42957624 #>>42957905 #>>42958152 #>>42958534 #>>42958555 #>>42958869 #>>42959364 #>>42959695 #>>42959887 #>>42960847 #>>42960954 #>>42961030 #>>42961554 #>>42962009 #>>42963981 #>>42964161 #>>42965420 #>>42966080 #>>42989066 #>>43000649 #
1. chickenWing No.42966080
It is cheaper now, but I wonder if it will continue to be cheaper when companies like Google and OpenAI decide they want to make a profit off of AI, instead of pouring billions of dollars of investment funds into it. By the time that happens, many of the specialized service providers will be out of business and Google will be free to jack up the price.
replies(1): >>42966138 #
2. kbaker No.42966138
I use Claude through OpenRouter (with Aider), and was pretty amazed to see that it routes requests within the same session almost round-robin: sometimes through Amazon Bedrock, sometimes through Google Vertex, sometimes through Anthropic themselves, all of course serving the same underlying model.

Literally whoever has the cheapest compute.
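[Editor's note: for context, OpenRouter exposes an OpenAI-compatible chat endpoint and picks the upstream provider (Anthropic, Bedrock, Vertex, ...) for you. A rough sketch under those assumptions; the prompt and the optional provider-preference block are illustrative, so check OpenRouter's current docs before relying on them.]

```python
# Sketch only: plain HTTP call to OpenRouter's OpenAI-compatible endpoint.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "anthropic/claude-3.5-sonnet",
        "messages": [{"role": "user", "content": "Say hello."}],
        # Optional routing preferences; by default OpenRouter chooses among
        # the providers serving the same underlying model.
        "provider": {"sort": "price"},
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```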

With the speed at which AI models are improving these days, it seems like the 'moat' of a better model lasts only a few months before it is commoditized and the traffic goes to the cheapest provider.