Ingesting PDFs and why Gemini 2.0 changes everything

(www.sergey.fyi)

1303 points serjester | 2 comments | 05 Feb 25 18:05 UTC


freezed8 ◴[05 Feb 25 23:50 UTC] No.42957085[source]▶

(Disclaimer: I am the CEO of LlamaIndex, which includes LlamaParse.)

Nice article! We're actively benchmarking Gemini 2.0 right now and if the results are as good as implied by this article, heck we'll adapt and improve upon it. Our goal (and in fact the reason our parser works so well) is to always use and stay on top of the latest SOTA models and tech :) - we blend LLM/VLM tech with best-in-class heuristic techniques.

Some quick notes:

1. I'm glad that LlamaParse is mentioned in the article, but it's not mentioned in the performance benchmarks. I'm pretty confident that our most accurate modes are at the top of the table benchmark - our stuff is pretty good.

2. There's a long tail of issues beyond just tables - this includes fonts, headers/footers, ability to recognize charts/images/form fields, and as other posters said, the ability to have fine-grained bounding boxes on the source elements. We've optimized our parser to tackle all of these modes, and we need proper benchmarks for that.

3. DIY'ing your own pipeline to run a VLM at scale to parse docs is surprisingly challenging. You need to orchestrate a robust system that can screenshot a bunch of pages at the right resolution (which can be quite slow), tune the prompts, and make sure you're obeying rate limits + can retry on failure.
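To make the rate-limit and retry point in note 3 concrete, here is a minimal sketch of retrying a per-page model call with exponential backoff and jitter. It is not LlamaParse's or Gemini's actual client code: `RateLimitError`, `call_with_retry`, and all default values are hypothetical names and numbers chosen for illustration, and `fn` stands in for whatever function sends one page image to a VLM.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error standing in for a provider's HTTP 429 response."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # The delay ceiling doubles each attempt and is capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount up to the ceiling so that
            # parallel workers do not all retry at the same moment.
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("unreachable")
```

In a real pipeline each page would be wrapped like `call_with_retry(lambda: parse_page(image))`, where `parse_page` is your own function. The `sleep` parameter is injectable only so the backoff can be tested without waiting.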

replies(6): >>42957169 #>>42960910 #>>42961205 #>>42961499 #>>42961979 #>>42962147 #

rahimnathwani ◴[06 Feb 25 00:00 UTC] No.42957169[source]▶

>>42957085 #

Hi Jerry,

How well does llamaparse work on foreign-language documents?

I have a pipeline for Arabic-language docs using Azure for OCR and GPT-4o-mini to extract structured information. Would it be worth trying llamaparse to replace part of the pipeline or the whole thing?

replies(1): >>42957341 #

freezed8 ◴[06 Feb 25 00:19 UTC] No.42957341[source]▶

>>42957169 #

Yes! We have foreign language support for better OCR on scans. Here are some more details. Docs: https://docs.cloud.llamaindex.ai/llamaparse/features/parsing... Notebook: https://github.com/run-llama/llama_parse/blob/main/examples/...

replies(1): >>42957364 #

1. rahimnathwani ◴[06 Feb 25 00:21 UTC] No.42957364[source]▶

>>42957341 #

What is disable_ocr=True for? Is it for documents that already have a text layer, that you don't want to OCR again?

replies(1): >>42957577 #

2. freezed8 ◴[06 Feb 25 00:51 UTC] No.42957577[source]▶

>>42957364 (TP) #

yeah, disable_ocr is for documents where you don't need to OCR a scanned image; it'll just parse out the text

it's faster if True
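One way to decide when skipping OCR is safe is to check whether the PDF already carries a usable text layer. The sketch below is a hypothetical heuristic, not how LlamaParse decides anything: `needs_ocr` and its thresholds are assumptions, and `page_texts` is assumed to hold the text already extracted from each page by whatever PDF library you use.

```python
from typing import Sequence


def needs_ocr(
    page_texts: Sequence[str],
    min_chars_per_page: int = 20,
    min_text_page_ratio: float = 0.9,
) -> bool:
    """Guess whether a document needs OCR from its extracted per-page text.

    A page counts as having a text layer if it yields at least
    min_chars_per_page non-whitespace-trimmed characters. The document
    needs OCR if fewer than min_text_page_ratio of its pages qualify.
    """
    if not page_texts:
        # Nothing was extracted at all, so fall back to OCR.
        return True
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    return pages_with_text / len(page_texts) < min_text_page_ratio
```

When `needs_ocr(...)` returns `False`, the document is a candidate for the `disable_ocr=True` option discussed above. The thresholds are arbitrary starting points and would need tuning on real documents.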
