(www.sergey.fyi)

1303 points serjester | 1 comments | 05 Feb 25 18:05 UTC

iudqnolq ◴[06 Feb 25 10:32 UTC] No.42960999

In what contexts is 0.84 ± 0.16 actually "nearly perfect"?

1. kym6464 ◴[06 Feb 25 10:48 UTC] No.42961102

I think they meant relative to the best other approach, which is Reducto’s, given that they are the creators of the benchmark:

Reducto's own model currently outperforms Gemini Flash 2.0 on this benchmark (0.90 vs 0.84). However, as we review the lower-performing examples, most discrepancies turn out to be minor structural variations that would not materially affect an LLM’s understanding of the table.

Ingesting PDFs and why Gemini 2.0 changes everything