1303 points serjester | 1 comment
iudqnolq No.42960999
In what contexts is 0.84 ± 0.16 actually "nearly perfect"?
replies(1): >>42961102
1. kym6464 No.42961102
I think they meant "nearly perfect" relative to the best other approach, which is Reducto's, given that they are the creators of the benchmark:

Reducto's own model currently outperforms Gemini Flash 2.0 on this benchmark (0.90 vs 0.84). However, as we review the lower-performing examples, most discrepancies turn out to be minor structural variations that would not materially affect an LLM’s understanding of the table.