
1303 points by serjester | 1 comment
gapeslape No.42956276
In my mind, Gemini 2.0 changes everything because of the incredibly long context (2M tokens on some models), while having strong reasoning capabilities.

We are working on a compliance solution (https://fx-lex.com), and RAG just doesn't cut it for our use case. Legislation can't be chunked if you want the model to reason well about it.

It’s magical to be able to just throw everything into the model. And the best thing is that we automatically benefit from future model improvements along all performance axes.
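As a rough sanity check for the "throw everything in" approach, here's a minimal sketch of estimating whether a corpus fits in a long-context window without chunking. The ~4-characters-per-token heuristic and the 2M-token limit are illustrative assumptions; real counts come from the model's own tokenizer (e.g. a count-tokens API call).

```python
# Feasibility check: can a whole document corpus go into one prompt?
# Assumes ~4 characters per token, a common rough heuristic for English
# text - the model's tokenizer is the only authoritative count.

CONTEXT_LIMIT_TOKENS = 2_000_000  # advertised limit on some Gemini 2.0 models
CHARS_PER_TOKEN = 4               # crude heuristic, not exact

def estimated_tokens(texts: list[str]) -> int:
    """Estimate total token usage for a list of documents."""
    return sum(len(t) for t in texts) // CHARS_PER_TOKEN

def fits_in_context(texts: list[str], limit: int = CONTEXT_LIMIT_TOKENS) -> bool:
    """True if the whole corpus can be sent in a single prompt, no chunking."""
    return estimated_tokens(texts) <= limit

# Example: ~6 MB of legislative text is ~1.5M estimated tokens -> fits.
corpus = ["x" * 2_000_000, "y" * 4_000_000]
print(estimated_tokens(corpus), fits_in_context(corpus))  # → 1500000 True
```

If the check fails, you're back to chunking or summarization; if it passes, the entire corpus can ride along in every request, which is what makes the long-context approach attractive here.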

replies(2): >>42957222 #>>42959708 #
manmal No.42957222
Maybe a dumb question, but have you tried fine-tuning on the corpus and then adding a reasoning process (like all those R1 distillations)?
replies(1): >>42960569 #
gapeslape No.42960569
We haven't tried that; we might in the future.

My intuition - not based on any research - is that recall should be a lot better from in-context data than from the model's weights. For our use case, precise recall is paramount.