Havoc:
Been toying with the Flash model. Not the top model, but I think it'll see plenty of use thanks to the details; it wins on things other than topping the benchmark charts:

* Generous free tier

* Huge context window

* Lite version feels basically instant

However

* Lite model seems more prone to repeating itself / looping

* Very confusing naming, e.g. {model}-latest worked for 1.5, but now it's {model}-001? The lite version has a date appended, the non-lite does not. Then there are exp and thinking exp... the latter with a date. wut? (See the sketch below.)
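
To make the mess concrete, a minimal sketch with the google-generativeai Python SDK; the model ID strings below are my best reading of the naming scheme, assumptions rather than an authoritative list:

    # Minimal sketch using the google-generativeai Python SDK.
    # The model IDs are assumptions illustrating the naming mess,
    # not an authoritative list; check the docs for current strings.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")

    # 1.5-era models accepted a "-latest" alias:
    flash_15 = genai.GenerativeModel("gemini-1.5-flash-latest")

    # 2.0 flash pins a numeric revision instead:
    flash_20 = genai.GenerativeModel("gemini-2.0-flash-001")

    # ...while lite and thinking-exp variants carry dates:
    lite = genai.GenerativeModel("gemini-2.0-flash-lite-preview-02-05")
    thinking = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")

    print(flash_20.generate_content("Hello").text)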

ai-christianson:
> * Huge context window

But how well does it actually handle that context window? E.g. a lot of models support 200K context, but the LLM can only really work with ~80K or so of it before it starts to get confused.

f38zf5vdt:
It works okay out to roughly 20-40k tokens; once the window gets larger than that, it degrades significantly. It can pass needle-in-the-haystack retrieval out to that distance, but asking it for multiple things from the document leads to hallucinations for me.
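
For concreteness, here is the kind of probe I mean: a rough sketch where the filler text, the "needle" facts, and the token estimate are all invented for illustration.

    # Rough sketch of a multi-needle retrieval probe; the SDK usage follows
    # google-generativeai, but the filler, needles, and depths are made up.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-2.0-flash-001")

    filler = "The sky was a uniform grey that afternoon. " * 4000  # roughly 40k tokens
    needles = ["The magic fruit is a kumquat.", "The magic number is 7149."]

    # Bury the needles at roughly 1/3 and 2/3 depth in the haystack.
    third = len(filler) // 3
    doc = (filler[:third] + needles[0] + " "
           + filler[third:2 * third] + needles[1] + " "
           + filler[2 * third:])

    # Retrieving one fact tends to pass; asking for both at once is
    # where the hallucinations start for me.
    prompt = doc + "\n\nFrom the text above: what is the magic fruit, and what is the magic number?"
    print(model.generate_content(prompt).text)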

Ironically, GPT-4o works better for me at longer contexts under 128k than Gemini 2.0 Flash. And going out to 1M is just hopeless, even though you can do it.