82 points meetpateltech | 5 comments | | HN request time: 0.199s | source
1. mrklol ◴[] No.45311071[source]
Pricing is really good for this level of benchmark performance. Let's see how it holds up once people start testing it.
replies(1): >>45311134 #
2. NitpickLawyer ◴[] No.45311134[source]
If this is sonoma-dusk, which was in preview on OpenRouter, it's pretty cool. I've tested it with some code reverse-engineering tasks, and it's at or above gpt5-mini level while being faster. It works well up to roughly 110-130k-token tasks; beyond that it gets a case of "getthereitis" and declares the task finished even when not all constraints are met (e.g. it will say "I've solved x/400 tests, the rest can be done later").
replies(3): >>45311329 #>>45311522 #>>45311886 #
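A practical takeaway from the observation above is to budget prompt size before dispatching a long task. A minimal sketch, assuming a crude ~4-characters-per-token heuristic (the actual tokenizer is model-specific and not given here, and the 110k figure is just the range the commenter reports):

```python
# Rough token budgeting to keep a task under a model's reliable context range.
# Assumes ~4 characters per token, a common rough heuristic; use the model's
# real tokenizer for precise counts.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def within_budget(task_text: str, budget_tokens: int = 110_000) -> bool:
    """True if the task likely fits under the reliable-context budget
    (~110k tokens, per the observation in the comment above)."""
    return estimate_tokens(task_text) <= budget_tokens

# Example: a 600k-character task estimates to ~150k tokens, i.e. over budget.
big_task = "x" * 600_000
print(estimate_tokens(big_task))  # 150000
print(within_budget(big_task))    # False
```

Tasks that fail this check can be split into sub-tasks that each fit under the budget, rather than relying on the model to respect all constraints near its limit.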
3. mrklol ◴[] No.45311329[source]
I can imagine; no model so far has been able to actually make use of context sizes that large…
4. ◴[] No.45311522[source]
5. bn-l ◴[] No.45311886[source]
I was disappointed in its tool-calling performance, though I didn't test it extensively.