
2127 points bakugo | 2 comments
freediver ◴[] No.43164170[source]
Kagi LLM benchmark updated with general purpose and thinking mode for Sonnet 3.7.

https://help.kagi.com/kagi/ai/llm-benchmark.html

It appears to be the second most capable general purpose LLM we have tried (second to Gemini 2.0 Pro, ahead of GPT-4o). It is less impressive in thinking mode, where it lands at about the same level as o1-mini and o3-mini (with an 8192 token thinking budget; see the API sketch at the end of this comment).

Overall a very nice update: you get a higher quality, higher speed model at the same price.

Hope to enable it in Kagi Assistant within 24h!
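
For anyone wondering how that 8192 token thinking budget gets set: here is a minimal sketch using the Anthropic Python SDK (the model ID string and the numbers are assumptions for illustration, check the current docs):

    # Minimal sketch: Claude 3.7 Sonnet with an extended-thinking budget.
    # Model ID and budget values are assumptions for illustration.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",  # assumed model ID
        max_tokens=16000,  # must be larger than the thinking budget
        thinking={"type": "enabled", "budget_tokens": 8192},  # 8192-token thinking budget
        messages=[{"role": "user", "content": "How many primes are there below 100?"}],
    )

    # The response interleaves "thinking" blocks with the final "text" answer.
    for block in response.content:
        if block.type == "thinking":
            print("[thinking]", block.thinking)
        elif block.type == "text":
            print("[answer]", block.text)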

replies(8): >>43164279 #>>43164282 #>>43164709 #>>43164800 #>>43164997 #>>43165104 #>>43169517 #>>43171532 #
Squarex ◴[] No.43164800[source]
I'm surprised that Gemini 2.0 is first now. I remember that Google models were underperforming on Kagi benchmarks.
replies(2): >>43164959 #>>43165098 #
Workaccount2 ◴[] No.43165098[source]
Having your own hardware to run LLMs will pay dividends. Despite getting off on the wrong foot, I still believe Google is best positioned to run away with the AI lead, solely because they are not beholden to Nvidia and not stuck with a third-party cloud provider. They are the only AI team that is in-house from top to bottom.
replies(3): >>43165153 #>>43166317 #>>43168831 #
1. Squarex ◴[] No.43165153[source]
I've used Gemini for its large context window before. It's a great model. But specifically on this benchmark it has always scored very low, so I wonder what has changed.
replies(1): >>43168518 #
2. SubiculumCode ◴[] No.43168518[source]
I don't know, but very recent Gemini models have certainly seemed much more impressive... and one has become my daily driver.