
2127 points bakugo | 2 comments
freediver ◴[] No.43164170[source]
Kagi LLM benchmark updated with general purpose and thinking mode for Sonnet 3.7.

https://help.kagi.com/kagi/ai/llm-benchmark.html

It appears to be the second most capable general purpose LLM we have tried (second to Gemini 2.0 Pro, ahead of GPT-4o). It is less impressive in thinking mode, where it lands at about the same level as o1-mini and o3-mini (with an 8192 token thinking budget; see the API sketch at the end of this comment).

Overall a very nice update: you get a higher quality, higher speed model at the same price.

Hope to enable it in Kagi Assistant within 24h!
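
For anyone wondering how that 8192 token thinking budget gets set: here is a minimal sketch using the Anthropic Python SDK (the model ID string and the numbers are assumptions for illustration, check the current docs):

    # Minimal sketch: Claude 3.7 Sonnet with an extended-thinking budget.
    # Model ID and budget values are assumptions for illustration.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",  # assumed model ID
        max_tokens=16000,  # must be larger than the thinking budget
        thinking={"type": "enabled", "budget_tokens": 8192},  # 8192-token thinking budget
        messages=[{"role": "user", "content": "How many primes are there below 100?"}],
    )

    # The response interleaves "thinking" blocks with the final "text" answer.
    for block in response.content:
        if block.type == "thinking":
            print("[thinking]", block.thinking)
        elif block.type == "text":
            print("[answer]", block.text)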

replies(8): >>43164279 #>>43164282 #>>43164709 #>>43164800 #>>43164997 #>>43165104 #>>43169517 #>>43171532 #
Squarex ◴[] No.43164800[source]
I'm surprised that Gemini 2.0 is first now. I remember that Google models were underperforming on Kagi benchmarks.
replies(2): >>43164959 #>>43165098 #
Workaccount2 ◴[] No.43165098[source]
Having your own hardware to run LLMs will pay dividends. Despite getting off on the wrong foot, I still believe Google is best positioned to run away with the AI lead, solely because they are not beholden to Nvidia and not stuck with a third-party cloud provider. They are the only AI team that is in-house from top to bottom.
replies(3): >>43165153 #>>43166317 #>>43168831 #
1. Squarex ◴[] No.43165153[source]
I've used Gemini for its large context window before. It's a great model. But specifically on this benchmark it has always scored very low, so I wonder what has changed.
replies(1): >>43168518 #
2. SubiculumCode ◴[] No.43168518[source]
I don't know, but very recent Gemini models have certainly seemed much more impressive... and one has become my daily driver.