2127 points bakugo | 2 comments
freediver ◴[] No.43164170[source]
Kagi LLM benchmark updated with general purpose and thinking mode for Sonnet 3.7.

https://help.kagi.com/kagi/ai/llm-benchmark.html

It appears to be the second most capable general-purpose LLM we have tried (second to Gemini 2.0 Pro, ahead of GPT-4o). It is less impressive in thinking mode, landing at about the same level as o1-mini and o3-mini (with an 8192-token thinking budget).

Overall a very nice update: you get a higher-quality, higher-speed model at the same price.

We hope to enable it in Kagi Assistant within 24h!

replies(8): >>43164279 #>>43164282 #>>43164709 #>>43164800 #>>43164997 #>>43165104 #>>43169517 #>>43171532 #
1. guelo ◴[] No.43164997[source]
How did you choose the 8192-token thinking budget? I've often seen DeepSeek R1 use way more than that.
replies(1): >>43173457 #
2. freediver ◴[] No.43173457[source]
It was arbitrary, and even with this budget it is already more verbose (and slower) overall than all the other thinking models; check the token counts and latency in the table.
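
The thinking budget discussed above is a per-request parameter in Anthropic's Messages API. A minimal sketch of how an 8192-token budget would be passed, assuming the official Python SDK; the model ID and prompt are illustrative placeholders:

```python
# Sketch of setting a per-request "thinking budget" for a Claude 3.7
# Sonnet call via Anthropic's Messages API. The request is built as a
# plain dict here so the shape is visible; the actual call is shown in
# a comment below.
request_kwargs = {
    "model": "claude-3-7-sonnet-20250219",  # placeholder model ID
    # max_tokens must exceed budget_tokens, since it covers both the
    # thinking tokens and the final answer.
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 8192},
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
}

# With the official SDK (requires an API key) this would be sent as:
#   import anthropic
#   response = anthropic.Anthropic().messages.create(**request_kwargs)

# Sanity check: the budget must fit inside the overall token limit.
assert request_kwargs["thinking"]["budget_tokens"] < request_kwargs["max_tokens"]
```

Raising `budget_tokens` gives the model more room to reason at the cost of latency and output tokens, which is the trade-off the benchmark table reflects.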