(www.anthropic.com)

2127 points bakugo | 2 comments | 24 Feb 25 18:28 UTC

freediver ◴[24 Feb 25 19:57 UTC] No.43164170[source]▶

Kagi LLM benchmark updated with general-purpose and thinking-mode results for Sonnet 3.7.

https://help.kagi.com/kagi/ai/llm-benchmark.html

Appears to be the second most capable general-purpose LLM we tried (second to Gemini 2.0 Pro, ahead of GPT-4o). Less impressive in thinking mode, where it lands at about the same level as o1-mini and o3-mini (with an 8192-token thinking budget).

Overall a very nice update: you get a higher-quality, faster model at the same price.

Hope to enable it in Kagi Assistant within 24h!

replies(8): >>43164279 #>>43164282 #>>43164709 #>>43164800 #>>43164997 #>>43165104 #>>43169517 #>>43171532 #

1. flixing ◴[24 Feb 25 20:05 UTC] No.43164282[source]▶

>>43164170 #

Do you think Kagi is the right eval tool? If so, why?

replies(1): >>43173474 #

2. freediver ◴[25 Feb 25 15:52 UTC] No.43173474[source]▶

>>43164282 (TP) #

The right eval tool depends on your evaluation task. The Kagi LLM benchmark focuses on using LLMs in the context of information retrieval (which is what Kagi does), and that includes measuring reasoning and instruction-following capabilities.

Claude 3.7 Sonnet and Claude Code