
2127 points bakugo | 2 comments
freediver ◴[] No.43164170[source]
Kagi LLM benchmark updated with general purpose and thinking mode for Sonnet 3.7.

https://help.kagi.com/kagi/ai/llm-benchmark.html

Appears to be the second most capable general-purpose LLM we tried (second to Gemini 2.0 Pro, ahead of GPT-4o). Less impressive in thinking mode, at about the same level as o1-mini and o3-mini (with an 8192-token thinking budget).

Overall a very nice update: you get a higher-quality, faster model at the same price.

Hope to enable it in Kagi Assistant within 24h!

replies(8): >>43164279 #>>43164282 #>>43164709 #>>43164800 #>>43164997 #>>43165104 #>>43169517 #>>43171532 #
1. flixing ◴[] No.43164282[source]
Do you think Kagi is the right eval tool? If so, why?
replies(1): >>43173474 #
2. freediver ◴[] No.43173474[source]
The right eval tool depends on your evaluation task. The Kagi LLM benchmark focuses on using LLMs in the context of information retrieval (which is what Kagi does), and that includes measuring reasoning and instruction-following capabilities.