DeepSeek-v3.1

(api-docs.deepseek.com)
776 points | wertyk
hodgehog11 ◴[] No.44977357[source]
For reference, here is the terminal-bench leaderboard:

https://www.tbench.ai/leaderboard

Looks like it doesn't get close to GPT-5, Claude 4, or GLM-4.5, but it still does reasonably well compared to other open-weight models. Benchmarks are rarely the full story, though, so time will tell how good it is in practice.

replies(6): >>44977423 #>>44977655 #>>44977754 #>>44977946 #>>44978395 #>>44978560 #
coliveira ◴[] No.44977655[source]
My personal experience is that it produces high-quality results.
replies(2): >>44977708 #>>44980748 #
amrrs ◴[] No.44977708[source]
Any example or prompt you used to make this statement?
replies(2): >>44977903 #>>44979268 #
sync ◴[] No.44979268{3}[source]
I'm doing coreference resolution and this model (w/o thinking) performs at the Gemini 2.5-Pro level (w/ thinking_budget set to -1) at a fraction of the cost.
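
For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of a coreference-resolution call against DeepSeek's OpenAI-compatible endpoint. The prompt wording, example sentence, and temperature are illustrative assumptions, not the commenter's actual setup; the base URL and the deepseek-chat model name (V3.1's non-thinking mode) come from the API docs.

    # Minimal coreference-resolution sketch against DeepSeek's
    # OpenAI-compatible API. Prompt and example text are assumptions,
    # not the commenter's actual evaluation setup.
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_DEEPSEEK_API_KEY",      # key from platform.deepseek.com
        base_url="https://api.deepseek.com",  # OpenAI-compatible endpoint
    )

    text = "Alice handed Bob the report because she had finished it."

    resp = client.chat.completions.create(
        model="deepseek-chat",  # V3.1 in non-thinking mode
        messages=[
            {"role": "system",
             "content": "Resolve all coreferences: rewrite the text, "
                        "replacing every pronoun with its antecedent."},
            {"role": "user", "content": text},
        ],
        temperature=0.0,  # keep output stable for side-by-side comparison
    )
    print(resp.choices[0].message.content)
    # Expected shape of answer: "Alice handed Bob the report
    # because Alice had finished the report."

The Gemini side of the comparison would go through its own SDK with thinking_budget set to -1 (dynamic thinking); scoring is then a matter of checking the rewritten text against gold antecedents on whatever dataset you use.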
replies(2): >>44979646 #>>44983231 #
antman ◴[] No.44983231{4}[source]
Nice point. How did you test for coreference resolution? Specific prompt or dataset?