
DeepSeek-v3.1

(api-docs.deepseek.com)
776 points wertyk | 1 comments
hodgehog11 No.44977357
For reference, here is the terminal-bench leaderboard:

https://www.tbench.ai/leaderboard

Looks like it doesn't get close to GPT-5, Claude 4, or GLM-4.5, but it still does reasonably well compared to other open-weight models. Benchmarks are rarely the full story, though, so time will tell how good it is in practice.

guluarte No.44977946
tbh, companies like Anthropic and OpenAI create custom agents for specific benchmarks
amelius No.44979380
Aren't good benchmarks supposed to be secret?
wkat4242 No.44979634
This industry is currently burning billions a month. With that much money around, I don't think any secrets can exist.