(api-docs.deepseek.com)

776 points wertyk | 1 comments | 21 Aug 25 19:06 UTC

hodgehog11 ◴[21 Aug 25 20:01 UTC] No.44977357[source]▶

For reference, here is the terminal-bench leaderboard:

https://www.tbench.ai/leaderboard

Looks like it doesn't get close to GPT-5, Claude 4, or GLM-4.5, but still does reasonably well compared to other open weight models. Benchmarks are rarely the full story though, so time will tell how good it is in practice.

replies(6): >>44977423 #>>44977655 #>>44977754 #>>44977946 #>>44978395 #>>44978560 #

guluarte ◴[21 Aug 25 20:53 UTC] No.44977946[source]▶

>>44977357 #

tbh companies like Anthropic and OpenAI create custom agents for specific benchmarks

replies(2): >>44978101 #>>44979380 #

amelius ◴[21 Aug 25 23:24 UTC] No.44979380[source]▶

>>44977946 #

Aren't good benchmarks supposed to be secret?

replies(3): >>44979634 #>>44982470 #>>45056160 #

1. wkat4242 ◴[21 Aug 25 23:57 UTC] No.44979634[source]▶

>>44979380 #

This industry is currently burning billions a month. With that much money around I don't think any secrets can exist.

DeepSeek-v3.1