(api-docs.deepseek.com)

776 points wertyk | 1 comments | 21 Aug 25 19:06 UTC

hodgehog11 ◴[21 Aug 25 20:01 UTC] No.44977357[source]▶

For reference, here is the terminal-bench leaderboard:

https://www.tbench.ai/leaderboard

Looks like it doesn't get close to GPT-5, Claude 4, or GLM-4.5, but it still does reasonably well compared to other open-weight models. Benchmarks are rarely the full story, though, so time will tell how good it is in practice.

replies(6): >>44977423 #>>44977655 #>>44977754 #>>44977946 #>>44978395 #>>44978560 #

seunosewa ◴[21 Aug 25 20:08 UTC] No.44977423[source]▶

>>44977357 #

The DeepSeek R1 in that list is the old model that's been replaced. Update: Understood.

replies(1): >>44977719 #

yorwba ◴[21 Aug 25 20:33 UTC] No.44977719[source]▶

>>44977423 #

Yes, and 31.3% is given in the announcement as the performance of the new v3.1, which would put it in sixteenth place.

replies(1): >>44977880 #

1. ◴[21 Aug 25 20:47 UTC] No.44977880{3}[source]▶

>>44977719 #

DeepSeek-v3.1