
DeepSeek-v3.1-Terminus

(api-docs.deepseek.com)
101 points by meetpateltech | 5 comments
sbinnee ◴[] No.45332653[source]
> What’s improved? Language consistency: fewer CN/EN mix-ups & no more random chars.

It's good that they made this improvement. But are there any advantages at this point to using DeepSeek over Qwen?

replies(4): >>45332751 #>>45332752 #>>45333575 #>>45336644 #
IgorPartola ◴[] No.45332751[source]
I wish there were some easy resource to keep up with the latest models. The best I have come up with so far is asking one model to research the others. Realistically I want to know the latest versions, best use cases, performance (in terms of speed) relative to some baseline, and the hardware requirements to run them.
replies(3): >>45333280 #>>45333716 #>>45335468 #
1. Jgoauh ◴[] No.45333716[source]
Have you tried https://artificialanalysis.ai/ ?
replies(2): >>45334600 #>>45348957 #
2. JimDugan ◴[] No.45334600[source]
A dumb collation of benchmarks that the big labs are essentially training on. Livebench.ai is the industry standard: non-contaminated, with new questions every few months.
replies(1): >>45334853 #
3. IgorPartola ◴[] No.45334853[source]
Thanks! Are the scores in some way linear here? As in, if model A is rated at 25 and model B at 50, does that mean I will have half the mistakes with model B? Get answers that are 2x more accurate? Or is it subjective?
replies(1): >>45340887 #
4. esafak ◴[] No.45340887{3}[source]
I believe the score represents the fraction of correct answers, so yes.
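A quick sketch of the arithmetic, assuming the score really is the fraction of correct answers (the 25 vs. 50 figures are just the hypothetical numbers from the question above): doubling the score doubles the correct answers, but it does not halve the mistakes.

    # Assumes a benchmark score is simply the fraction of correct answers.
    # Hypothetical numbers from the question above: model A scores 25, model B scores 50.
    score_a = 0.25   # model A answers 25% of questions correctly
    score_b = 0.50   # model B answers 50% of questions correctly

    error_a = 1 - score_a   # model A is wrong on 75% of questions
    error_b = 1 - score_b   # model B is wrong on 50% of questions

    print(f"correct-answer ratio (B vs A): {score_b / score_a:.2f}x")  # 2.00x
    print(f"error ratio (B vs A): {error_b / error_a:.2f}")            # 0.67, not 0.50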
5. alexeiz ◴[] No.45348957[source]
It says the best "coding index" is held by Grok 4 and Gemini 2.5 Pro. Give me a break. Nobody uses those models for serious coding. It's dominated by Sonnet 4/Opus 4.1 and GPT-5.