DeepSeek-v3.1-Terminus

(api-docs.deepseek.com)
101 points meetpateltech | 9 comments
sbinnee ◴[] No.45332653[source]
> What’s improved? Language consistency: fewer CN/EN mix-ups & no more random chars.

It's good that they made this improvement. But are there any advantages at this point to using DeepSeek over Qwen?

replies(4): >>45332751 #>>45332752 #>>45333575 #>>45336644 #
1. IgorPartola ◴[] No.45332751[source]
I wish there was some easy resource to keep up with the latest models. The best I have come up with so far is asking one model to research the others. Realistically I want to know latest versions, best use case, performance (in terms of speed) relative to some baseline, and hardware requirements to run it.
replies(3): >>45333280 #>>45333716 #>>45335468 #
2. exe34 ◴[] No.45333280[source]
> asking one model to research the others.

that's basically choosing at random with extra steps!

replies(1): >>45333527 #
3. throwup238 ◴[] No.45333527[source]
Research, not spit out the answer based on weights. Just ask Gemini/Claude to do deep research on /r/LocalLLama and HN posts.
4. Jgoauh ◴[] No.45333716[source]
have you tried https://artificialanalysis.ai/
replies(2): >>45334600 #>>45348957 #
5. JimDugan ◴[] No.45334600[source]
Dumb collation of benchmarks that the big labs are essentially training on. Livebench.ai is the industry standard: non-contaminated, with new questions every few months.
replies(1): >>45334853 #
6. IgorPartola ◴[] No.45334853{3}[source]
Thanks! Are the scores in some way linear here? As in, if model A is rated at 25 and model B at 50, does that mean I will have half the mistakes with model B? Get answers that are 2x more accurate? Or is it subjective?
replies(1): >>45340887 #
7. __mharrison__ ◴[] No.45335468[source]
I use Aider heavily and find their benchmark to be pretty good. It is updated relatively frequently (a month ago, which may be an eternity in AI time).

https://aider.chat/docs/leaderboards/

8. esafak ◴[] No.45340887{4}[source]
I believe the score represents the fraction of correct answers, so yes.
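
A quick arithmetic sketch of what that would imply, assuming the score really is the percentage of correct answers (that is the assumption stated above, not something confirmed against the benchmark's documentation). The scores 25 and 50 are the hypothetical values from the question, and `error_rate` is an illustrative helper, not part of any benchmark tooling.

```python
def error_rate(score: float) -> float:
    """Percentage of wrong answers, assuming `score` is the percentage of correct ones."""
    return 100.0 - score


# Hypothetical scores taken from the question in the thread.
score_a, score_b = 25.0, 50.0

# Ratio of correct answers: model B vs. model A.
correct_ratio = score_b / score_a

# Ratio of mistakes: model B vs. model A.
error_ratio = error_rate(score_b) / error_rate(score_a)

print(f"correct answers: {correct_ratio:.2f}x")
print(f"mistakes: {error_ratio:.2f}x")
```

Under that assumption, model B gets twice as many answers right, but it makes about two thirds as many mistakes as model A (50 wrong per 100 versus 75), not half. How linear the comparison looks depends on whether you count correct answers or errors.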
9. alexeiz ◴[] No.45348957[source]
It says the best "coding index" is held by Grok 4 and Gemini 2.5 Pro. Give me a break. Nobody uses those models for serious coding. It's dominated by Sonnet 4/Opus 4.1 and GPT-5.