DeepSeek-v3.1-Terminus

(api-docs.deepseek.com)

Show context

sbinnee ◴[22 Sep 25 12:45 UTC] No.45332653[source]▶

>>45332400 (OP) #

> What’s improved? Language consistency: fewer CN/EN mix-ups & no more random chars.

It's good that they made this improvement. But is there any advantages at this point using DeepSeek over Qwen?

replies(4): >>45332751 #>>45332752 #>>45333575 #>>45336644 #

1. IgorPartola ◴[22 Sep 25 12:54 UTC] No.45332751[source]▶

>>45332653 #

I wish there was some easy resource to keep up with the latest models. The best I have come up with so far is asking one model to research the others. Realistically I want to know latest versions, best use case, performance (in terms of speed) relative to some baseline, and hardware requirements to run it.

replies(3): >>45333280 #>>45333716 #>>45335468 #

2. exe34 ◴[22 Sep 25 13:35 UTC] No.45333280[source]▶

>>45332751 (TP) #

> asking one model to research the others.

that's basically choosing are random with extra steps!

replies(1): >>45333527 #

3. throwup238 ◴[22 Sep 25 13:54 UTC] No.45333527[source]▶

>>45333280 #

Research not spit out the answer based on weights. Just ask Gemini/Claude to do deep research on /r/LocalLLama and HN posts.

4. Jgoauh ◴[22 Sep 25 14:10 UTC] No.45333716[source]▶

>>45332751 (TP) #

have you tried https://artificialanalysis.ai/

replies(2): >>45334600 #>>45348957 #

5. JimDugan ◴[22 Sep 25 15:14 UTC] No.45334600[source]▶

>>45333716 #

Dumb collation of benchmarks that the big labs are essentially training on. Livebench.ai is the industry standard - non contaminated, new questions every few months.

replies(1): >>45334853 #

6. IgorPartola ◴[22 Sep 25 15:31 UTC] No.45334853{3}[source]▶

>>45334600 #

Thanks! Are the scores in some way linear here? As in, if model A is rated at 25 and model B at 50, does that mean I will have half the mistakes with model B? Get answers that are 2x more accurate? Or is it subjective?

replies(1): >>45340887 #

7. __mharrison__ ◴[22 Sep 25 16:09 UTC] No.45335468[source]▶

>>45332751 (TP) #

I use Aider heavily and find their benchmark to be pretty good. It is updated relatively frequently (a month ago, which may be an eternity in AI time).

https://aider.chat/docs/leaderboards/

8. esafak ◴[22 Sep 25 23:25 UTC] No.45340887{4}[source]▶

>>45334853 #

I believe the score represents the fraction of correct answers, so yes.

9. alexeiz ◴[23 Sep 25 16:06 UTC] No.45348957[source]▶

>>45333716 #

It says the best "coding index" is held by Grok 4 and Gemini 2.5 Pro. Give me a break. Nobody uses those models for serious coding. It's dominated by Sonnet 4/Opus 4.1 and GPT-5.

↑