
DeepSeek-v3.1-Terminus

(api-docs.deepseek.com)
101 points by meetpateltech | 5 comments
sbinnee ◴[] No.45332653[source]
> What’s improved? Language consistency: fewer CN/EN mix-ups & no more random chars.

It's good that they made this improvement. But are there any advantages at this point to using DeepSeek over Qwen?

replies(4): >>45332751 #>>45332752 #>>45333575 #>>45336644 #
IgorPartola ◴[] No.45332751[source]
I wish there were some easy resource to keep up with the latest models. The best I have come up with so far is asking one model to research the others. Realistically I want to know the latest versions, best use cases, performance (in terms of speed) relative to some baseline, and the hardware requirements to run them.
replies(3): >>45333280 #>>45333716 #>>45335468 #
1. Jgoauh ◴[] No.45333716[source]
Have you tried https://artificialanalysis.ai/ ?
replies(2): >>45334600 #>>45348957 #
2. JimDugan ◴[] No.45334600[source]
A dumb collation of benchmarks that the big labs are essentially training on. Livebench.ai is the industry standard: non-contaminated, with new questions every few months.
replies(1): >>45334853 #
3. IgorPartola ◴[] No.45334853[source]
Thanks! Are the scores in some way linear here? As in, if model A is rated at 25 and model B at 50, does that mean I will have half the mistakes with model B? Get answers that are 2x more accurate? Or is it subjective?
replies(1): >>45340887 #
4. esafak ◴[] No.45340887{3}[source]
I believe the score represents the fraction of correct answers, so yes.
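A quick sketch of the arithmetic, assuming the score really is the fraction of correct answers (the 25 vs. 50 figures are just the hypothetical numbers from the question above): doubling the score doubles the correct answers, but it does not halve the mistakes.

    # Assumes a benchmark score is simply the fraction of correct answers.
    # Hypothetical numbers from the question above: model A scores 25, model B scores 50.
    score_a = 0.25   # model A answers 25% of questions correctly
    score_b = 0.50   # model B answers 50% of questions correctly

    error_a = 1 - score_a   # model A is wrong on 75% of questions
    error_b = 1 - score_b   # model B is wrong on 50% of questions

    print(f"correct-answer ratio (B vs A): {score_b / score_a:.2f}x")  # 2.00x
    print(f"error ratio (B vs A): {error_b / error_a:.2f}")            # 0.67, not 0.50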
5. alexeiz ◴[] No.45348957[source]
It says the best "coding index" is held by Grok 4 and Gemini 2.5 Pro. Give me a break. Nobody uses those models for serious coding. It's dominated by Sonnet 4/Opus 4.1 and GPT-5.