DeepSeek-v3.1-Terminus

(api-docs.deepseek.com)
101 points by meetpateltech | 2 comments
sbinnee No.45332653
> What’s improved? Language consistency: fewer CN/EN mix-ups & no more random chars.

It's good that they made this improvement. But are there any advantages at this point to using DeepSeek over Qwen?

IgorPartola No.45332751
I wish there were an easy resource for keeping up with the latest models. The best I have come up with so far is asking one model to research the others. Realistically, I want to know the latest versions, the best use case for each, performance (in terms of speed) relative to some baseline, and the hardware requirements to run them.
Jgoauh No.45333716
Have you tried https://artificialanalysis.ai/
JimDugan No.45334600
It's a dumb collation of benchmarks that the big labs are essentially training on. Livebench.ai is the industry standard: uncontaminated, with new questions every few months.
IgorPartola No.45334853
Thanks! Are the scores linear in some way? As in, if model A is rated 25 and model B is rated 50, does that mean model B will make half as many mistakes? That its answers are 2x more accurate? Or is it subjective?
esafak No.45340887
I believe the score represents the fraction of correct answers, so yes.
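A quick check of the arithmetic, assuming (as suggested above) that the score is simply the percentage of questions answered correctly. `compare_scores` is a hypothetical helper written only for illustration. Under that assumption, going from 25 to 50 doubles the share of correct answers, but the share of mistakes only falls from 75% to 50%, i.e. to two-thirds of model A's mistakes rather than half:

```python
def compare_scores(score_a: float, score_b: float) -> dict:
    """Hypothetical helper. ASSUMES each score is the percentage (0-100)
    of benchmark questions answered correctly, with no partial credit."""
    correct_a, correct_b = score_a / 100, score_b / 100
    wrong_a, wrong_b = 1 - correct_a, 1 - correct_b
    return {
        # How many times as many correct answers model B gets
        "correct_ratio": correct_b / correct_a,
        # Model B's mistakes as a fraction of model A's mistakes
        "error_ratio": wrong_b / wrong_a,
    }


r = compare_scores(25, 50)
print(f"B gets {r['correct_ratio']:.2f}x as many correct answers")  # 2.00x
print(f"B makes {r['error_ratio']:.2f}x as many mistakes")           # 0.67x
```

So under this assumption the "2x more accurate" reading holds, while "half the mistakes" does not. If a benchmark awards partial credit or averages across weighted categories, even the first reading may only hold approximately.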