
    DeepSeek-v3.1-Terminus

    (api-docs.deepseek.com)
    101 points by meetpateltech | 14 comments
    1. sbinnee No.45332653
    > What’s improved? Language consistency: fewer CN/EN mix-ups & no more random chars.

    It's good that they made this improvement. But are there any advantages at this point to using DeepSeek over Qwen?

    replies(4): >>45332751 #>>45332752 #>>45333575 #>>45336644 #
    2. IgorPartola No.45332751
    I wish there was some easy resource to keep up with the latest models. The best I have come up with so far is asking one model to research the others. Realistically I want to know latest versions, best use case, performance (in terms of speed) relative to some baseline, and hardware requirements to run it.
    replies(3): >>45333280 #>>45333716 #>>45335468 #
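
    There's no single canonical tracker, but one low-effort option is scripting against an aggregator's model catalog. A minimal sketch, assuming OpenRouter's public /api/v1/models endpoint; the field names (created, context_length) are assumptions to check against its docs:

        # Sketch: list recently added models from OpenRouter's public catalog.
        # Field names ("created", "context_length") are assumptions to verify.
        import requests

        resp = requests.get("https://openrouter.ai/api/v1/models", timeout=30)
        resp.raise_for_status()
        models = resp.json()["data"]

        # Newest first; print a quick overview of the 20 most recent models.
        for m in sorted(models, key=lambda m: m.get("created", 0), reverse=True)[:20]:
            print(m["id"], "| context:", m.get("context_length"))
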
    3. comrade1234 No.45332752
    MIT license that lets you run it on your own hardware and make money off of it.
    replies(1): >>45333537 #
    4. exe34 No.45333280
    > asking one model to research the others.

    that's basically choosing at random with extra steps!

    replies(1): >>45333527 #
    5. throwup238 No.45333527
    Research, not spitting out an answer from its weights. Just ask Gemini/Claude to do deep research on /r/LocalLLama and HN posts.
    6. coder543 No.45333537
    Qwen3 models (including their 235B and 480B models) use the Apache-2.0 license, so it’s not like that’s a big difference here.
    7. coder543 No.45333575
    They seem fairly competitive with each other. You would have to benchmark them for your specific use case.
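
    A minimal harness for that kind of head-to-head, use-case-specific benchmarking, assuming both providers expose OpenAI-compatible endpoints (DeepSeek documents one at api.deepseek.com; the Qwen base URL and both model names here are placeholders to verify):

        # Sketch: run the same prompts against two OpenAI-compatible endpoints
        # and score them with your own checks. Base URLs and model names are
        # assumptions/placeholders, not verified values.
        from openai import OpenAI

        endpoints = {
            "deepseek": (OpenAI(base_url="https://api.deepseek.com", api_key="sk-..."),
                         "deepseek-chat"),
            "qwen": (OpenAI(base_url="https://your-qwen-endpoint/v1", api_key="sk-..."),
                     "your-qwen-model"),
        }

        # Replace with prompts and pass/fail checks from your real workload.
        cases = [("Reply with only the word ok.",
                  lambda out: out.strip().lower() == "ok")]

        for name, (client, model) in endpoints.items():
            passed = 0
            for prompt, check in cases:
                msg = client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                ).choices[0].message.content
                passed += bool(check(msg))
            print(f"{name}: {passed}/{len(cases)} passed")
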
    8. Jgoauh No.45333716
    Have you tried https://artificialanalysis.ai/?
    replies(2): >>45334600 #>>45348957 #
    9. JimDugan No.45334600
    A dumb collation of benchmarks that the big labs are essentially training on. Livebench.ai is the industry standard - non-contaminated, with new questions every few months.
    replies(1): >>45334853 #
    10. IgorPartola No.45334853
    Thanks! Are the scores in some way linear here? As in, if model A is rated at 25 and model B at 50, does that mean I will have half the mistakes with model B? Get answers that are 2x more accurate? Or is it subjective?
    replies(1): >>45340887 #
    11. __mharrison__ No.45335468
    I use Aider heavily and find their benchmark to be pretty good. It's updated relatively frequently (most recently a month ago, which may be an eternity in AI time).

    https://aider.chat/docs/leaderboards/

    12. twotwotwo No.45336644
    The fast Cerebras thing got me to try the Qwen3 models. I couldn't get them working all that well: they had trouble using the required output format and following instructions. On the other hand, benchmarks say they should be great, and it sounds like maybe some people use them OK via different tools.

    I'm curious if my experience was unusual (it very much could be!) and I'd be interested to hear from anyone who's used both.
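
    Format failures like that are often easier to isolate outside a full agent stack. A rough repro sketch, assuming an OpenAI-compatible endpoint (base URL and model name are placeholders): ask for strict JSON, validate, and count how many retries the model needs:

        # Sketch: measure how often a model violates a strict JSON-only format.
        # The endpoint and model name below are placeholders, not real values.
        import json
        from openai import OpenAI

        client = OpenAI(base_url="https://your-endpoint/v1", api_key="sk-...")

        def ask_json(prompt, retries=3):
            for attempt in range(retries):
                out = client.chat.completions.create(
                    model="your-model",
                    messages=[{"role": "user",
                               "content": prompt + "\nReply with valid JSON only."}],
                ).choices[0].message.content
                try:
                    return json.loads(out), attempt  # payload + retries it took
                except json.JSONDecodeError:
                    continue
            raise ValueError("no valid JSON after retries")
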

    13. esafak No.45340887
    I believe the score represents the fraction of correct answers, so yes.
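
    Spelling out the arithmetic, since "half the mistakes" and "2x more accurate" come apart: if the score is the fraction of correct answers, 25 vs. 50 means twice as many correct answers, but the error rate falls from 75% to 50%, not by half:

        # Quick illustration of what a fraction-correct score implies.
        score_a, score_b = 0.25, 0.50

        print(score_b / score_a)              # 2.0  -> B gets twice as many right
        print((1 - score_b) / (1 - score_a))  # ~0.67 -> B's error rate is 2/3 of A's
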
    14. alexeiz No.45348957
    It says the best "coding index" is held by Grok 4 and Gemini 2.5 Pro. Give me a break. Nobody uses those models for serious coding. It's dominated by Sonnet 4/Opus 4.1 and GPT-5.