https://artificialanalysis.ai/models/deepseek-v3-1-reasoning
Comparing https://openrouter.ai/openai/gpt-oss-120b and https://openrouter.ai/deepseek/deepseek-chat-v3.1 across the same providers is probably better, although gpt-oss-120b has been around long enough to have more providers, and presumably long enough for hosts to get comfortable with it and optimize their hosting of it.
Its knowledge seems to be lacking even compared to GPT-3.
No idea how you'd benchmark this though.
Let's hope not, because gpt-oss-120B can be dramatically moronic. I am guessing the MoE contains some very dumb experts.
Benchmarks can be a starting point, but you really have to see how the results work for you.
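For that kind of hands-on check, something like the sketch below works: run your own prompts through an OpenAI-compatible endpoint (OpenRouter here, using the two model slugs from the links above) and compare the answers side by side. This is a minimal sketch, not a recipe; the prompt list, the max_tokens value, and the OPENROUTER_API_KEY environment variable are illustrative assumptions.

```python
# Minimal sketch: compare answers from two OpenRouter-hosted models on your own prompts.
# Assumes the official `openai` Python client and an OPENROUTER_API_KEY env var.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter exposes an OpenAI-compatible API
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Model slugs from the links above; swap in whatever you are evaluating.
MODELS = ["openai/gpt-oss-120b", "deepseek/deepseek-chat-v3.1"]

# Illustrative prompts only -- use questions from your own domain.
PROMPTS = [
    "Who wrote 'The Master and Margarita', and in what decade?",
    "What does the Linux setns syscall do?",
]

for prompt in PROMPTS:
    print(f"\n=== {prompt}")
    for model in MODELS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=200,
        )
        print(f"\n[{model}]\n{resp.choices[0].message.content}")
```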
Qwen3-Coder-480B-A35B-Instruct
GLM4.5 Air
Kimi K2
DeepSeek V3 0324 / R1 0528
GPT-5 Mini
Thanks for any feedback!
That is the point of these small models: strip out the bloat of obscure information (address that with RAG) and leave behind a core "reasoning" skeleton.
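One way to picture that split, as a minimal sketch: keep the obscure facts in an external store, retrieve the relevant ones at query time, and hand them to the small model as context so it only has to reason over them. The fact list and the word-overlap scoring below are toy assumptions; a real setup would use embeddings and a vector store.

```python
# Toy RAG sketch: obscure facts live outside the model and only the relevant ones
# are injected into the prompt, so the small model reasons rather than memorizes.

FACTS = [
    "The Voyager 1 probe crossed the heliopause in August 2012.",
    "OCaml's garbage collector uses a minor heap and a major heap.",
    "The Treaty of Tordesillas was signed in 1494.",
]

def retrieve(query: str, facts: list[str], k: int = 2) -> list[str]:
    """Rank facts by naive word overlap with the query and return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(facts, key=lambda f: len(q_words & set(f.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(question: str) -> str:
    """Assemble a prompt that hands the model the retrieved facts as context."""
    context = "\n".join(f"- {fact}" for fact in retrieve(question, FACTS))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_prompt("When did Voyager 1 leave the heliosphere?"))
```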