(qwenlm.github.io)

544 points tosh | 1 comments | 24 Mar 25 18:35 UTC

simonw ◴[24 Mar 25 18:52 UTC] No.43464227[source]▶

Big day for open source Chinese model releases - DeepSeek-v3-0324 came out today too, an updated version of DeepSeek v3 now under an MIT license (previously it was a custom DeepSeek license). https://simonwillison.net/2025/Mar/24/deepseek/

replies(5): >>43464375 #>>43464498 #>>43464686 #>>43465383 #>>43467111 #

jschoe ◴[25 Mar 25 01:07 UTC] No.43467111[source]▶

>>43464227 #

And it still can't answer this:

Q: "9.11 and 9.9, which one is larger?"

A: "To determine which number is larger between 9.11 and 9.9, let's compare them step by step. Both numbers have the same whole number part: 9. 9.11 has a 1 in the tenths place. 9.9 has a 9 in the tenths place. Since 9 (from 9.9) is greater than 1 (from 9.11), we can conclude that 9.9 is larger than 9.11."

"Final Answer: 9.9"

I don't think anything else needs to be said here.
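For what it's worth, the place-value comparison in the quoted answer can be checked mechanically. Below is a minimal sketch, assuming the inputs are plain decimal strings; the helper names `larger` and `tenths_digit` are hypothetical and exist only for this illustration.

```python
from decimal import Decimal


def tenths_digit(value: str) -> int:
    """Return the digit in the tenths place of a decimal string (0 if absent)."""
    _, _, fraction = value.partition(".")
    return int(fraction[0]) if fraction else 0


def larger(a: str, b: str) -> str:
    """Return whichever decimal string denotes the larger number (b on a tie)."""
    # Decimal compares by exact numeric value, so "9.9" is treated as 9.90.
    return a if Decimal(a) > Decimal(b) else b


if __name__ == "__main__":
    # Same whole-number part (9), so the tenths digits decide: 1 versus 9.
    print(tenths_digit("9.11"), tenths_digit("9.9"))
    print(larger("9.11", "9.9"))
```

Under these assumptions the check agrees with the quoted "Final Answer: 9.9", since 9.9 is the same number as 9.90 and 90 hundredths exceeds 11 hundredths.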

replies(17): >>43467119 #>>43467120 #>>43467131 #>>43467132 #>>43467138 #>>43467147 #>>43467148 #>>43467381 #>>43467419 #>>43467451 #>>43467459 #>>43467485 #>>43467500 #>>43467530 #>>43467732 #>>43467781 #>>43468916 #

AuryGlenz ◴[25 Mar 25 01:12 UTC] No.43467147[source]▶

>>43467111 #

I suggest we’ve now passed what shall be dubbed the jschoe test ;)

replies(2): >>43467458 #>>43468092 #

1. sebastiennight ◴[25 Mar 25 04:19 UTC] No.43468092[source]▶

>>43467147 #

I will now refer to this as the jschoe test in my writing and publications as well!

It's interesting to think that one of the most realistic consequences of reaching artificial superintelligence may be that its answers start diverging wildly from human expectations, and we conclude that it's being "increasingly wrong".

Qwen2.5-VL-32B: Smarter and Lighter