
544 points tosh | 2 comments
simonw | No.43464227
Big day for open source Chinese model releases - DeepSeek-v3-0324 came out today too, an updated version of DeepSeek v3 now under an MIT license (previously it was a custom DeepSeek license). https://simonwillison.net/2025/Mar/24/deepseek/
echelon | No.43464498
Pretty soon I won't be using any American models. It'll be a 100% Chinese open source stack.

The foundation model companies are screwed. Only shovel makers (Nvidia, infra companies) and product companies are going to win.

refulgentis | No.43464792
I've been waiting since November for 1, just 1*, model other than Claude that can reliably do agentic tool call loops. As long as the Chinese open models are chasing reasoning and benchmark-maxxing vs. mid-2024 US private models, I'm very comfortable with somewhat ignoring these models.

(this isn't idle prognostication hinging on my personal hobby horse. I've got skin in the game: I'm virtually certain I have the only AI client that is able to reliably do tool calls with open models in an agentic setting. llama.cpp got a massive contribution to make this happen, and the big boys who bother, like ollama, are still using a dated json-schema-forcing method that doesn't comport with recent local model releases that can do tool calls. IMHO we're comfortably past the point where products using these models can afford to focus on conversational chatbots; that's cute, but a commodity to give away per standard 2010s SV thinking)

* OpenAI's can but are a little less...grounded?...situated? i.e. it can't handle "read this file and edit it to do $X". Same-ish for Gemini, though, sometimes I feel like the only person in the world who actually waits for the experimental models to go GA, as per letter of the law, I shouldn't deploy them until then
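The "json-schema-forcing method" mentioned above can be sketched roughly as follows: the runtime declares each tool with a JSON schema, then parses and validates the model's raw output against it before dispatching. This is an illustrative sketch, not llama.cpp's or ollama's actual code; the tool name and fields are hypothetical.

```python
import json

# Hypothetical tool declaration in an OpenAI-style function schema.
# Field names here are illustrative, not any specific runtime's API.
TOOL_SCHEMAS = {
    "read_file": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    }
}

def parse_tool_call(raw: str):
    """Parse a model's raw output as a tool call and check required args."""
    call = json.loads(raw)
    name, args = call["name"], call["arguments"]
    schema = TOOL_SCHEMAS[name]
    for field in schema["required"]:
        if field not in args:
            raise ValueError(f"missing required argument: {field}")
    return name, args

name, args = parse_tool_call(
    '{"name": "read_file", "arguments": {"path": "notes.txt"}}'
)
```

The complaint in the comment is that newer local models emit tool calls in their own chat-template formats, so a fixed schema-forcing layer like this falls out of sync with what the models were actually trained to produce.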

throwawaymaths | No.43464831
is there some reason you can't train a 1b model to just do agentic stuff?
1. anon373839 | No.43465596
The Berkeley Function Calling Leaderboard [1] might be of interest to you. As of now, it looks like Hammer2.1-3b is the strongest model under 7 billion parameters; its overall score is ~82% of GPT-4o's. There is also Hammer2.1-1.5b at 1.5 billion parameters, which scores ~76% of GPT-4o.

[1] https://gorilla.cs.berkeley.edu/leaderboard.html

2. refulgentis | No.43465959
Worth noting:

- Those are single-turn scores: at multi-turn, 4o is 3x as good as the 3b

- BFCL tasks are generally "turn natural language into an API call"; multi-turn then just involves making another API call

- I hope to inspire work towards an open model that can eat the paid models sooner rather than later

- trained quite specifically on an agent loop with the tools read_files and edit_file (you'll also probably want at least read_directory and get_shared_directories; search_filenames and search_files_text are good too), bonus points for cli_command

- IMHO, this is much lower-hanging fruit than, e.g., training an open computer-vision model, so I beseech thee, intrepid ML-understander: fill this gap and hear your name resound throughout the ages
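The agent loop described in the list above can be sketched as follows. This is a minimal illustration, not any particular client's implementation: `call_model` is a placeholder for however you invoke the open model, and the tool implementations are bare-bones stand-ins for the tool names the comment proposes.

```python
import subprocess
from pathlib import Path

# Bare-bones stand-ins for the tools named in the comment.
def read_files(paths):
    return {p: Path(p).read_text() for p in paths}

def edit_file(path, content):
    Path(path).write_text(content)
    return f"wrote {len(content)} bytes to {path}"

def read_directory(path="."):
    return sorted(p.name for p in Path(path).iterdir())

def cli_command(cmd):
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return out.stdout + out.stderr

TOOLS = {
    "read_files": read_files,
    "edit_file": edit_file,
    "read_directory": read_directory,
    "cli_command": cli_command,
}

def agent_loop(call_model, task, max_steps=10):
    """Repeatedly ask the model for an action, run it, feed the result back.

    call_model(history) is assumed to return either
    {"tool": name, "args": {...}} or {"done": True, "answer": ...}.
    """
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_model(history)
        if action.get("done"):
            return action["answer"]
        result = TOOLS[action["tool"]](**action["args"])
        history.append({"role": "tool", "content": str(result)})
    return None  # gave up after max_steps
```

Training a small model specifically on transcripts of this loop (tool call in, tool result out, repeat until done) is the gap the comment is asking someone to fill.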