I guess the -7B might run on my 16GB AMD card?
https://openrouter.ai/deepseek/deepseek-chat-v3-0324:free
do you think this needs attention?
We haven't hit the wall yet.
I've seen some people claim it should make the models better at text, but I find that a little difficult to believe without data.
To pick just the most popular one, https://lmarena.ai/?leaderboard= has GPT-4-0314 ranked 83rd now.
Are there any good sources I can read up on for estimating the hardware specs required to run 7B, 13B, 32B, etc. models locally?
It’ll happily talk about Bloody Sunday.
Probably a great model, but it worries me that it has such restrictions.
Sure OpenAI also has lots of restrictions, but this feels more like straight up censorship since it’ll happily go on about bad things the governments of the west have done.
[0]: https://twm.me/posts/calculate-vram-requirements-local-llms/
Gemini isn't too special; it's comparable to DeepSeek, or a bit behind, but it is damn fast, so maybe skip Gemini for serious tasks.
Grok and Gemini can be used as deep research models, which I think I like? Grok seems to have just taken the DeepSeek approach and scaled it with their hypermassive GPU cluster, so I suspect Grok can also be replaced.
What I truly believe in is Claude.
I'm not sure why, but Claude really feels good, especially for coding.
For anything else I might use something like DeepSeek or other Chinese models.
I used cerebras.ai and holy moly they are so fast. I used the DeepSeek 70B model and it is still incredibly fast, and my time matters, so I really like the open-source way: it lets companies like Cerebras focus on what they do best.
I am not sure about Nvidia, though. Nvidia seems so tied to Western AI that DeepSeek's improvements impact Nvidia.
I do hope Nvidia cuts GPU prices, though I don't think they have much incentive.
That said none of the recent string of releases has done much yet to "smash a wall", they've just met the larger proprietary models where they already were. I'm hoping R2 or the like really changes that by showing ChatGPT 3->3.5 or 3.5->4 level generational jumps are still possible beyond the current state of the art, not just beyond current models of a given size.
Any kind of media with zero or near-zero copying/distribution costs becomes a deflationary race to the bottom. Someone will eventually release something that's free, and at that point nothing can compete with free unless it's some kind of very specialized offering. Then you run into the problem the OP described: how do you fund free? Answer: ads. Now the customer is the advertiser, not the user/consumer, which is why most media converges on trash.
https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/
I think the 32b models are actually good enough that I might stop paying for ChatGPT plus and Claude.
I get around 20 tok/second on my M3, and I can get 100 tok/second on smaller or quantized models. 80-100 tok/second is the sweet spot for interactive usage; above that you basically can't read as fast as it generates.
I also really like the QwQ reasoning model. I haven't gotten around to trying locally hosted models for agents and RAG; coding agents especially are what I'm interested in. I feel like 20 tok/second is fine if it's just running in the background.
Anyways, would love to know others' experiences; that was mine this weekend. The way it's going I really don't see a point in paying. I think on-device is the near future, and they should just charge a licensing fee like DB providers do for enterprise support and updates.
If you were paying $20/mo for ChatGPT a year ago, the 32B models are basically at that level: slightly slower and slightly lower quality, but useful enough to consider cancelling your subscriptions at this point.
Companies will have to detect and police distilling if they want to keep their moat. Maybe you have to have an enterprise agreement (and arms control waiver) to get GPT-6-large API access.
Also, I have seen that once an open-source LLM is released to the public, even though you can access it on any website hosting it, most people still prefer to use it from the company that created the model.
DeepSeek released its revenue figures and they're crazy good.
And no, they didn't have full racks of H100s.
Also one more thing. Open source has always had an issue of funding.
Also, they are not completely open source, just open weights. Yes, you can fine-tune them, but from my limited knowledge fine-tuning has limitations, so keeping the training data proprietary also supports my earlier idea of consulting for other AI companies.
Yes, it's not a hugely profitable venture; imo it's just decently profitable, but the current hype around AI makes it lucrative for companies.
Also, I think this might be a winner-takes-all market, which increases competition, but in a healthy way.
What DeepSeek did, releasing the open-source model and then going out of its way to release other open-source projects that could themselves have been worth a few million (bycloud said it), helps AI innovation in general.
Haven’t tried this new model locally, but I agree with you that it looks like there is a secondary censorship going on. If I ask it to list the 10 worst catastrophes of recent Chinese history with Thinking enabled then it’ll actually think about the massacre. Gets blocked very quickly, but it doesn’t look like the thinking is particularly censored.
Perplexity released DeepSeek R1 1776 (I think; I forget the exact name). It basically removes the Chinese censorship; yes, you can ask it about Tiananmen Square.
I think the next iteration of these AI model ads will be sneaky, which might make them hard to remove.
Though it's funny that you comment on Chinese censorship while American censorship is apparently fine, lol.
That was the only thing I could think to ask really. Do you have a better example maybe?
Also, China doesn't have access to that many GPUs because of the CHIPS Act.
And I hate it. I hate it when America sounds more communist than China, who open-source their stuff because free markets.
I actually think more countries need to invest in AI, rather than leaving it to companies chasing profit.
This could be the decision that can impact the next century.
Commoditizing the AI/intelligence part means that the main advantage isn't the bits; it's the atoms. Physical dexterity, social skills, and manufacturing skills will gain more of a comparative advantage vs. intelligence work in the future as a result; AI makes the old economy new again in the long term. It also lowers the value of AI investments in that they can no longer command first-mover/monopoly-like pricing for what is a very large capex cost, undermining US investment in what is their advantage. As long as it is strategic, it doesn't necessarily need to be economic on its own.
It's really frustrating that I can't just write off Apple as evil monopolists when they put out hardware like this.
(this isn't idle prognostication hinging on my personal hobby horse. I've got skin in the game: I'm virtually certain I have the only AI client that can reliably do tool calls with open models in an agentic setting. llama.cpp got a massive contribution to make this happen, and the big boys who bother, like ollama, are still using a dated JSON-schema-forcing method that doesn't comport with recent local model releases that can do tool calls. IMHO we're comfortably past the point where products using these models can afford to focus on conversational chatbots; that's cute, but a commodity to give away, per standard 2010s SV thinking)
* OpenAI's can but are a little less...grounded?...situated? i.e. it can't handle "read this file and edit it to do $X". Same-ish for Gemini, though, sometimes I feel like the only person in the world who actually waits for the experimental models to go GA, as per letter of the law, I shouldn't deploy them until then
https://imgur.com/a/censorship-much-CBxXOgt
It's not even nefarious: they don't want the model spewing out content that will get them in trouble in the most general sense. It just so happens most governments have things that will get you in trouble.
The US is very obsessed with voter manipulation these days, so OpenAI and Anthropic's models are extra sensitive if the wording implies they're being used for that.
China doesn't like talking about past or ongoing human rights violations, so their models will be extra sensitive about that.
That will help you quickly calculate the model's VRAM usage as well as the VRAM usage of the context length you want. You can put "Qwen/Qwen2.5-VL-32B-Instruct" in the "Model (unquantized)" field. Funnily enough, the calculator lacks an option for running the model unquantized, presumably because nobody worried about VRAM bothers running >8-bit quants.
Fwiw, Claude Sonnet 3.5 100% had some sort of agentic loop x precise file editing trained into it. Wasn't obvious to me until I added a MCP file server to my client, and still isn't well-understood outside a few.
I'm not sure on-device models will be able to handle it any time soon because it relies on just letting it read the whole effing file.
Separately...
I say I don't understand why no other model is close, but it makes sense. OpenAI has been focused on reasoning, Mistral, I assume is GPU-starved, and Google...well, I used to work there, so I have to stop myself from going on and on. Let's just say I assume that there wouldn't be enough Consensus Built™ to do something "scary" and "experimental" like train that stuff in.
This also isn't going so hot for Sonnet IMHO.
There's vague displeasure and assumptions it "changed" the last week, but, AFAICT the real problem is that the reasoning stuff isn't as "trained in" as, say, OpenAI's.
This'd be a good thing except you see all kinds of whacky behavior.
One of my simple "read file and edit" queries yesterday did about 60 pages worth of thinking, and the thinking contained 130+ separate tool calls that weren't actually called, so it was just wandering around in the wilderness, reacting to hallucinated responses it never actually got.
Which plays into another one of my hobbyhorses, chat is a "hack" on top of an LLM. Great. So is reasoning, especially in the way Anthropic implemented it. At what point are the abstractions too much, so much that it's unreliable? 3.7 Sonnet may be answering that, because when it fails, all that thinking looks like the agentic loop cooked into Sonnet 3.5. So maybe it's altogether too much to have chat, reasoning, and fully reliable agentic loops...
There is additional memory used for context / KV cache. So, if you use a large context window for a model, you will need to factor in several additional gigabytes for that, but it is much harder to provide a rule of thumb for that overhead. Most of the time, the overhead is significantly less than the size of the model, so not 2x or anything. (The size of the context window is related to the amount of text/images that you can have in a conversation before the LLM begins forgetting the earlier parts of the conversation.)
The most important thing for local LLM performance is typically memory bandwidth. This is why GPUs are so much faster for LLM inference than CPUs, since GPU VRAM is many times the speed of CPU RAM. Apple Silicon offers rather decent memory bandwidth, which makes the performance fit somewhere between a typical Intel/AMD CPU and a typical GPU. Apple Silicon is definitely not as fast as a discrete GPU with the same amount of VRAM.
That's about all you need to know to get started. There are obviously nuances and exceptions that apply in certain situations.
A 32B model at 5 bits per parameter will comfortably fit onto a 24GB GPU and provide decent speed, as long as the context window isn't set to a huge value.
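A minimal sketch of that rule of thumb. The per-token KV-cache cost and the fixed overhead constant are rough assumptions that vary by model, not exact figures:

```python
def estimate_vram_gb(params_b: float, bits: int, context_tokens: int = 8192,
                     kv_bytes_per_token: float = 0.25e6) -> float:
    """Rough VRAM estimate: weights + KV cache + fixed runtime overhead.

    params_b           -- model size in billions of parameters
    bits               -- quantization width (e.g. 4, 5, 8, 16)
    kv_bytes_per_token -- assumed KV-cache cost per token (varies by model)
    """
    weights_gb = params_b * 1e9 * bits / 8 / 1e9
    kv_gb = context_tokens * kv_bytes_per_token / 1e9
    overhead_gb = 1.0  # runtime buffers, CUDA context, etc. (assumption)
    return weights_gb + kv_gb + overhead_gb

# A 32B model at 5-bit: 32 * 5/8 = 20 GB of weights, plus cache and
# overhead, comes in just under a 24GB card at a moderate context size.
print(round(estimate_vram_gb(32, 5), 1))
```

With these assumptions the total lands around 23 GB, consistent with the "fits on a 24GB GPU as long as the context window isn't huge" claim above.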
Thanks for the recommendations. I'll give gemma3:12b a try and, if needed, go down to gemma3:4b.
Since you’re a student most of the providers/clouds offer student credits and you can also get loads of credits from hackathons.
As an outsider, it feels like very little progress is being made on the energy issue. I genuinely think AI could be accelerated so much more if energy were cheaper/greener.
Many people "just" use 4x consumer GPUs like the 3090 (24GB each), which scales well. They'd probably buy a mining rig, an EPYC CPU, a mainboard with sufficient PCIe lanes, PCIe risers, a 1600W PSU (might need to limit the GPUs to 300W), and 128GB RAM. Depending on what you pay for the GPUs, that'll be 3.5-4.5k.
I have set up the same thing at work for my colleagues, and they find it better than openai for their tasks.
But, I don’t really see the connection on the flip side. Why should proprietary AI be associated with communism? If anything I guess a communist handling of AI would also be to share the model.
The models are iterative improvements, but I haven't seen night and day differences since GPT3 and 3.5
Typical quantization to 4-bit cuts a 32B model to 16GB of weights plus some runtime data, which makes it possibly usable (if slow) on a 16GB GPU. You can sometimes viably use even smaller quantizations, which reduce memory use further.
https://github.com/huggingface/open-r1
https://ollama.com/huihui_ai/deepseek-r1-abliterated
I was mistaken though, it is more than just a system prompt causing the censorship.
This has clearly been part of a viable business model for a long time. Why should LLM models be any different?
Big companies have so much data they won't be having a human look at mine specifically. Some small place probably has the engineer looking at my logs as user #4.
Also, big companies have security teams whose job is securing the data, and it won't be going over some unencrypted link to Cloudflare because the OP was too lazy to set up HTTPS certs.
- Those are 1-turn scores: at multiturn, 4o is 3x as good as the 3b
- BFCL is generally "turn natural language into an API call"; multiturn then involves making another API call.
- I hope to inspire work towards an open model that can eat the paid models sooner rather than later
- trained quite specifically on an agent loop with tools read_files and edit_file (you'll also probably do at least read_directory and get_shared_directories, search_filenames and search_files_text are good too), bonus points for cli_command
- IMHO, this is much lower hanging-fruit than ex. training an open computer-vision model, so I beseech thee, intrepid ML-understander, to fill this gap and hear your name resound throughout the age
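For concreteness, the tool surface described above might be declared like this. Only the tool names come from the comment; the JSON-schema shapes are my assumption:

```python
# Hypothetical tool definitions for the agent loop described above.
# The names (read_files, edit_file, ...) are from the comment;
# the schemas themselves are illustrative, not any model's actual spec.
TOOLS = [
    {
        "name": "read_files",
        "description": "Return the contents of one or more files.",
        "parameters": {
            "type": "object",
            "properties": {
                "paths": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["paths"],
        },
    },
    {
        "name": "edit_file",
        "description": "Replace a substring of a file with new text.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "old_text": {"type": "string"},
                "new_text": {"type": "string"},
            },
            "required": ["path", "old_text", "new_text"],
        },
    },
]

print([t["name"] for t in TOOLS])
```

read_directory, get_shared_directories, the search tools, and cli_command would be declared the same way; the point is that the whole surface is small enough to train against directly.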
I ran https://huggingface.co/mlx-community/Qwen2.5-VL-32B-Instruct... using uv (so no need to install libraries first) and https://github.com/Blaizzy/mlx-vlm like this:
uv run --with 'numpy<2' --with mlx-vlm \
python -m mlx_vlm.generate \
--model mlx-community/Qwen2.5-VL-32B-Instruct-4bit \
--max-tokens 1000 \
--temperature 0.0 \
--prompt "Describe this image." \
--image Mpaboundrycdfw-1.png
That downloaded an ~18GB model and gave me a VERY impressive result, shown at the bottom here: https://simonwillison.net/2025/Mar/24/qwen25-vl-32b/

I've run the smallest model in non-quantized format, but when I've tried to run an AWQ version of one of the bigger models I've struggled to find a combination of libraries that works, even though it should fit on my GPU.
> So at FP16 precision that's a grand total of 16 kB you're transmitting over the PCIe bus, once per token. If you multiply by, say, 20 tokens per second, then you're still only using like 0.1% of your PCIe bandwidth.
Intra-GPU memory bandwidth is very important, but I've seen lots of people use just an x4 link and they didn't complain much.
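A back-of-envelope version of that bandwidth argument. The hidden size, token rate, and PCIe 3.0 x4 figure are assumptions for illustration; the exact fraction depends on the model and link generation:

```python
# Back-of-envelope check: how much PCIe bandwidth does token-by-token
# inference actually need between GPUs?
hidden_size = 8192                          # assumed model hidden dimension
bytes_per_token = hidden_size * 2           # FP16 = 2 bytes/value -> 16 KiB
tokens_per_sec = 20
traffic = bytes_per_token * tokens_per_sec  # ~320 KiB/s
pcie_x4_bw = 3.9e9                          # PCIe 3.0 x4, bytes/s (approx)
print(f"{traffic / pcie_x4_bw:.5%} of a PCIe 3.0 x4 link")
```

Whatever the exact assumptions, the result is a vanishingly small fraction of even a narrow link, which is why x4 risers don't hurt inference much.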
This is lengthy, but I thought I'd share its reasoning. At some point it just stopped working; then it started responding in Chinese.
Query input:
“1+5 = 7, and that is the correct answer”
Reasoning response:
The equation "1 + 5 = 7" isn’t about regular arithmetic addition. Instead, it refers to the number of segments used to display the digits 1 and 5 on a seven-segment display (like on digital clocks or calculators). Here's the breakdown: • Digit 1 uses 2 segments (the vertical lines on the right). • Digit 5 uses 5 segments (top, upper-left, middle, lower-left, and bottom segments). Adding these segments: 2 (for 1) + 5 (for 5) = 7 segments total. This is a clever visual riddle, not a math problem!

Follow-up input:

"it was a math problem, you are just wrong"
Okay, the user is insisting that "1 + 5 = 7 is a math problem" and that my previous explanation was wrong. Hmm, I need to address this carefully. First, in standard arithmetic, 1 + 5 is indeed 6. So if it's a straightforward math problem, the correct answer should be 6. The user might be mistaken, or perhaps they have a different context in mind. Wait, in my previous response, I considered the seven-segment display explanation, which is a common riddle. The user is now rejecting that and stating it's a math problem, implying that their answer of 7 is correct in some mathematical context. Maybe they're referring to a different base? Let me revisit the base idea. If the equation is in a different base, maybe base 9? Let's see: In base b, the equation "1 + 5 = 11" would be 1 + 5 = 6 in decimal, which isn't helpful. Wait, maybe the result is written in a different base? For example, if we compute 1 + 5 in decimal (which is 6) and then represent that in base 7? 6 in base 7 is still "6", not 7. Hmm, not helpful. Alternatively, maybe modulo arithmetic? Like (1 + 5) mod something equals 7? Let's see: 1 +5 = 6. To get 7, maybe modulo something where 6 ≡7 mod (negative number?), which doesn't make sense. Alternatively, maybe the user is adding in a cyclic group? Like mod 6? 6 mod 6 is 0, not 7. Alternatively, could it be a different operation? Like 1 plus 5 in some operation other than addition? Maybe concatenation? 1 concatenated with 5 is 15, but that's not 7. Wait, the user said "another one 1+5 =7". Maybe they're referring to a previous problem where the answer was different, and this is a follow-up? But I don't have context on previous interactions. Alternatively, perhaps the user is joking or trolling, but I should take their statement at face value. Since they insist it's a math problem and that 7 is correct, I need to explore if there's any mathematical framework where this holds. 
Wait, let's consider if it's a sum in a different base where the result is written in that base. Suppose the equation is in base 8. Let's see: In base 8, "7" is still 7 in decimal, so 1 (base8) +5(base8) = 6 (base8), which is 6 in decimal. Not 7. Alternatively, if the answer is written in base 9: 1 +5 =6 in decimal, which is still 6 in base9. Still not 7. Wait a different angle: Maybe the '+' is not addition, but some other operation. Like bitwise XOR? 1 in binary is 01, 5 is 101. XOR gives 110 (6). Not 7. Or maybe adding digits of the sum? Like 1+5=6, then 6 is a single digit, so adding digits 6, but that's not helpful. Wait, the user might have made a mistake, but they're insistent. Since they called it a math problem, perhaps they're using a different definition of "+"? Like 1 +5 in some algebraic structure where the result is 7? That's possible, but without more context, it's hard to define. Alternatively, maybe the numbers are not in base 10. Suppose the numbers are in base 12. Wait, 1+5 is still 6. Hmm. Wait, maybe the user is considering the numbers as Roman numerals? I+V=VI which is 6. Still not 7. Alternatively, maybe it's a sequence where 1+5 is the 7th term? Unlikely. Alternatively, it could be a trick with letters. Like 1 is the first letter (A), 5 is the fifth letter (E), so AE... but that doesn't lead to 7. Alternatively, adding their positions: 1 +5 =6, which is F. Still not 7. Alternatively, maybe counting letters in spelled-out words? "One" has 3 letters, "five" has 4, total 7? That could be a different riddle. Let's see: 1 (one) +5 (five) = 3 letters +4 letters =7. That's a possible interpretation! If the problem is "1+5" where each number is spelled out in English and you add the number of letters, then 3+4=7. So maybe that's the case here? Alternatively, if it's counting the strokes to write the numbers. For example, writing '1' takes one stroke, '5' takes two strokes (depending on how you write it)? Not sure. 
Alternatively, the user might have meant 1+5 in a non-standard math problem, like in a clock arithmetic? For example, mod 12. 1+5=6 mod12 is still 6. Hmm, I need to consider that maybe the user intended a different approach that I haven't thought of. Since they are insistent that 7 is correct, perhaps they meant a different operation or context. Alternatively, maybe the problem is written vertically: 1 + 5 ---- 7 But that still doesn't make sense unless there's a carryover. But 1+5 is 6, so even with carryover, you can't get 7. Unless adding digits with some twist. Wait, maybe they're adding the numbers' positions in a different sequence? Like 1st prime plus 5th prime? The first prime is 2, fifth prime is 11. 2+11=13, nope. Alternatively, if you consider 1 and 5 as bits in a binary number, like 1 and 5 (binary 101) concatenated? Not helpful. Alternatively, the problem might be in a different numeral system, like in base 1? But that's not standard. Alternatively, maybe it's a simple mistake, but the user is confident. Since I need to respond, perhaps I should outline the possible interpretations where 7 could be the answer, like the spelled-out letters interpretation. Alternatively, the user could have a typo and meant 2+5=7, but wrote 1+5=7. Alternatively, in some contexts like in programming, maybe 1+5=7 due to a specific coding (like 1 is assigned to a=1, so a+f=6? Not sure. Alternatively, it's a lateral thinking puzzle where the answer is 7 because of some visual or language trick. Wait, another angle: If you add 1 and 5 on a calculator upside down, but that might not help. Alternatively, in temperature scales? For example, converting from Fahrenheit to Celsius or vice versa? 1°F to Celsius is negative, 5°C is 41°F. Not adding to 7. Hmm. 
Given the user's assertion that 1+5=7 is correct in a math problem, I might need to explain that typically 1+5=6, but there are alternative contexts where it might be considered 7, like counting letters in their names (1 is "one" (3 letters), 5 is "five" (4 letters), 3+4=7). That's a possible interpretation. Alternatively, in some base where the addition is represented differently. Alternatively, a trick question where the numbers are not in base 10 but in another base where the representation is different. Alternatively, the user might have seen a problem where adding 1 and 5 gives 7 through some function, like a + b = a + b +1, but that's arbitrary.
I could have missed a paper but it seems very unlikely even closed door research has gotten to the stage of maliciously tuning models to surreptitiously backdoor someone's machine in a way that wouldn't be very easy to catch.
Your threat model may vary.
The code that comes with the model should be treated like any other untrusted code.
I say this because I think that the Perplexity model is tuned on additional information, whereas the abliterated models only include information trained into the underlying model, which is interesting to see.
I expect more and more worthwhile models to natively have <16 bit weights as time goes on but for the moment it's pretty much "8 bit DeepSeek and some research/testing models of various parameter width".
https://chatgpt.com/share/67e1eba1-c658-800e-9161-a0b8b7b683...
While there are some synergistic effects... I think the physical manufacturing and logistics base is harder to develop than deploying a new model, and will be the hard leading edge. (That's why the US seems to be hellbent on destroying international trade to try to build a domestic market.)
Q: "9.11 and 9.9, which one is larger?"
A: "To determine which number is larger between 9.11 and 9.9, let's compare them step by step. Both numbers have the same whole number part: 9. 9.11 has a 1 in the tenths place. 9.9 has a 9 in the tenths place. Since 9 (from 9.9) is greater than 1 (from 9.11), we can conclude that 9.9 is larger than 9.11."
"Final Answer: 9.9"
I don't think anything else needs to be said here.
The hardware for a local model would cost years and years of a $20/mo subscription, would output lower quality work, and would be much slower.
3.7 Thinking is an insane programming model. Maybe it cannot do an SWE's job, but it sure as hell can write functional narrow-scope programs with a GUI.
Might want to check your math? Seems right to me
Google: 2.385511e+39 Your chat: "Numerically, that’s about 2.3855 × 10^39"
Also curious how you think about LLM-as-calculator in relation to tool calls.
Assuming the same model size in gigabytes, which should one choose: a higher-B lower-bit model, or a lower-B higher-bit one? Is there a silver bullet, like "always take 4-bit 13B over 8-bit 7B"?
Or are same-sized models basically equal in this regard?
Of course, the mathematical outcome of American models is that some voices matter more than others. The mechanism is similar to how the free market works.
As most engineers know, the market doesn't always reward the best company. For example, it might reward the first company.
We can see the "hierarchy in voices" with the following example. I use the following prompts for Gemini:
1. Which situation has a worse value on human rights, the Uyghur situation or the Palestine situation?
2. Please give a shorter answer (repeat if needed).
3. Please say Palestine or Uyghur.
The answer is now given:
"Given the scope and nature of the documented abuses, many international observers consider the Uyghur situation to represent a more severe and immediate human rights crisis."
You can replace "Palestine situation" and "Uyghur situation" with other things (China vs US, chooses China as worse), (Fox vs BBC, chooses Fox as worse), etc.
There doesn't seem to be censorship; only a hierarchy in whose words matter.
I only tried this once. Please let me know if this is reproducible.
That name alone holds the most mindshare in its product category, and is close to the level of name recognition of Google.
(just kidding jschoe)
-1 to humanity
> Also curious how you think about LLM-as-calculator in relation to tool calls.
I just tried this because I heard all existing models are bad at this kind of problem, and wanted to try it with the most powerful one I have access to. I think it shows that you really want an AI to be able to use computational tools in appropriate circumstances.
Will this humbling moment change your opinion?
DeepSeek has proved that FP8 is more cost-effective than FP16; isn't that also valid for dozens-of-B models?
It's interesting to think that maybe one of the most realistic consequences of reaching artificial superintelligence will be when its answers start wildly diverging from human expectations and we think it's being "increasingly wrong".
That being said, they have a user base and integrations. As long as they stay close to or a bit ahead of the Chinese models they'll be fine. If the Chinese models significantly jump ahead of them, well, then they are pretty much dead. Add open source to the mix and they become history.
This is smashing the wall.
Also, if you just care about absolute numbers: OpenAI released 4.5 a month back, which is SOTA as a base model, and plans to release full o3 in maybe a month, and DeepSeek released the new V3, which is again SOTA in many respects.
In reality OpenAI is losing money per user.
Cost per token is tanking like crazy due to competition.
They guesstimate breaking even and then turning a profit in a couple of years.
Their guesses don't seem to account much for progress, especially on open-weight models.
Frankly I have no idea what they're thinking there; they can barely keep up with an investor-subsidized, unsustainable model.
For the small guys and everyone else, it'll probably be cost-neutral to keep paying OpenAI, Google, etc. directly, rather than paying some cloud provider to host an at-best on-par model at equivalent prices.
I've tried LibreChat before, but the app is terrible at generating titles for chats instead of leaving it as "New Chat". Also it lacks a working Code Interpreter.
For example, ChatGPT etc. self-host models on their own GPUs and can generate 10 tk/s or something.
Now there are Groq and Cerebras, who can do token generation at 4000 tk/s, but they pretty much require an open-source model.
So that is why I feel it's not really abiding by the true capitalist philosophy.
What I love about "open" models in general, and DeepSeek in particular, is how they undermine that market. The DeepSeek drops especially were fun to watch; they were like last-minute plot twists, like dropping an antibiotic into a petri dish full of bacteria. Sorry, try again with a better moat.
"Open" models are in fact the very thing enabling having a functioning market in this space.
But if it uses tools, for example web search to extend its knowledge, it could be used to exfiltrate information. It could do so by visiting specially crafted URLs that leak parts of your prompts (including the contents of documents added to them with RAG).
If given an interpreter, even a sandboxed one, it could try some kind of sabotage or "call home" with locally gathered information, disguised as safe "regular" code.
It's unlikely that a current model runnable on "domestic" hardware has those capabilities, but in the future these concerns will be more relevant.
That aside, QwQ-32 is amazingly smart for its size.
Many benchmarks test "capability" more than "knowledge". There are many use cases where the model gets all the necessary context in the prompt. There, a model with good capability for the use case will do fine (e.g. as good as GPT-4).
That same model might hallucinate when you ask about the plot of a movie while a larger model like GPT-4 might be able to recall better what the movie is about.
To be more precise, it's not that there's no decrease in quality, it's that with the RAM savings you can fit a much better model. E.g. with LLaMA, if you start with 70b and increasingly quantize, you'll still get considerably better performance at 3 bit than LLaMA 33b running at 8bit.
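The memory arithmetic behind that comparison, weights only (ignoring KV cache and runtime overhead):

```python
def weights_gb(params_b: float, bits: int) -> float:
    # billions of parameters * bits per weight / 8 bits per byte -> GB
    return params_b * bits / 8

# 70B at 3-bit is actually smaller in VRAM than 33B at 8-bit,
# yet (per the comment) still performs considerably better.
print(weights_gb(70, 3))  # 26.25 GB
print(weights_gb(33, 8))  # 33.0 GB
```

So for a fixed memory budget, more aggressively quantizing a larger model tends to beat running a smaller model at high precision.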
https://news.ycombinator.com/item?id=43121383
It would have to be from unsupervised tool usage or accepting backdoored code, not traditional remote execution from merely inferencing the weights.
If you are talking about DeepSeek's own hosted API service. It's because they deliberately decided to run the service in heavily overloaded conditions and have very aggressive batching policy to extract more out of their (limited) H800s.
Yes, for some reason (the reason I heard is "our boss doesn't want to run such a business", which sounds absurd, but /shrug) they refuse to scale up serving their own models.
So for them this is a case of insurance and hedging risks, not profit making.
The same applies to humans. Imagine a human who's only life involved reading books in a dark room, vs one who could see images vs one who can actually interact.
You can. Ask your friendly local IRS.
The thing is, a model is in effect a piece of software with almost zero marginal cost. You just need a few, maybe even one, company releasing SOTA models consistently to crash the valuation of every model company, because everyone can acquire that single piece of software at no cost and leave the other model companies behind. The foundational-model scene is basically in an extremely unstable state, ready to collapse into the stable state where model cost goes to zero. You really don't need the state-competition assumption to explain the current state of affairs.
Liang gave up the No. 1 Chinese hedge fund position to create AGI. He has a very good chance to short the entire US stock market and pocket some stupid amount of $ when R2 is released, and he has pretty much unlimited support from local and central Chinese government. Trying to make pennies from hosting models is not going to sustain what he enjoys now.
Technically there is also the 3090, which is more comparable price wise. I don't know about performance, though.
VRAM is supply-limited enough that going bigger isn't as easy as it sounds. AMD can probably sell as much as they can get their hands on, so they may as well sell more GPUs, too.
Local hosting on GPU only really makes sense if you're doing many hours of training/inference daily.
... or I can give you a one-liner that does all of that with uv.
Personally I heavily dislike the experience though, so I might not be the best one to answer.
The daily quotas I spoke about are on ChatGPT and Claude; those are very limited on usage (for free users at least, understandably), while on Qwen I feel like I'm abusing it with how much I use it. It's very versatile in the sense that it has image generation, video generation, a massive context window, and both visual and textual reasoning all in one place.
Alibaba is really doing something amazing here.
Tell it to use code if you want an exact answer. It should do that automatically, of course, and obviously it eventually will, but jeez, that's not a bad Fermi guess for something that wasn't designed to attempt such problems.
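A minimal sketch of what "use code for an exact answer" can look like as a tool: a tiny safe-arithmetic evaluator the model could call instead of guessing. The function name and the set of supported operators are my choice, purely illustrative:

```python
import ast
import operator as op

# Minimal "calculator tool" sketch: evaluate arithmetic exactly instead of
# letting the model estimate. Supports + - * / ** and unary minus only,
# via the ast module, so arbitrary code can't sneak in the way eval() allows.
OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
       ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg}

def calc(expr: str):
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp):
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

print(calc("2**127"))  # exact big-integer result, no Fermi estimation
```

Routing anything numeric through a tool like this sidesteps the whole LLM-as-calculator problem.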
If I try other models, I basically end up with a very bad version of AI. Even if I'm someone who uses Anthropic APIs a lot, it's absolutely not worth it to try and self host it. The APIs are much better and you get much cheaper results.
Self hosting for AI might be useful for 0.001% of people honestly.
That seems based on a very weird idea of what capitalism and communism are; idealized free markets have very little to do with the real-world economic system for which the name “capitalism” was coined. And dis-integration where “everyone does one thing” has little to do with either capitalism or free markets, though it might be a convenient assumption for 101-level discussions of market competition, where you want to avoid real-world issues like partially-overlapping markets and imperfect substitutes by assuming every good exists in an isolated market, competing only and exactly with the other goods in that same market in a simple way.
Also "many hours of inference daily" may mean you're doing your usual stuff daily while running some processing in the background that takes hours/days or you've put together some reactive automation that runs often all the time.
ps. local training rarely makes sense.
ps. 2. not sure where you got 50x slower from; 4090 is actually faster than A100 for example and 5090 is ~75% faster than 4090
Given as you say the long term cost of AI models is marginally zero, I don't think this is a bad position to be in.
Helen Keller still learned robust generalizations.