Just the first sentence of the article would make me pretty regretful, though:
> I ordered a set of 10 Compute Blades in April 2023 (two years ago), and they just arrived a few weeks ago.
That's rough.
'Worth it any more'? At this size, never. A Pi is a Pi is a Pi!
A few are fine for toying around; beyond that, hah. Price:perf is rough, and it does not improve with multiplication [of units, cost, or complexity].
Unless you can keep your compute at 70% average utilization for 5 years, you will never save money purchasing your hardware compared to renting it.
Faith in the perfect efficiency of the free market only works out over the long term. In the short term we have a lot of habits that serve as heuristics for doing a good job most of the time.
Also no. The guy's a YouTuber.
On the other hand, will this make him 100k+ views? Yes. It's bait - the perfect combo to attract both the AI crowd and the 'homelab' enthusiasts (of which the bulk have yet to find any use for their Raspberry Pi devices)...
Or the oldie-but-goodie paper "Scalability! But at what COST?": https://www.usenix.org/system/files/conference/hotos15/hotos...
Long story short, performance considerations with parallelism go way beyond Amdahl's Law, because supporting scale-out also introduces a bunch of additional work that simply doesn't exist in a single node implementation. (And, for that matter, multithreading also introduces work that doesn't exist for a sequential implementation.) And the real deep down black art secret to computing performance is that the fastest operations are the ones you don't perform.
If your goal is to play with or learn on a cluster of Linux machines, the cost effective way to do it is to buy a desktop consumer CPU, install a hypervisor, and create a lot of VMs. It’s not as satisfying as plugging cables into different Raspberry Pi units and connecting them all together if that’s your thing, but once you’re in the terminal the desktop CPU, RAM, and flexibility of the system will be appreciated.
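For instance, a minimal sketch assuming Multipass as the hypervisor front end (libvirt, Proxmox, or ESXi would do just as well; the node names and sizes below are arbitrary):

    # spin up a four-node "cluster" of VMs on one desktop CPU
    for i in 1 2 3 4; do
      multipass launch --name node$i --cpus 2 --memory 2G --disk 10G
    done
    multipass list          # see all the nodes
    multipass shell node1   # hop into one of them
    # tear everything down when you're done
    multipass delete --all && multipass purge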
Not at all the best, but they were cheap. If I WANTED the best or reliable, I'd actually buy real products.
Also, the Mac Studio is a bit hampered by its low compute power, meaning you really can't use a 100B+ dense model; only an MoE is feasible without getting multi-minute prompt-processing times (assuming 500+ token prompts, etc.).
So for $3000, that's 3000 hours, or 125 days (if you just wastefully leave them on all the time, instead of turning them on when needed).
Say you wanted to play around for a couple of hours; that's like... $3.
(That's assuming there's no bonus for joining / free tier, too.)
Nothing that is not AGPL-licensed, so you and your company haven't taken advantage of it.
I am not sure how this relates to my comment though.
Not that its a problem, I don't see why it would inherently be a negative thing. Dude seems to make some good content across a lot of different mediums. Cheers to Jeff.
All you needed to do was buy 4x used 7900 XTX on eBay and build a four-node Raspberry Pi cluster using the external GPU setup you came up with in one of your previous blog posts [0].
[0] https://www.jeffgeerling.com/blog/2024/use-external-gpu-on-r...
https://www.jeffgeerling.com/projects
And the inference is that he is doing this for clicks, i.e. clickbait. The very title is disingenuous.
Your attack on the poster above you is childish.
2) Hardware optimization (the exact GPU you want may not always be available for some providers)
3) Not subject to price changes
4) Not subject to sudden Terms of Use changes
5) Know exactly who is responsible if something isn't working.
6) Sense of pride and accomplishment + Heating in the winter
https://www.youtube.com/c/JeffGeerling
"978K subscribers 527 videos"
Jeff's had a pattern of embellishing controversies, misrepresenting what people say, and using his platform to create narratives that benefit his content's engagement. This is yet another example of farming outrage to get clicks. I don't understand why people drool over his content so much.
Currently the cloud providers are dumping second-gen Xeon Scalables, and those things are pigs when it comes to power use.
Sound-wise, it's like someone running a hair dryer at full speed all the time, and it can be louder under load.
Maybe I'm missing something.
It’s an overrated, overhyped little computer. Like ok it’s small I guess but why is it the default that everyone wants to build something new on? Because it’s cheap? Whatever happened to buy once, cry once? Why not just build an actual powerful rig? For your NAS? For your firewalls? For security cameras? For your local AI agents?
YouTube is absolutely jam-packed with people pitching home "lab" sorts of AI buildouts that are just catastrophically ill-advised, but it yields content that seems to be a big draw. For instance, Alex Ziskind's content. I worry that people are actually dumping thousands to have poorly performing, ultra-quantized local AIs that will have zero comparative value.
$3,000 is well under many "oopsie billsies" from cloud providers.
And that's outside of the whole "I own it" side of the conversation, where things like latency, control, flexibility, & privacy are all compelling reasons to be willing to spend slightly more.
I still run quite a number of LLM services locally on hardware I bought mid-covid (right around $3k for a dual RTX 3090 + 124GB system RAM machine).
It's not that much more than you'd spend if you're building a gaming machine anyways, and the nifty thing about hardware I own is that it usually doesn't stop working at the 5 year mark. I have desktops from pre-2008 still running in my basement. 5 year amortization might have the cloud win, but the cloud stops winning long before most hardware dies. Just be careful about watts.
Personally - I don't think pi clusters really make much sense. I love them individually for certain things, and with a management plane like k8s, they're useful little devices to have around. But I definitely wouldn't plan to get good performance from 10 of them in a box. Much better off spending roughly the same money for a single large machine unless you're intentionally trying to learn.
What's the margin on unplugging vs just powering off?
I don't need to transcode + I need something I can leave on that draws little power.
I have a powerful rig, but the one time I get to turn it off is when I'd need the media server lol.
There's a lot of scenarios where power usage comes into play.
These clusters don't make much sense to me though.
1) How much worse / more expensive are they than a conventional solution?
2) What kinds of weird esoteric issues pop up, and how do they get solved (e.g. the resizable BAR issue for GPUs attached to the RPi's PCIe slot)
But if you're someone like me who intends to actively use the hardware for real-world purposes, the cloud often simply can't compete on price. At home, I have a mini PC with a 5600G, 32GB of RAM, and a few TB of NVMe storage. The entire thing cost less than $600 a few years ago, and consumes around 20W of power on average.
Even on the cheapest cloud providers available, an equivalent setup would exceed that price in less than half a year. SSD storage in particular is disproportionately expensive on the cloud. For small VMs that don't need much storage, it does make sense, but as soon as you scale up, cloud prices quickly start ballooning.
But when it comes to Vast/RunPod, it can also be annoying and genuinely become more expensive if you end up renting 2x the number of hours because you constantly have to upload and download data and checkpoints, pay continuous storage costs, transfer data to another server because the GPU is no longer available, etc. It's just less of a headache if you have an always-available GPU with a hard drive plugged into the machine, and that's it.
Nobody is really building CPU clusters these days.
> DO NOT TAKE HOME THE FREE 1U SERVER YOU DO NOT WANT THAT ANYWHERE A CLOSET DOOR WILL NOT STOP ITS BANSHEE WAIL TO THE DARK LORD AN UNHOLY CONDUIT TO THE DEPTHS OF INSOMNIA BINDING DARKNESS TO EVEN THE DAY
The economics of spending $3,000 on a video probably work out fine.
TL;DR: just buy one Framework Desktop and it's better than the OP's Pi AI cluster in every single metric, including cost, performance, efficiency, headache, etc.
Somehow I've actually gotten every item I backed shipped at some point (which is unexpected).
Hardware startups are _hard_, and after interacting with a number of them (usually one or two people with a neat idea in an underserved market), it seems like more than half fail before delivering their first retail product. Some at least make it through delivering prototypes/crowdfunded boards, but they're already in complete disarray by the end of the shipping/logistics nightmares.
But it can still be decent for HPC learning, CI testing, or isolated multi-node smaller-app performance.
And regarding efficiency, in CPU-bound tasks, the Pi cluster is slightly more efficient. (Even A76 cores on a 16nm node still do well there, depending on the code being run).
[1] The Framework Desktop is a beast:
https://news.ycombinator.com/item?id=44841262
[2] HP ZBook Ultra:
A lot of people (here, Reddit, elsewhere) speculate about how good/bad a certain platform or idea is. Since I have the means to actually test how good or bad something is, I try to justify the hardware costs for it.
Similar to testing various graphics cards on Pis, I've probably spent a good $10,000 on those projects over the past few years, but now I have a version of every major GPU from the past 3 generations to test on, not only on Pi, but other Arm platforms like Ampere and Snapdragon.
Which is fun, but also educational; I've learned a lot about inference, GPU memory access, cache coherency, the PCIe bus...
So a lot of intangibles, many of which never make it directly into a blog post or video. (Similar story with my time experiments).
Right now the Macs are viable purely because you can get massive amounts of unified memory. Be pretty great when they have the massive matrix FMA performance to complement it.
Was it fast? No. But that wasn't the point. I was learning about distributed computing.
I know for many who run SBCs (RK3588, Pi, etc.), a big part of the appeal is the 1-2W idle, which is almost nothing (and doesn't even need a heatsink if you can stand some throttling from time to time).
Most of the Intel Mini PCs (which are about the same price, with a little more performance) idle at 4-6W, or more.
The desktop equivalent of your 10 T3 Micro instances is about $600 if you buy new. For example, a Lenovo ThinkCentre M75q Gen 2 Tiny 11JN009QGE has an 8x3.2GHz processor with hyperthreading. That's 16 virtual cores compared to the 20 vCPUs of the T3 instances, but with much faster cores. And 16GB of RAM allows you to match the 1GB per instance.
If you don't have anything and feel generous throw in another $200 for a good monitor and keyboard plus mouse. But you can get a used crap monitor for $20. I'd give you one for free just to be rid of it.
That's a total of $800, or 33 days of forgetting to shut down the 10 VMs. Maybe half that if you buy used.
Granted, not everyone has $800 or even $400 to drop on hobby projects, so renting VMs often does make sense.
Another fun fact, the network module of the pi is actually connected to the USB bus, so there's some overhead as well as a throughput limitation.
Fun fact, the Pi does not have a power button, relying on software to shut down cleanly. If you lose access to the machine, it's not possible to avoid corrupted states on the disk.
Despite all of this, if you want to self-host some website, the Raspberry Pi is still an amazingly cost-effective choice; for anywhere between 2 and 20,000 monthly users, one Pi will be over-provisioned. You can even get an absolutely overkill redundant Pi as a failover, but a single Pi can reach 365 days of uptime with no problem, and as long as you don't reboot or lose power or internet, you can achieve more than a couple of nines of reliability.
But if you are thinking of a third, much less a tenth Raspberry Pi, you are probably scaling the wrong way: well before you reach the point where quantity matters (a third machine), it becomes cost-effective to upgrade the quality of your one or two machines.
On the embedded side it's the same story: these are great for prototyping, but you are not going to order 10k and sell them in production. Maybe a small 100-unit test batch? But you will optimize and make your own PCB before a mass batch.
It also means it performs like a 10 year old server CPU, so those 28 threads are not exactly worth a lot. The geekbench results, for whatever value those are worth, are very mediocre in the context of anything remotely modern: https://browser.geekbench.com/processors/intel-xeon-e5-2690-...
Like a modern 12-thread 9600x runs absolute circles around it https://browser.geekbench.com/processors/amd-ryzen-5-9600x
I then used many of his Ansible playbooks in my day-to-day job, which paid my bills and made my career progress.
I don't check YouTube, so I didn't know that he was a "YouTuber"; I do know his other side and how much I have leveraged his content/code in my career.
$50/month for 100W continuous usage isn't totally mad, and that could climb even higher over the rest of the decade.
Rates have gone up enormously because the cost of wildfires is falling on ratepayers, not the utility owners.
Regulated monopolies are pretty great, aren’t they? Heads I win, tails you lose.
It was expensive, but it is not slow for small queries.
Now, if I want to bump the context window to something huge, it does take 10-20 seconds to respond for agent tasks, but it’s only 2-3x slower than paid cloud models, in my experience.
Still a little annoying, and the models aren’t as good, but the gap isn’t nearly as big as you imply, at least for me.
No, you are dismissive because you don't care about the use-cases.
The RPi 4, 400, and 500 are great models. Consider all the advantages together:
i= support for current Debian
ii= stellar community
iii= ease of use (UX), especially for people new to Debian and/or coding and/or Linux
iv= quiet, efficient, low power and passively cooled
v= robust enough to be left running for a long time
There are cheaper, more performant x86 and ARM dev boards and SOCs. But nothing compares to the full set of advantages.
That said, building a $3K A.I. cluster is just a senseless, expensive lark. (^;
> Another fun fact, the network module of the pi is actually connected to the USB bus, so there's some overhead as well as a throughput limitation.
> Fun fact, the Pi does not have a power button, relying on software to shut down cleanly. If you lose access to the machine, it's not possible to avoid corrupted states on the disk.
With all these caveats in mind, a Raspberry Pi seems to be an incredibly poor choice for distributed computing.
Cf. the various Beagle boards, which have mainline Linux and u-boot support right from release, together with real open hardware right down to board layouts you can customise. And when you come to manufacture something more than just a dev board, you can actually get the SoC from your normal distributor and drop it on your board - unlike the strange Broadcom SoCs RPi uses.
I'm quite a lot more positive about rp2040 and rp2350, where they've at least partially broken free of that Broadcom ball-and-chain.
No, I wouldn’t think.
It's really not though. I've been a Pi user and fan since it was first announced, and I have dozens of them, so I'm not hating on RPi here; we did the maths some time back here on HN when something else Pi related came up.
If you go for a Pi 5 with, say, 8GB RAM, by the time you factor in an SSD + HAT + PSU + case + cooler (+ maybe a uSD), you're already in mini-PC price territory and can get something much more capable and feature-complete for about the same price. For a few £ more you get something significantly more capable: better CPU, iGPU, an RTC, proper networking, faster storage, more RAM, better cooling, etc., and you won't be using much more electricity either.
I went this route myself and have figuratively and literally shelved a bunch of Pis by replacing them with a MiniPC.
My conclusion, for my own use, after a decade of RPi use, is that a cheap mini PC is the better option these days for hosting/services/server duty and Pis are better for making/tinkering/GPIO related stuff, even size isn't a winner for the Pi any more with the size of some of the mini-PCs on the market.
That said, I'm of the opinion that power/water/internet should all be state/county/city run. I don't want my utility companies to have profit motives.
My water company just got bought up by a huge water company conglomerate and, you guessed it, immediate rate increases.
Commodity desktop cpus with 32 or 64GB RAM can do all of this in a low-power and quiet way without a lot more expense.
That combo gives you the better part of a gigabyte of L3 cache and an aggregate memory bandwidth of 600 GB/s, while still below 1000W total running at full speed. Plus your NICs are the fancy kind that let you play around with RoCEv2 and such nifty stuff.
It would also be relevant to then learn how to do things properly with SLURM and Warewulf etc., instead of a poor man's solution with Ansible playbooks like in these blog posts.
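For example, a rough sketch of what a minimal SLURM job script looks like (this assumes a cluster already provisioned with SLURM/Warewulf and a hypothetical MPI binary ./hello):

    #!/bin/bash
    #SBATCH --job-name=hello
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=4
    #SBATCH --time=00:05:00
    srun ./hello

Submit it with sbatch and watch the queue with squeue; that workflow carries over directly to real HPC sites, unlike ad-hoc playbooks.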
> This post is more than 10 years old, I do not delete posts...
https://www.jeffgeerling.com/articles/religion/abortion-case...
I guess GP is referring to that post.
The homelab group on Reddit is full of people who don't understand any of this - they have full racks in their house that could be replaced with one high-end desktop.
If the goal is a lot of RAM and you don’t care about noise, power, or heat then these can be an okay deal.
Don’t underestimate how far CPUs have come, though. That machine will be slower than AMD’s slowest entry-level CPU. Even an AMD 5800X will double its single core performance and even walk away from it on multithreaded tasks despite only having 8 cores. It will use less electricity and be quiet, too. More expensive, but if this is something you plan to leave running 24/7 the electricity costs over a few years might make the power hungry server more expensive over time.
> After fixing the thermals, the cluster did not throttle, and used around 130W. At full power, I got 325 Gflops
I was sort of surprised to find that the top500 list on their website only goes back to 1993. I was hoping to find some ancient 70’s version of the list where his ridiculous Pi cluster could sneak on. Oh well, might as well take a look… I’ll pull from the sub-lists of
https://www.top500.org/lists/top500/
They give the top 10 immediately.
First list (June 1993):
placement name RPEAK (GFlop/s)
1 CM-5/1024 131.00
10 Y-MP C916/16256 15.24
Last list he wins, I think (June 1996):
1 SR2201/1024 307.20
10 SX-4/32 64.00
First list he’s bumped out of the top 10 (November 1997):
1 ASCI Red 1,830.40
10 T3E 326.40
I think he gets bumped off the full top500 list around 2002-2003. Unfortunately I made the mistake of going by Rpeak here, but they sort by Rmax, and I don’t want to go through the whole list. Apologies for any transcription errors.
Actually, pretty good showing for such a silly cluster. I think I’ve been primed by stuff like “your watch has more compute power than the Apollo guidance computer” or whatever to expect this sort of thing to go way, way back, instead of just to the 90’s.
If your local regulators approved the merger and higher rates, your complaint is with them as much as the utility company.
Not saying that some regulators are not basically rubber stamps or even corrupt.
Like what about the people who maintain the alpha/sparc/parisc linux kernels? Or the designers behind idk tilera or tenstorrent hardware.
If it's for personal use, do whatever... there's nothing wrong with buying a $60,000 sports car if you get a lot of enjoyment out of driving it. (you could also lease if you want to trade up to the "faster model" next year) For business, renting (and managed hosting) makes more sense.
For those like me that don't know the joke:
Two economists are walking down the street. One of them says “Look, there’s a twenty-dollar bill on the sidewalk!” The other economist says “No there’s not. If there was, someone would have picked it up already.”
Fuckin nutty how much juice those things tear through.
A lot of that group is making use of the IO capabilities of these systems to run lots of PCI-E devices & hard drives. There's not exactly a cost-effective modern equivalent for that. If there were cost-effective ways to do something like take a PCI-E 5.0 x2 and turn it into a PCI-E 3.0 x8 that'd be incredible, but there isn't really. So raw PCI-E lane count is significant if you want cheap networking gear or HBAs or whatever, and raw PCI-E lane count is $$$$ if you're buying new.
Also these old systems mean cheap RAM in large, large capacities. Like 128GB RAM to make ZFS or VMs purr is much cheaper to do on these used systems than anything modern.
I do get to see and play with a lot of interesting systems, but for most of them, I only get to go just under surface-level. It's a lot different seeing someone who's reverse engineered every aspect of an IBM PC110, or someone who's restored an entire old mainframe that was in storage for years... or the group of people who built an entire functional telephone exchange with equipment spread over 50 years (including a cell network, a billing system, etc.).
Exactly. This build sounds like the proverbial "1024 chickens" in Seymour Cray's famous analogy. If nothing else, the communications overhead will eat you alive.
I did (as did others), in fact, write in comments and complaints about the rate increases and buyout. That went unheard.
Like, if you have a large media library, you need to push maybe 10MB/s; you don't need 128GB of RAM to do that...
It's mostly just hardware porn - perhaps there are a few legit use cases for the old hardware, but they are exceedingly rare in my estimate.
Yeah, this is a now widely known issue with LLM processing. It can be remediated so that all nodes split the computation, but then you come back to the classic supercomputing problem of node interconnect latency/bandwidth bottlenecks.
It looks to me like many such interconnects emulate Ethernet cards. I wonder if that can be recreated using the M.2 slot rather than using that slot for node-local data, and cost-effectively so (like, cheaper than a bunch of 10GbE cards and short DACs).
Also the other peripherals you consider are irrelevant, since you would need them (or not), in other setups. You can use a pi without a PSU for example. And if you use an SSD, you have to consider that cost in whatever you compare it to.
>I went this route myself and have figuratively and literally shelved a bunch of Pis
>and I have dozens of them,
Reread my post? I meant specifically that Pis are great for the 1 to 2 range. With 3 Pis you should change to something else. So I'm saying they are good at the 100$-200$ budget, but bad anywhere above that.
I’m finally at the point where I can dedicate time to building an AI with a specific use case in mind. I play competitive paintball and would like to utilize AI for a handful of things, specifically hit detection in video streams. Pis were my natural choice simply because of the low cost of entry and wide range of supported products to get a PoV running. I even thought about reaching out to Jeff and asking his input.
This post didn’t change my direction too much, but it did help level set some realistic expectations. So thanks for sharing.
But certainly don’t imitate his choices, his economics aren’t your economics!
For just streaming a 4K Blu-ray you need more than 10MB/s; Ultra HD Blu-ray tops out at 144 Mbit/s. Not to mention if that system is being hit by something else at the same time (backup jobs, etc...).
Is the 128GB of RAM just hardware porn? Eh, maybe, probably. But if you want 8+ bays for a decent sized NAS then you're already quickly into price points at which point these used servers are significantly cheaper, and 128GB of RAM adds very little to the cost so why not.
I don't know anyone who would think this actually.
the common denominator is always capital gain
capitalism is the reason why we haven't been able to go back to the moon and build bases there
From the official website:
> Does Raspberry Pi 5 need active cooling?
> Raspberry Pi 5 is faster and more powerful than prior-generation Raspberry Pis, and like most general-purpose computers, it will perform best with active cooling.
I greatly respect Jeff's work, but he's a professional YouTuber, so his projects will necessarily lean towards clickbait and riding trends (Jeff, I don't mean this as criticism!) He's been a great advocate for doing interesting things with RasPis, but "interesting" != "rational"
It's definitely not suited for production, but there, you won't find old blade servers either (for the power to performance issue).
Plus cloud gaming is always limited in range of games, there are restrictions on how you can use the PC (like no modding and no swapping savegames in or out).
Competition is what creates efficiency. Without it you live in a lie.
You'd be surprised by the number of emails, Instagram DMs, YouTube comments, etc. I get—even after explicitly showing how bad a system is at a certain task—asking if a Pi would be good for X, or if they could run ChatGPT on their laptop...
blanket-blaming capitalism without good reasoning is becoming the new red-flag of "can't think critically"
Did OP really think his fellow humans are so moronic that they just didn't find out you can plug a couple of Raspberry Pis together?
Don’t hate the player, hate the game.
Frontier is right behind it with the same arrangement.
Having honest to god dedicated GPUs on their own data bus with their own memory isn't necessarily the fastest way to roll.
If anything, 2nd-hand AMD gaming rigs make more sense than old servers. I say that as someone with an always-off R720xd at home due to noise and heat. It was fun when I bought it during winter years ago, until summer came.
And suddenly you can start playing with distributed software, even though it's running on a single machine. For resiliency tests you can unplug one machine at a time with a single click. It will annihilate a Pi cluster in Perf/W as well, and you don't have to assemble a complex web of components to make it work. Just a single CPU, motherboard, m.2 SSD, and two sticks of RAM.
Naturally, using a high core count machine without virtualization will get you the best overall perf/W in most benchmarks. What's also important, but often not highlighted in benchmarks, is idle wattage, if you'd like to keep your cluster running and only use it occasionally.
On my 96 GB DDR5-6000 + RTX 5090 box, I see ~20s prefill latency for a 65k prompt and ~40 tok/s decode, even with most experts on the CPU.
A Mac Studio will decode faster than that, but prefill will be 10s of times slower due to much lower raw compute vs a high-end GPU. For long prompts that can make it effectively unusable. That’s what the parent was getting at. You will hit this long before 65k context.
If you have time, could you share numbers for something like:
llama-bench -m <path-to-gpt-oss-120b.gguf> -ngl 999 -fa 1 --mmap 0 -p 65536 -b 4096 -ub 4096
Edit: The only Mac Studio pp65536 datapoint I’ve found is this Reddit thread:
https://old.reddit.com/r/LocalLLaMA/comments/1jq13ik/mac_stu ...
They report ~43.2 minutes prefill latency for a 65k prompt on a 2-bit DeepSeek quant. Gpt-oss-120b should be faster than that, but still very slow.
https://core.coop/my-cooperative/rates-and-regulations/rate-...
Getting some NUC-like machines makes a lot more sense to me. You’ll get 2.5Gb/s Ethernet at the least and way more FLOPS as well.
Like, if you buy that card it can still be processing things for you a decade from now.
Or you can get 3 months of rental time.
---
And yes, there is definitely a point where renting makes more sense because the capital outlay becomes prohibitive, and you're not reasonably capable of consuming the full output of the hardware.
But the cloud is a huge cash cow for a reason... You're paying exorbitant prices to rent compared to the cost of ownership.
private space companies, despite decades of hype and funding, have stagnated by comparison
the fact that SpaceX depends heavily on government contracts just to function is yet another proof: their "innovation" isn't self sustaining, it's underwritten by taxpayer money
are you denying that NASA landed on the Moon?
Elon psyop doesn't work on me, i know who is behind it all, they need a charismatic sales man for the masses, just like Ford, Disney, Reagan and all, masking structural power with a digestible story for the masses
> blanket-blaming capitalism without good reasoning is becoming the new red-flag of "can't think critically"
it's quite the opposite, people unable to take criticism of capitalism, talk about "critical thinking", how is China doing?
Handy: https://700c.dk/?powercalc
My Pi CM4 NAS with a PCIe switch, SATA and USB3 controllers, 6 SATA SSDs, 2 VMs, 2 LXC containers, and a Nextcloud snap pretty much sits at 17 watts most of the time, hitting 20 when a lot is being asked of it, and 26-27W at absolute max with all I/O and CPU cores pegged. €3.85/mo if I pay ESB, but I like to think that it runs fully off the solar and batteries :)
For comparison there are 9,988,224 GPU compute units in El Capitan and only 1,051,392 CPU cores. Roughly one CPU core to push data to 10 GPU CUs.
A lot of businesses are paying obscene money to cloud providers when they could have a pair of racks and the staff to support it.
Unless you're paying attention to the bleeding edge of the server market and its costs (better yet, features and affordability), this sort of mistake is easy to make.
The article is by someone who does this sort of thing for fun, and views/attention, and I'm glad for it... it's fun to watch. But it's sad when this same sort of misunderstanding happens in professional settings, and it happens a lot.
This. Some cloud providers offer VMs with 4GB RAM and 2 virtual cores for less than $4/month. If your goal is to learn how to work with clusters, nothing beats firing up a dozen VMs when it suits your fancy, and shut them down when playtime is over. This is something you can pull off in a couple of minutes with something like an Ansible script.
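A rough sketch of that workflow, using Hetzner's hcloud CLI purely as an example (doctl, aws, or gcloud follow the same pattern; the names and server type here are placeholders):

    # create a dozen small VMs for an evening of cluster practice
    for i in $(seq 1 12); do
      hcloud server create --name lab-$i --type cx22 --image ubuntu-24.04
    done
    # ...play...
    # delete them when playtime is over
    for i in $(seq 1 12); do
      hcloud server delete lab-$i
    done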
And what case are you putting them into? What if you want it rack mounted? What about >1gig networking? What if I want a GPU in there to do whisper for home assistant?
Used gaming rigs are great. But used servers also still have loads of value, too. Compute just isn't one of them.
Zero of any of that is needed. The new Pi "works best" with a cooler, sure, but at standard room temps it will be fine for serving web apps and custom projects and things. You do not need an SSD. You do not need a HAT for anything.
Apparently the Pi 5 8GB is $120 though, WTF.
What personal web site or web app or project can't run just fine on a Pi Zero 2 though? It's a little RAM starved but performance wise it should be sufficient.
Other than second-hand mini PCs, old laptops also make great home servers. They have built in UPS!
YouTube demonstrably wants clickbait titles and thumbnails. They built tooling to automatically A/B test titles and thumbnails for you.
YouTube could fix this and stop it if they wanted, but that might lose them 1% of business, so they never will.
They love that you blame creators for this market dynamic instead of the people who literally create the market dynamic.
Pretty sure most of us aren't running anywhere close to full load 24/7, but whoa, Irish power is expensive. In the central US I pay $0.14/KWh.
The only 100% required thing on there is some sort of power supply, and an SD card, and I suspect a lot of people have a spare USB-C cable and brick lying around. A cooler is only recommended if you're going to be putting it under sustained CPU load, and they're like $10 on Amazon.
The current RPi 5 makes no sense to me in any configuration, given its pricing.
Then look at Apple’s ARM offerings, and AWS Graviton if you need ARM with raw power.
If you need embedded/GPIO you should consider an Arduino, or a clone. If you need GPIOs and Internet connectivity, look at an ESP32. GPIOs, ARM, and wired Ethernet? Consider the STM32H.
Robotics/machine vision applications needing IO and lots of compute power? Consider a regular PC with an embedded processor on serial or USB. Or an Nvidia Jetson if you want to run CUDA stuff.
And take a good hard look at your assumptions, as mini PCs using the Intel N100 CPU are very competitive with modern Pis.
Quickly learned that there is so much more to manage when you split a task up across systems, even when the system (like Cinema 4D) is designed for it.
Which got me thinking about how do these frontier AI models work when you (as a user) run a query. Does your query just go to one big box with lots of GPUs attached and it runs in a similar way, but much faster? Do these AI companies write about how their infra works?
Starting with the Pi 4, they started saying that a cooler isn't required, but that it may thermal throttle without one if you keep the CPU pegged.
I run a K8s "cluster" on a single xcp-ng instance, but you don't even really have to go that far. Docker Machine could easily spin up docker hosts with a single command, but I see that project is dead now. Docker Swarm I think still lets you scale up/down services, no hypervisor required.
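A sketch of the Swarm route, for the curious (single machine, no hypervisor; the service name and image are arbitrary):

    docker swarm init                                   # turn this host into a one-node swarm
    docker service create --name web --replicas 3 -p 8080:80 nginx
    docker service scale web=10                         # "scale out" without touching hardware
    docker service ls
    docker swarm leave --force                          # tear it down when finished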
I think the biggest problem with cluster products is that they just don't work out of the box. Vendors haven't really done the "last 2%" of development required to make them viable - its left to us purchasers to get the final bits in place.
Still, it'll make a fun distributed computing experimental platform some day.
Just like the Inmos Transputer I've got somewhere, sitting in a box, waiting for a power supply ..
I believe the Rasp Pi cluster is one of the cheapest multi-node / MPI machines you can buy. That's useful even if it isn't fast. You need to practice the programming interfaces, not necessarily make a fast computer.
However, NUMA is also a big deal. The various AMD Threadrippers with multi-die memory controllers are better in this regard. Maybe the aging Threadripper 1950X: yes, it's much slower than modern chips, but the NUMA issues are exaggerated (especially poor) on that old architecture.
That exaggerates the effects of good NUMA handling, so you as a programmer can build up more NUMA skills.
Of course, the best plan is to spend $20,000,000++ on your own custom NUMA nodes cluster out of EPYCs or something.
-------
But no. The best supercomputers are the real ones that you should rent some time on. You need a local box to see various issues and learn to practice programming.
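As a sketch, the same MPI workflow carries over whether it's a Pi cluster, a pile of VMs, or rented time on a real machine (this assumes Open MPI and a hypothetical hello.c):

    mpicc -O2 -o hello hello.c
    # local practice box: oversubscribe a few ranks on one machine
    mpirun --oversubscribe -np 8 ./hello
    # small physical cluster: list the nodes in a hostfile and fan out
    mpirun -np 16 --hostfile nodes.txt ./hello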
Particularly with Pi 5, any old brick that might be hanging around has a fair chance at not being able to supply sufficient power.
The best option for DP throughput for hobbyists interested in HPC might be old AMD cards from before they, too, realized that scientific folks would pay up the nose for higher precision.
Not so good, and this is the sort of title you need to bring the punters in on YouTube.
I don't mean to sound too cynical, I appreciate Jeff's videos, just wanted to point out that if you've spent money and time on content you can either ditch it or make a regret video.
Just so long as the thumbnails don't have an arrow on them I'm happy.
If one just wants a cheap desktop box to do desktop things with, then they're a terrible option, price-wise, compared to things like used corpo mini-PCs.
But they're reasonably cost-competitive with other new (not used!) small computers that are tinkerer-friendly, and unlike many similar constructs there's a plethora of community-driven support for doing useful things with the unusual interfaces they expose.
The only problem in practice is that server CPUs don't support S3 suspend, so putting whole thing to sleep after finishing with it doesn't work.
https://www.servethehome.com/lenovo-system-x3650-m5-workhors...
And then, there's the sourcing problem. Components that looked like they were in big supply when the hardware was specced, can end up being in short supply, or worse end of lifed while you're trying to get all the firmware working.
They are essentially for kids to play around with, learning computing by blinking LEDs and integrating with circuit boards. The idea of building a high-performance cluster with Pis is dumb from day one.
It's most fun when you can prove the vendor's datasheet is lying about some pin or some function, but they still don't update it after a decade or more. So everyone integrating the chip who hasn't before hits the exact same speed bump!
The ones that are dead straight with no clickbait are 10/10 (the worst performers), and usually by a massive margin. Even with the same thumbnail.
The sad fact is, if you want your work seen on YouTube, you can't just say "I built a 10 node Raspberry Pi blade cluster and ran HPL and LLMs on it".
Some people are fine with a limited audience. And that's fine too! I don't have to write on my blog at all—I earn negative income from that, since I pay for hosting and a domain, but I hope some people enjoy the content in text form like I do.
And yes, they basically have 1 Tbps+ interconnects and throw tens or hundreds of GPUs at queries. Nvidia was wise to invest so much in their networking side—they have massive bandwidth between machines and shared memory, so they can run massive models with tons of cards, with minimal latency.
It's still not as good as tons of GPU attached to tons of memory on _one_ machine, but it's better than 10, 25, or 40 Gbps networking that most small homelabs would run.
Maybe one of the Fractal Design cases with a bunch of drive bays?
> What if you want it rack mounted?
Companies like Rosewill sell ATX cases that can scratch that itch.
> What about >1gig networking?
What about a PCI Express card? Regular ATX computers are expandable.
> What if I want a GPU in there to do whisper for home assistant?
I mean... We started with a gaming rig, right? Isn't a GPU already implicit?
But single board computers with something external to do your GPIO is often way more compelling.
It was also how I learned to setup a Hadoop cluster, and a Cassandra cluster (this was 10 years ago when these technologies were hot)
Having knowledge of these systems and being able to talk about how I set them up and simulated recovery directly got me jobs that 2x'd and then 3x'd my salary. I would highly recommend all medium-skilled developers set up systems like this and get practicing if you want to get up to the next level.
Where a "kid" may be a 53 years old with 30+ years softdev experience who ultimately got to get to the stuff he wanted to for quite some time, and the "blinking LEDs" are a bunch of servos programmatically controlled based on input from a bunch of sensors. While there are definitely better alternatives based on various narrow metrics, especially when it may come to actual productization, the ease (and cheapness, so you don't think much about that spending) of starting with all those easily available for RPi servo array drive boards and various IO ports array boards and all the available software - it is hard to imagine how it can be more easy/cheaper/available than it already is with all that actual compute power and full-featured Linux environment.
I would throw in the RP2040 for consideration as well, and nRF chips if you need wireless connectivity.
You're describing people using RPis to learn distributed systems, and you conclude that these RPis are wasted because RPis were made for pedagogy?
> I run a K8s "cluster" on a single xcp-ng instance, but you don't even really have to go that far.
That's perfectly fine. You do what works for you, just like everyone else. How would you handle someone else accusing your computer resources of being wasted?
My realization in ordering the Rock-2Fs is I really only need an MMU (that is, an SBC instead of something like an ESP32) when I'm running something with a graphical desktop, which is, outside my workstation, never (except for kiosks, which I use Android tablets for). Or when I want to plug something into a bloated SBC board, which saves me from having to solder a connector on, which is sometimes.
I use one for running a timelapse camera (the camera is USB) while another is a portable mp3 player I can put in a shirt pocket and which has an aux port (though its aux line is noisy). So that's two of the four Rock-2F boards in use... but it took me far less time to think up uses for and deploy 25/25 of Seeed Studio's ESP32-C3 boards I ordered a couple of years ago, and I have used only ~5/25 of the ESP32-C6s I ordered early this year. They're so cheap, and use so much less energy than ARM boards, that it's difficult to justify using the SBCs anymore.
I think they're asking $50 for a base 2GB Pi 4B now -- that's 10 ESP32-C3 boards (with integrated WiFi and BMS, btw!) -- and the Pi 5 is even less competitive, except in what I'd characterize as a very unusual scenario where you need high compute at the edge (where it's both needed AND the latency of computing at the edge is lower than sending it to a central server for processing), OR you need the security of protected memory, OR you have no central server and an ESP32 isn't going to cut it (I'll say, though, that one can run a thermostat with multiple WiFi-connected thermometers, and run a web server interface, just fine).
What if you had a single server with a zillion cores in it? Maybe you could take some 15 year old MPI code and run it locally -- it'd be like a mini supercomputer with an impossibly fast network.
Intel's first quad core was Kentsfield in 2006. It supports VT-x. AMD's first quad core likewise supports AMD-V. The newer virtualization extensions mostly just improve performance a little or do things you probably won't use anyway like SR-IOV.
One day my primary Raspberry Pi broke (turned out to be a PSU issue), and I thought of having an old laptop running 24/7 as a home server. While not very power hungry, it still wants much more energy (plus it has fans). For casual usage (I forgot to mention Pi-hole) it feels like overkill. So, while a Raspberry Pi isn’t the best, it has its niche, and I’m happy to have one (actually, a few).
The EU (and maybe China?) have been regulating standby power consumption, so most of my appliances either have a physical off switch (usually as the only switch) or should have very low standby power draw.
I don't have the equipment to measure this myself.
The thing that matters more than the CPU for idle power consumption is how efficient the system's power supply is under light loads. The variance between them is large and newer power supplies aren't all inherently better at it.
My cursory research indicates that a low-end Ryzen would make sense if you are building the board yourself. Right now, I haven’t found a new Ryzen mini PC under $200. New N100 minis can be had for $150-175, and if you don’t care so much about power, N95 minis are even cheaper.
RockChip, maybe? Little bit pricier but more powerful than Rpi?
If you want to learn physical networking or really need to "see" things happening on physically separate machines just get a free old PC from gumtree or something.
> llama-bench -m ./gpt-oss-120b-MXFP4-00001-of-00002.gguf -ngl 999 -fa 1 --mmap 0 -p 65536 -b 4096 -ub 4096
| model | size | params | backend | threads | n_batch | n_ubatch | fa | mmap | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -: | ---: | --------------: | -------------------: |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | Metal,BLAS | 16 | 4096 | 4096 | 1 | 0 | pp65536 | 392.37 ± 43.91 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | Metal,BLAS | 16 | 4096 | 4096 | 1 | 0 | tg128 | 65.47 ± 0.08 |
build: a0e13dcb (6470)
Minimum Delivery Charge (what’s paid monthly, which is largely irrelevant, before annual true-up of NEM charges): $11.69/month
Actual charges, billed annually, per kWh:
Peak NEM charge: $.62277
Off-Peak NEM charges: $.31026
Plus 3-20% extra (depending on the month) in “non-bypassable charges” (I haven’t figured out where these numbers come from), then a 7.5% local utility tax. Those rates do get a little lower in the winter (.30 to .48), and of course the very high rates benefit me when I generate more energy than I consume (which only happens when I’m on vacation). But the marginal all-in costs are just very high.
That’s NEM2 + TOU-EV2A, specifically.
Not sure if it's not properly doing lower power states, or if it's the 10 HDDs spinning. Or even the GPU. But also don't really have anything important running on it that I can't just turn it off.
It's recommended for Pi 5, and if you're running a Pi 4, you should at least use a little heat sink, the 4 and 5 run pretty warm, and under any load they can throttle quite easily. I run mine in a rack, in the UK where it's not very warm compared to other parts of the world, and they get pretty warm even with cooling.
> Also the other peripherals you consider are irrelevant, since you would need them (or not), in other setups
No, they're not irrelevant, because if you buy a Mini-PC you get SSD, RAM, cooling, case, PSU included in the price.
> You can use a pi without a PSU for example
You can wing it with some odd USB charger you have lying around, but my experience over a decade of killing tens of high-quality microSDs in Pis, power throttling, and brownouts is that you should stick to the Pi-spec (5.1V) PSUs. The current rating can typically be lower than rated if you're not connecting peripherals, but a proper USB-spec plug will be 5V, not the 5.1V the Pi wants.
> Reread my post? I meant specifically that Pis are great for the 1 to 2 range
I think you need to re-read mine, I'm not suggesting replacing all of the Pis with a mini-PC, I'm suggesting replacing ONE is cost-effective NOW, when compared to Pi 5.
> So I'm saying they are good at the 100$-200$ budget
Disagree (at least as things stand here in the UK with our current pricing).
Mini-PC with N100, 16GB RAM, 512GB SSD, case, cooling, PSU, better IO, much better performance, etc: £128[0]
Pi 5, bare board, nothing else: £114[1]
These aren't some obtuse websites, they're places I shop all the time, PiHut is an official distributor in the UK, and the Amazon result is the second result for "mini pc".
The thing about the performance gap here is that you _can_ replace 2-3+ Raspberry Pis with a single Mini-PC for the same price as a single Raspberry Pi 5. I've occasionally seen mini PC models on Amazon go on sale for £99 and less.
I'm not talking theoretical or napkin maths, I've literally done it, I replaced a bunch of Pis with a mini PC and now the Pis sit idle because there's still LOTS of headroom on the mini PC to add more, before I need to even consider firing up the Pis again for other stuff.
The Pi, _to me_, in 2025, is a great tool for learning, and building upon, using the GPIO and the excellent resources, but for self-hosting services, it no longer adds up.
By services I mean software tools, services, things actively "doing work", not a personal blog or project that could run on a vape[2].
[0] https://www.amazon.co.uk/BOSGAME-Computers-Windows-Desktop-G... [1] https://thepihut.com/products/raspberry-pi-5?src=raspberrypi... [2] https://news.ycombinator.com/item?id=45252817
You know in k8s you've got worker nodes and control plane nodes? The control planes don't need much horsepower, but they're what you need to be online all to communicate with the cluster. Pis work just fine for that.
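A rough sketch with k3s, which is a common choice on Pis (kubeadm works too; the address and token below are placeholders):

    # on the Pi acting as a control-plane node
    curl -sfL https://get.k3s.io | sh -s - server
    sudo cat /var/lib/rancher/k3s/server/node-token    # token the workers will need
    # on each beefier worker node
    curl -sfL https://get.k3s.io | K3S_URL=https://<pi-address>:6443 K3S_TOKEN=<token> sh -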
For dedicated build boxes that crunch through lots of sources (whole distributions, AOSP) but do run seldomly, getting your hands on lots of Cores and RAM very cheaply can still trump buying newer CPUs with better perf/watt but higher cost.
Have you looked at what they cost? Those cases alone cost as much as a used server. Which comes with a case.
> What about PCI Express card? Regular ATX computers are expandable.
As mentioned higher up, they run out of lane count in a hurry. Especially when you're using things like used Connect-X cards
You also don't need RPis to learn anything about programming, networking, electronics, etc.
But people do it anyways.
I really don't see what point anyone thinks they are making regarding pedagogy. RPis are synonymous with tinkering, regardless of how you cut it. Distributed systems too.
Again, lots of variables there and it really depends on how heavily you intend to use/rely on that sandbox as to what's the better play. Regional pricing also comes into it.
I remember back in the R710 days (circa 2008 and Nehalem/Westmere CPUs) that under like 30% CPU load, most of your power draw came from fans that you couldn't spin down below a certain threshold without a firmware/iDRAC script, as well as, as you mentioned, those PSUs being optimized for high sustained loads and thus being inefficient near idle and at low usage.
IIRC at system idle, the power profile on those was only like 15% CPU (that's combined for both CPUs), with the rest being fans, RAM, the various other vendor stuff (iDRAC, PERC, etc.), and low-load PSU inefficiencies.
Newer hardware has gotten better, but servers are still generally engineered for above 50% sustained loads rather than under, and those fans can still easily pull a dozen-plus watts each even at very low usage (of course, it depends on the exact model). So, point being, splitting hairs over a dozen watts or so between CPUs is a bit silly when your power floor from fans and PSU inefficiencies alone puts you at 80W+ draw anyway, not to mention the other components (NIC, drives, storage controller, OoB, RAM, etc.). Also, this is primarily relevant for surplus servers, but a lot of people building systems at home for the use case relevant to this discussion often turn to or are recommended these servers, so I just wanted to add this food for thought.
I can solve for them with three equations for three unknowns... but since they change the rates quarterly by the time I know what my exact rates were they have changed.
1) Raspberry Pi's competitors have gotten better; that NUC is very cheap.
2) The Pi has gone in a different direction, increasing specs and price; the 3B+ or 4A had much lower specs, price, power consumption, etc...
In conclusion, if you can get an ARM SoC board with specs similar to the 3B+ or 4A (500MB to 2GB RAM), then you can host a blog on Linux for cheap. It should run you in the $50 area. But Raspberry no longer makes these; you might look into the thousands of competitors.
Additionally, if you want something more serious, NUCs become reasonable, though it's hard to tell whether two $50 Pis or one $200 Intel NUC would be better. It depends on the tradeoffs.
I mean: An ATX case can be paid for once, and then be used for decades. (I'm writing this using a modern desktop computer with an ATX case that I bought in 2008.)
PCI Express lanes can be multiplied. There should frankly be more of this going on than there is, but it's still a thing that can be done.
Consumer boards built on the AMD X670E chipset, for instance, have some switching magic built in. There's enough direct CPU-connected lanes for an x16 GPU and a couple of x4 NVMe drives, and the NIC(s) and/or HBA(s) can go downstream of the chipset.
(Yeah, sure: It's limited to an aggregate 64 Gbps at the tail end, but that's not a problem for the things I do at home where my sights are set on 10Gbps networking and an HBA with a bunch of spinny disks. Your needs may differ.)
If I spill something on my own hardware, the max out-of-pocket amount I lose is the amount I spent on that hardware.
If I run up an AWS/GCP/Azure bill accidentally... the max out-of-pocket amount I lose is often literally unbounded. Are there some guardrails you can put around this? Sure. But they're often confusing, misleading, delayed, or riddled with "holes" which they don't catch.
Ex - the literal best AWS offers you is delayed "billing alarms" which need to be manually enabled and configured, and even then don't cover all the services you might incur billing charges for.
It's not that "Oopsies" can't happen locally - it's that even if they do, I have a clear understanding of the potential costs by default, and they're much less intangible than "I left a thing running overnight and I now I owe AWS a new car worth of cash".
The worst case for a misconfigured bit of software locally is that my machine stalls and my services go down (ex - overloaded). The worst case for a misconfigured bit of software in AWS is literal bankruptcy.
Think about that for a minute.
The issue with competing ARM SBCs is the software support; Radxa makes some boards that are more powerful than Pis, but if you read the forums they've had hardware flaws in the designs, and they run old kernels and don't get updated, and of course there isn't the community behind it.
An x86 mini pc is a different beast to a Pi, but then I think a lot of people who were hosting software on a Pi weren't specifically looking for ARM architecture anyway, unless they were, in which case stick with a Pi.
In this case that 'new' is energy-efficient software, down to the individual lines of code and what their energy cost is on certain hardware. Academics are publishing about it in niche corners of the web and some entrepreneurs are doing it, but of course none of this is cool now, so we remain a mockery for our objectives. In time this too will become a real thing, as many are just beginning to feel the ever-rising costs of energy, which is only starting to increase from decisions made years ago. The worst is yet to come, as seen and heard directly from every single expert who has testified in recent years before the Energy and Commerce committee; however, only the outside-the-boxers among us watch such educational content to better prepare for tomorrow.
Electricity powers our world and nearly all take it for granted, time too will change this thinking.
:D
I'm well aware of the costs of power and the logistics of colocation; this is purely about how I'm more willing to spend $100-$200 on a toy than I am $1000-$2000.
[1] https://en.wikipedia.org/wiki/Shannon_hydroelectric_scheme