I regret building this $3000 Pi AI cluster

(www.jeffgeerling.com)

468 points | speckx | 3 comments | 19 Sep 25 14:28 UTC

densh ◴[19 Sep 25 18:08 UTC] No.45304632[source]▶

>>45302065 (OP) #

For anyone interested in playing with distributed systems, I'd really recommend getting a single machine with the latest 16-core CPU from AMD and just running 8 virtual machines on it: 4 hyperthreads pinned per machine, and 1/8 of total RAM per machine. Create a virtual network between them in your virtualization software of choice (such as Proxmox).
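A rough sketch of how that split could be worked out before configuring anything. The 64 GiB RAM figure, the `node0`…`node7` names, and the `plan_vms` helper are all made up for illustration, and the code assumes the common Linux numbering where the SMT sibling of core `c` is logical CPU `c + cores`; check the real layout with `lscpu -e` before pinning.

```python
def plan_vms(cores=16, threads_per_core=2, ram_gib=64, vm_count=8):
    """Split a host's physical cores and RAM evenly across VMs.

    Assumes logical CPU IDs follow the layout "sibling of core c is
    c + cores", which is common on Linux but NOT guaranteed.
    """
    if cores % vm_count:
        raise ValueError("cores do not divide evenly across VMs")
    cores_per_vm = cores // vm_count
    plan = []
    for vm in range(vm_count):
        own_cores = range(vm * cores_per_vm, (vm + 1) * cores_per_vm)
        # Keep both hardware threads of each physical core in the same VM.
        cpus = sorted(c + t * cores
                      for c in own_cores
                      for t in range(threads_per_core))
        plan.append({
            "name": f"node{vm}",          # hypothetical VM name
            "cpus": cpus,                 # logical CPUs to pin this VM to
            "ram_gib": ram_gib / vm_count,
        })
    return plan


if __name__ == "__main__":
    for vm in plan_vms():
        print(vm["name"], vm["cpus"], f'{vm["ram_gib"]:g} GiB')
```

Pinning whole physical cores per VM, rather than arbitrary threads, keeps two VMs from competing for the same core. How the pinning is actually applied depends on the hypervisor and is not shown here.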

And suddenly you can start playing with distributed software, even though it's running on a single machine. For resiliency tests you can unplug one machine at a time with a single click. It will annihilate a Pi cluster in Perf/W as well, and you don't have to assemble a complex web of components to make it work. Just a single CPU, motherboard, M.2 SSD, and two sticks of RAM.

Naturally, using a high-core-count machine without virtualization will get you the best overall Perf/W in most benchmarks. What's also important, but often not highlighted in benchmarks, is idle W, if you'd like to keep your cluster running and only use it occasionally.

replies(6): >>45305155 #>>45305387 #>>45305468 #>>45305628 #>>45307364 #>>45313651 #

bee_rider ◴[19 Sep 25 19:00 UTC] No.45305155[source]▶

>>45304632 #

Tangentially related: I really expected running old MPI programs on stuff like the AMD multi-chip workstation packages to become a bigger thing.

replies(1): >>45309479 #

1. le-mark ◴[20 Sep 25 02:07 UTC] No.45309479[source]▶

>>45305155 #

I actually worked with some MPI code way back. What MPI programs are you referring to?

replies(2): >>45310410 #>>45311020 #

2. MathMonkeyMan ◴[20 Sep 25 04:34 UTC] No.45310410[source]▶

>>45309479 (TP) #

I don't know, but when I was playing with finite difference code as an undergrad in Physics, all of the docs I could find (it was a while ago, though) assumed that I was going to use MPI to run a distributed workload across the university's supercomputer. My needs were less, so I just ran my Boost.Thread code on the four cores of one node.

What if you had a single server with a zillion cores in it? Maybe you could take some 15 year old MPI code and run it locally -- it'd be like a mini supercomputer with an impossibly fast network.
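To make the idea concrete, here is a minimal sketch of the decomposition pattern that kind of finite-difference MPI code typically uses, written in plain Python rather than real MPI. The "ranks" are just loop iterations in one process, and the halo cells are copied by slicing where an MPI program would exchange them with messages (for example via `MPI_Sendrecv`). The function names and the 1D heat-equation setup are illustrative assumptions, not anyone's actual code.

```python
def step(u, alpha):
    """One explicit finite-difference step of the 1D heat equation.

    The two endpoint values are held fixed (Dirichlet boundaries).
    """
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def step_decomposed(u, alpha, ranks):
    """Same step, but with the grid split into `ranks` chunks.

    Each chunk is extended by one halo cell per side; in MPI those halo
    values would arrive from the neighbouring ranks over the network.
    """
    n = len(u)
    out = []
    for r in range(ranks):
        lo, hi = r * n // ranks, (r + 1) * n // ranks
        left, right = max(lo - 1, 0), min(hi + 1, n)
        local = step(u[left:right], alpha)   # chunk plus its halos
        start = lo - left                    # skip the left halo, if any
        out.extend(local[start:start + (hi - lo)])
    return out


if __name__ == "__main__":
    grid = [0.0] * 16
    grid[8] = 1.0                            # a single hot cell
    print(step_decomposed(grid, 0.25, ranks=4) == step(grid, 0.25))
```

On one big machine the halo exchange becomes a memory copy between cores, which is where the "impossibly fast network" would show up.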

3. bee_rider ◴[20 Sep 25 06:41 UTC] No.45311020[source]▶

>>45309479 (TP) #

I’m not thinking of one code in particular. Just observing that with multi-chiplet designs, even inside a single CPU package we’re already talking over a sort of little internal network anyway. Might as well use code that was designed to run on a network, right?
