
Pixar's Render Farm

(twitter.com)
382 points by brundolf | 1 comment
nom ◴[] No.25616292[source]
Oh man, I wanted this to contain much more detail :(

What's the hardware? How much electrical energy goes into rendering a frame or a whole movie? How do they provision it (since they keep the number of cores fixed)? They only talk about cores -- do they even use GPUs? What's running on the machines? What have they optimized lately?

So many questions! Maybe someone from Pixar's systems department is reading this :)?

replies(7): >>25616619 #>>25616668 #>>25616803 #>>25616962 #>>25617126 #>>25617551 #>>25622359 #
aprdm ◴[] No.25616668[source]
Not Pixar specifically, but modern VFX and animation studios usually have a bare-metal render farm, and the nodes are usually pretty beefy -- think at least 24 cores / 128 GB of RAM per node.

Usually at crunch time, if there aren't enough nodes in the render farm, they might rent nodes and connect them to their network for a period of time, use the cloud, or get budget to expand the render farm.

From what I've seen, the cloud is extremely expensive for beefy machines with GPUs, but some companies do use it, as you can see if you google [0] [1].

GPUs can be used for some workflows in modern studios, but I would bet the majority of the work runs on CPUs. Those machines usually run a Linux distro and the render processes (vray / prman, etc.), and everything is served from a big NFS cluster (rough sketch of a single render job below the links).

[0] https://deadline.com/2020/09/weta-digital-pacts-with-amazon-...

[1] https://www.itnews.com.au/news/dreamworks-animation-steps-to...
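
To make that concrete, a single farm job on one node boils down to something like the sketch below: read the scene off the shared NFS mount, run the CPU renderer over a frame range, write frames back to the same storage. The paths, the "renderer" command and the frame numbers are made-up placeholders, not any studio's actual setup.

    # toy_render_job.py -- sketch of a single farm job on one render node
    # The NFS mount point, scene path and "renderer" command are placeholders.
    import subprocess
    import sys

    SCENE = "/mnt/nfs/shows/demo/shot_010/scene.rib"    # scene file on shared storage
    OUTDIR = "/mnt/nfs/shows/demo/shot_010/frames"      # frames written back to NFS

    def render_frame(frame: int) -> None:
        # One renderer process per frame; a real farm scheduler fans these
        # out across nodes, tracks failures and retries them.
        out = f"{OUTDIR}/frame.{frame:04d}.exr"
        subprocess.run(["renderer", "--frame", str(frame), "--output", out, SCENE], check=True)

    if __name__ == "__main__":
        # e.g. `python toy_render_job.py 1 24` renders frames 1..24 on this node
        start, end = int(sys.argv[1]), int(sys.argv[2])
        for frame in range(start, end + 1):
            render_frame(frame)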

replies(1): >>25617317 #
tinco ◴[] No.25617317[source]
Can confirm cloud GPU is way overpriced if you're doing 24/7 rendering. We run a bare-metal cluster (not VFX but photogrammetry), and I pitched our board on the possibilities. I really did not want to run a bare-metal cluster, but it just does not make sense for a low-margin startup to use cloud processing.

Running 24/7, consumer-grade hardware with similar (probably better) performance pays for itself in about three months compared with the cloud; for "industrial"-grade hardware (Xeon/Epyc + Quadro) it's under 12 months. We chose consumer-grade bare metal.
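
Back-of-envelope version of that math (the numbers below are placeholders to show the calculation, not our real figures):

    # break_even.py -- rough cloud vs. own-hardware break-even sketch
    # All prices are made-up placeholders; plug in your own quotes.
    CLOUD_PER_HOUR = 2.50             # one cloud GPU node, on-demand, per hour
    CONSUMER_NODE_COST = 4500.0       # consumer-grade box (GPU + CPU + RAM), one-off
    WORKSTATION_NODE_COST = 18000.0   # "industrial" grade (Xeon/Epyc + Quadro), one-off
    HOURS_PER_MONTH = 24 * 30         # running around the clock

    def months_to_break_even(hardware_cost: float) -> float:
        # Ignores power, rack space and admin time, which push the break-even
        # point out a bit but don't change the overall picture.
        cloud_cost_per_month = CLOUD_PER_HOUR * HOURS_PER_MONTH
        return hardware_cost / cloud_cost_per_month

    print(f"consumer grade:    {months_to_break_even(CONSUMER_NODE_COST):.1f} months")
    print(f"workstation grade: {months_to_break_even(WORKSTATION_NODE_COST):.1f} months")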

One thing that was half surprising, half calculated into our decision was how much less stressful running your own hardware is, despite the operational overhead. When we ran experimentally on the cloud, a misrender could cost us 900 euro, and sometimes we'd have to render 3 times or more for a single client, taking us from healthily profitable to losing money. The stress of having to get it right the first time sucked.

replies(3): >>25617371 #>>25620467 #>>25621281 #
malthejorgensen ◴[] No.25620467[source]
How do you manage the bare metal cluster? (E.g. apt/yum updates but also networking and such)
replies(3): >>25620727 #>>25621636 #>>25622256 #
tinco ◴[] No.25621636[source]
When it was 3 nodes, and then 6 nodes, the answer was: very unprofessionally. I didn't get the budget for a system administrator, and I spent all my budget on developers who could build our application and automate our preprocessing, overlooking system administration skills. So besides being DoE, managing 3 small teams, and being the lead developer, I am also the system administrator.

So no fancy answer: our 3D experts got TeamViewer access to the nodes, which run Windows Pro. Sometimes our renders fail on Patch Tuesday because I forgot to reapply the no-reboot hack.

We're professionalizing now at 12 nodes. We got to the point where the 3D experts don't need to TeamViewer in, so we're swapping them over to headless Linux. No idea on update management yet, but they're clean nodes running Ubuntu Server.
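
For the apt side of that question, even something as dumb as looping over the nodes with SSH goes a long way at this scale; a minimal sketch (hostnames are placeholders, and it assumes key-based SSH plus passwordless sudo on each node):

    # patch_nodes.py -- minimal "apt upgrade every node" sketch
    # Hostnames are placeholders; assumes key-based SSH and passwordless sudo.
    import subprocess

    NODES = [f"render{i:02d}.farm.local" for i in range(1, 13)]  # 12 nodes

    def update_node(host: str) -> bool:
        # Run a non-interactive apt update/upgrade on the remote node over SSH.
        remote_cmd = "sudo apt-get update && sudo apt-get -y upgrade"
        result = subprocess.run(["ssh", host, remote_cmd], capture_output=True, text=True)
        if result.returncode != 0:
            print(f"{host}: FAILED\n{result.stderr}")
        return result.returncode == 0

    if __name__ == "__main__":
        ok = [h for h in NODES if update_node(h)]
        print(f"{len(ok)}/{len(NODES)} nodes updated")

Longer term, unattended-upgrades or a proper config-management tool is probably the better answer, but we haven't picked one yet.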