
Pixar's Render Farm

(twitter.com)
382 points by brundolf | 3 comments
thomashabets2 ◴[] No.25616274[source]
I'm surprised they hit only 80-90% CPU utilization. Sure, I don't know their bottlenecks, but I understood this to be way more parallelizable than that.

I ray trace Quake demos for fun at a much, much smaller scale[0], and have professionally managed much bigger installs (I feel confident saying that even though I don't know Pixar's exact scale).

But I don't know state-of-the-art rendering, and I'm sure Pixar knows their workload much better than I do. I would be interested in hearing why, though.

[0] YouTube's compression butchers the quality, but https://youtu.be/0xR1ZoGhfhc . Live system at https://qpov.retrofitta.se/, code at https://github.com/ThomasHabets/qpov.

Edit: I see people are following the links. What a day to overflow Go's 64-bit counter for time durations on the stats page: https://qpov.retrofitta.se/stats

I'll fix it later.
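
For the curious, a minimal sketch of the failure mode (illustrative only, not the actual qpov code): time.Duration is an int64 count of nanoseconds, so it tops out at roughly 292 years and silently wraps negative if you keep summing past that.

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        year := 365 * 24 * time.Hour // one year of CPU time, as a Duration
        var total time.Duration
        for i := 0; i < 300; i++ { // ~300 machine-years of accumulated render time
            total += year // int64 nanoseconds overflow past ~292 years
        }
        fmt.Println(total) // prints a bogus negative duration
    }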

replies(5): >>25616362 #>>25616369 #>>25616380 #>>25616401 #>>25617648 #
KaiserPro ◴[] No.25617648[source]
Maxing out a CPU is easy; keeping it fed with data, and being able to save that data back out, is hard.
replies(1): >>25617691 #
1. thomashabets2 ◴[] No.25617691[source]
Yes, but the work units (frames) are large enough that I'm still surprised.

Maybe they're not as parallelizable as I'd expect, e.g. if there's serial work involved in reusing scene layout computations between frames.

replies(1): >>25617738 #
2. KaiserPro ◴[] No.25617738[source]
A scene will have many thousands of assets (trees, cars, people, etc.). Each one will have its geometry, which could be millions of polygons (although they use subdivision surfaces).

each "polygon" could have a 16k texture on it. You're pulling TBs of textures and other assets in each frame.

replies(1): >>25623662 #
3. thomashabets2 ◴[] No.25623662[source]
Hmm, yes I see. TBs? Interesting. I'd like to hear a talk about these things.

Naively, I would expect (as is the case for my MUCH smaller-scale system) that I can compensate for network/disk-bound and non-multithreaded stages merely by running two concurrent frames.

On a larger scale I would expect to be able to identify RAM-cheap frames, and always have one of them running per machine, but at SCHED_IDLE priority, so that it only gets CPU when the "main" frame is blocked on disk or network, or on a non-parallelizable stage. By starving one frame of CPU, it's much more likely to need CPU during the short intervals when it's allowed to get it.
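
Roughly this shape, as a sketch (obviously not how Pixar's farm actually works; "render" and its flags are placeholders for whatever the real renderer invocation would be):

    package main

    import (
        "log"
        "os/exec"
    )

    func main() {
        // Main frame: normal scheduling, gets the CPU whenever it is runnable.
        mainFrame := exec.Command("render", "--frame", "100")

        // Filler frame: util-linux's `chrt --idle 0` puts it in the SCHED_IDLE
        // class, so the kernel only runs it on CPU time nobody else wants.
        fillerFrame := exec.Command("chrt", "--idle", "0", "render", "--frame", "101")

        if err := mainFrame.Start(); err != nil {
            log.Fatal(err)
        }
        if err := fillerFrame.Start(); err != nil {
            log.Fatal(err)
        }
        _ = mainFrame.Wait()
        _ = fillerFrame.Wait()
    }

The filler frame is the one you'd want to be RAM-cheap, since it sits in memory alongside the main frame the whole time it's waiting for idle CPU.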