Pixar's Render Farm

(twitter.com)

382 points brundolf | 4 comments | 02 Jan 21 19:56 UTC | HN request time: 0.635s | source

Show context

mmcconnell1618 ◴[02 Jan 21 20:48 UTC] No.25616372[source]▶

Can anyone comment on why Pixar uses standard CPU for processing instead of custom hardware or GPU? I'm wondering why they haven't invested in FPGA or completely custom silicon that speeds up common operations by an order of magnitude. Is each show that different that no common operations are targets for hardware optimization?

replies(12): >>25616493 #>>25616494 #>>25616509 #>>25616527 #>>25616546 #>>25616623 #>>25616626 #>>25616670 #>>25616851 #>>25616986 #>>25617019 #>>25636451 #

berkut ◴[02 Jan 21 21:06 UTC] No.25616527[source]▶

>>25616372 #

Because the expense is not really worth it - even GPU rendering (while around 3/4 x faster than CPU rendering) is memory constrained compared to CPU rendering, and as soon as you try and go out-of-core on the GPU, you're back at CPU speeds, so there's usually no point doing GPU rendering for entire scenes (which can take > 48 GB of RAM for all geometry, accel structures, textures, etc) given the often large memory requirements.

High end VFX/CG usually tessellates geometry down to micropolygon, so you roughly have 1 quad (or two triangles) per pixel in terms of geometry density, so you can often have > 150,000,000 polys in a scene, along with per vertex primvars to control shading, and many textures (which can be paged fairly well with shade on hit).

Using ray tracing pretty much means having all that in memory at once (paging sucks in general of geo and accel structures, it's been tried in the past) so that intersection / traversal is fast.

Doing lookdev on individual assets (i.e. turntables) is one place where GPU rendering can be used as the memory requirements are much smaller, but only if the look you get is identical to the one you get using CPU rendering, which isn't always the case (some of the algorithms are hard to get working correctly on GPUs, i.e. volumetrics).

Renderman (the renderer Pixar use, and create in Seattle) isn't really GPU ready yet (they're attempting to release XPU this year I think).

replies(4): >>25616832 #>>25617017 #>>25617606 #>>25620652 #

dahart ◴[02 Jan 21 22:07 UTC] No.25617017[source]▶

>>25616527 #

> Because the expense is not really worth it

I disagree with this takeaway. But full disclosure I’m biased: I work on OptiX. There is a reason Pixar and Arnold and Vray and most other major industry renderers are moving to the GPU, because the trends are clear and because it has recently become ‘worth it’. Many renderers are reporting factors of 2-10 for production scale scene rendering. (Here’s a good example: https://www.youtube.com/watch?v=ZlmRuR5MKmU) There definitely are tradeoffs, and you’ve accurately pointed out several of them - memory constraints, paging, micropolygons, etc. Yes, it does take a lot of engineering to make the best use of the GPU, but the scale of scenes in production with GPUs today is already firmly well past being limited to turntables, and the writing is on the wall - the trend is clearly moving toward GPU farms.

replies(4): >>25617080 #>>25617265 #>>25619363 #>>25622440 #

boulos ◴[03 Jan 21 04:12 UTC] No.25619363[source]▶

>>25617017 #

Dave, doesn’t that video show more like “50% faster”? Here’s the timecode (&t=360) [1] for the “production difficulty” result (which really doesn’t seem to be, but whatever).

Isn’t there a better Vray or Arnold comparison somewhere?

As in my summary comment, an A100 can now run real scenes, but will cost you ~$10k per card. For $10k, you get a lot more threads from AMD.

[1] https://m.youtube.com/watch?v=ZlmRuR5MKmU&t=360

replies(2): >>25619798 #>>25620125 #

1. shaklee3 ◴[03 Jan 21 05:44 UTC] No.25619798[source]▶

>>25619363 #

What do you mean by a lot more threads? Are you comparing an epyc?

replies(1): >>25620082 #

2. boulos ◴[03 Jan 21 06:51 UTC] No.25620082[source]▶

>>25619798 (TP) #

Yeah, that came off clumsily (I’d lost part of my comment while switching tabs on my phone).

An AMD Rome/Milan part will give you 256 decent threads on a 2S box with a ton of RAM for say $20-25k at list price (e.g., a Dell power edge without any of their premium support or lots of flash). By comparison, the list price of just an A100 is $15k (and you still need a server to drive the thing).

So for shops shoving these into a data center they still need to do a cost/benefit tradeoff of “how much faster is this for our shows, can anyone else make use of it, how much power do these draw...”. If anything, the note about more and more software using CUDA is probably as important as “ray tracing is now sufficiently faster” since the lack of reuse has held them back (similar things for video encoding historically: if you’ve got a lot of cpus around, it was historically hard to beat for $/transcode).

replies(2): >>25620996 #>>25621682 #

3. erosenbe0 ◴[03 Jan 21 10:48 UTC] No.25620996[source]▶

>>25620082 #

Is the high priced 256 thread part that interesting for rendering? You can get 4 of the 64 thread parts on separate boards and each one will have its own 8 channel ddr instead of having to share that bandwidth. Total performance will be higher for less or same money. Power budget will be higher but only a couple dollars a day, at most. But I haven't been involved in a cluster for some time, so not really sure what is done these days.

4. shaklee3 ◴[03 Jan 21 13:38 UTC] No.25621682[source]▶

>>25620082 #

The reason I asked is I did a performance trade-off with a v100 and dual epyc rome with 64 cores, and the v100 won handily for my tasks. That obviously won't always be the case, but in terms of threads you're now comparing 256 to 5000+, but obviously not apples to apples.

↑