Most active commenters

berkut(5)
dahart(3)
boulos(3)
shaklee3(3)
erosenbe0(3)

Popular/hot comments

>>25617080 #

←back to thread

Pixar's Render Farm

(twitter.com)

Show context

mmcconnell1618 ◴[02 Jan 21 20:48 UTC] No.25616372[source]▶

>>25615888 (OP) #

Can anyone comment on why Pixar uses standard CPU for processing instead of custom hardware or GPU? I'm wondering why they haven't invested in FPGA or completely custom silicon that speeds up common operations by an order of magnitude. Is each show that different that no common operations are targets for hardware optimization?

replies(12): >>25616493 #>>25616494 #>>25616509 #>>25616527 #>>25616546 #>>25616623 #>>25616626 #>>25616670 #>>25616851 #>>25616986 #>>25617019 #>>25636451 #

berkut ◴[02 Jan 21 21:06 UTC] No.25616527[source]▶

>>25616372 #

Because the expense is not really worth it - even GPU rendering (while around 3/4 x faster than CPU rendering) is memory constrained compared to CPU rendering, and as soon as you try and go out-of-core on the GPU, you're back at CPU speeds, so there's usually no point doing GPU rendering for entire scenes (which can take > 48 GB of RAM for all geometry, accel structures, textures, etc) given the often large memory requirements.

High end VFX/CG usually tessellates geometry down to micropolygon, so you roughly have 1 quad (or two triangles) per pixel in terms of geometry density, so you can often have > 150,000,000 polys in a scene, along with per vertex primvars to control shading, and many textures (which can be paged fairly well with shade on hit).

Using ray tracing pretty much means having all that in memory at once (paging sucks in general of geo and accel structures, it's been tried in the past) so that intersection / traversal is fast.

Doing lookdev on individual assets (i.e. turntables) is one place where GPU rendering can be used as the memory requirements are much smaller, but only if the look you get is identical to the one you get using CPU rendering, which isn't always the case (some of the algorithms are hard to get working correctly on GPUs, i.e. volumetrics).

Renderman (the renderer Pixar use, and create in Seattle) isn't really GPU ready yet (they're attempting to release XPU this year I think).

replies(4): >>25616832 #>>25617017 #>>25617606 #>>25620652 #

1. dahart ◴[02 Jan 21 22:07 UTC] No.25617017[source]▶

>>25616527 #

> Because the expense is not really worth it

I disagree with this takeaway. But full disclosure I’m biased: I work on OptiX. There is a reason Pixar and Arnold and Vray and most other major industry renderers are moving to the GPU, because the trends are clear and because it has recently become ‘worth it’. Many renderers are reporting factors of 2-10 for production scale scene rendering. (Here’s a good example: https://www.youtube.com/watch?v=ZlmRuR5MKmU) There definitely are tradeoffs, and you’ve accurately pointed out several of them - memory constraints, paging, micropolygons, etc. Yes, it does take a lot of engineering to make the best use of the GPU, but the scale of scenes in production with GPUs today is already firmly well past being limited to turntables, and the writing is on the wall - the trend is clearly moving toward GPU farms.

replies(4): >>25617080 #>>25617265 #>>25619363 #>>25622440 #

2. berkut ◴[02 Jan 21 22:15 UTC] No.25617080[source]▶

>>25617017 (TP) #

I write a production renderer for a living :)

So I'm well aware of the trade offs. As I mentioned, for lookdev and small scenes, GPUs do make sense currently (if you're willing the pay the penalty of getting code to work on both CPU and GPU, and GPU dev is not exactly trivial in terms of debugging / building compared to CPU dev).

But until GPUs exist with > 64 GB RAM, for rendering large scale scenes, it's just not worth it given the extra burdens (increased development costs, heterogeneous sets of machines in the farm, extra debugging, support), so for high-end scale, we're likely 3/4 years away yet.

replies(4): >>25617276 #>>25617360 #>>25619796 #>>25620580 #

3. berkut ◴[02 Jan 21 22:37 UTC] No.25617265[source]▶

>>25617017 (TP) #

I should also point out that ray traversal / intersection costs are generally only around 40% of the costs of extremely large scenes, and that's predominantly where GPUs are currently much faster than CPUs.

(I'm aware of the OSL batching/GPU work that's taking place, but it remains to be seen how well that's going to work).

From what I've heard from friends in the industry (at other companies) who are using GPU versions of Arnold, the numbers are no-where near as good as the upper numbers you're claiming when rendering at final fidelity (i.e. with AOVs and Deep output), so again, the use-cases - at least for high-end VFX with GPU - are still mostly for lookdev and lighting blocking iterative workflow from what I understand. Which is still an advantage and provides clear benefits in terms of iteration time over CPU renderers, but it's not a complete win, and so far, only the smaller studios have started dipping their toes in the water.

Also, the advent of AMD Epyc has finally thrown some competitiveness back to CPU rendering, so it's now possible to get a machine with x2 as many cores for close to half the price, which has given CPU rendering a further shot in the arm.

4. foota ◴[02 Jan 21 22:38 UTC] No.25617276[source]▶

>>25617080 #

Given current consumer GPUs are at 24 GB I think 3-4 years is likely overly pessimistic.

replies(1): >>25617327 #

5. berkut ◴[02 Jan 21 22:44 UTC] No.25617327{3}[source]▶

>>25617276 #

They've been at 24 GB for two years though - and they cost an arm and a leg compared to a CPU with a similar amount.

It's not just about them existing, they need to be cost effective.

replies(2): >>25617419 #>>25620838 #

6. dahart ◴[02 Jan 21 22:48 UTC] No.25617360[source]▶

>>25617080 #

I used to write a production renderer for a living, now I work with a lot of people who write production renderers for both CPU and GPU. I’m not sure what line you’re drawing exactly ... if you mean that it will take 3 or 4 years before the industry will be able to stop using CPUs for production rendering, then I totally agree with you. If you mean that it will take 3 or 4 years before industry can use GPUs for any production rendering, then that statement would be about 8 years too late. I’m pretty sure that’s not what you meant, so it’s somewhere in between there, meaning some scenes are doable on the GPU today and some aren’t. It’s worth it now in some cases, and not worth it in other cases.

The trend is pretty clear, though. The size of scenes than can be done on the GPU today is large and growing fast, both because of improving engineering and because of increasing GPU memory speed & size. It’s just a fact that a lot of commercial work is already done on the GPU, and that most serious commercial renderers already support GPU rendering.

It’s fair to point out that the largest production scenes are still difficult and will remain so for a while. There are decent examples out there of what’s being done in production with GPUs already:

https://www.chaosgroup.com/vray-gpu#showcase

https://www.redshift3d.com/gallery

https://www.arnoldrenderer.com/gallery/

replies(1): >>25617689 #

7. lhoff ◴[02 Jan 21 22:55 UTC] No.25617419{4}[source]▶

>>25617327 #

Not anymore. The new Ampere based Quadros and Teslas just launched with up to 48GB of RAM. A special datacenter version with 80Gb is also already announced: https://www.nvidia.com/en-us/data-center/a100/

They are really expensive though. But chassis and rackspace also isn't free. If one beefy node with a couple GPUs can replace have a rack of CPU only Nodes the GPUs are totally worth it.

I'm not too familiar with 3D rendering but in other workloads the GPU speedup is so huge that if its possible to offload to the GPU it made sense to do it from a economical perspective.

replies(1): >>25620882 #

8. berkut ◴[02 Jan 21 23:29 UTC] No.25617689{3}[source]▶

>>25617360 #

The line I'm drawing is high-end VFX / CG is still IMO years away from using GPUs for final frame (with loads of AOVs and Deep output) rendering.

Are GPUs starting to be used at earlier points in the pipeline? Yes, absolutely, but they always were to a degree in previs and modelling (via rasterisation). They are gradually becoming more useable at more steps in pipelines, but they're not there yet for high-end studios.

In some cases, if a studio's happy using an off-the-shelf renderer with the stock shaders (so no custom shaders at all - at least until OSL is doing batching and GPU stuff, or until MDL actually supports production renderer stuff) studios can use GPUs further down the pipeline, and currently that's smaller scale stuff from what I gather talking to friends who are using Arnold GPU. Certainly the hero-level stuff at Weta / ILM / Framestore isn't being done with with GPUs, as they require custom shaders, and they aren't going to be happy with just using the stock shaders (which are much better than stock shaders from 6/7 years ago, but still far from bleeding edge in terms of BSDFs and patterns).

Even from what I hear at Pixar with their lookdev Flow renderer things aren't completely rosy on the GPU front, although it is at least getting some use, and the expectation is XPU will take over there, but I don't think it's quite ready yet.

Until a studio feels GPU rendering can be used for a significant amount of the renders (that they do, for smaller studios, the fidelity will be less, so the threshold will be lower for them), I think it's going to be a chicken-and-egg problem of not wanting to invest in GPUs on the farms (or even local workstations).

replies(1): >>25619391 #

9. boulos ◴[03 Jan 21 04:12 UTC] No.25619363[source]▶

>>25617017 (TP) #

Dave, doesn’t that video show more like “50% faster”? Here’s the timecode (&t=360) [1] for the “production difficulty” result (which really doesn’t seem to be, but whatever).

Isn’t there a better Vray or Arnold comparison somewhere?

As in my summary comment, an A100 can now run real scenes, but will cost you ~$10k per card. For $10k, you get a lot more threads from AMD.

[1] https://m.youtube.com/watch?v=ZlmRuR5MKmU&t=360

replies(2): >>25619798 #>>25620125 #

10. boulos ◴[03 Jan 21 04:19 UTC] No.25619391{4}[source]▶

>>25617689 #

I think you’re right about the current state (not quite there, especially in raw $$s), but the potential is finally good enough that folks are investing seriously on the software side.

The folks at Framestore and many other shops already don’t do more than XX GiB per frame for their rendering. So for me, this comes down to “can we finally implement a good enough texture cache in optix/the community” which I understand Mark Leone is working on :).

The shader thing seems easy enough. I’m not worried about an OSL compiled output running worse than the C-side. Divergence is a real issue, but so many studios are now using just a handful of BSDFs with lots of textures to drive, that as long as you don’t force the shading to be “per object group” but instead “per shader, varying inputs is fine”, you’ll still get high utilization.

The 80 GiB parts will make it so that some shops could go fully in-core. I expect we’ll see that sooner than you’d think, just because people will start doing interactive work, never want to give it up, and then say “make that but better” for the finals.

11. shaklee3 ◴[03 Jan 21 05:42 UTC] No.25619796[source]▶

>>25617080 #

GPUs do exist with 64+GB of rao, virtually. A dgx2 has distributed memory where you can see the entire 16x32GB of address space backed by nvlink. And that technology is now 3 years old, and it's even higher now.

12. shaklee3 ◴[03 Jan 21 05:44 UTC] No.25619798[source]▶

>>25619363 #

What do you mean by a lot more threads? Are you comparing an epyc?

replies(1): >>25620082 #

13. boulos ◴[03 Jan 21 06:51 UTC] No.25620082{3}[source]▶

>>25619798 #

Yeah, that came off clumsily (I’d lost part of my comment while switching tabs on my phone).

An AMD Rome/Milan part will give you 256 decent threads on a 2S box with a ton of RAM for say $20-25k at list price (e.g., a Dell power edge without any of their premium support or lots of flash). By comparison, the list price of just an A100 is $15k (and you still need a server to drive the thing).

So for shops shoving these into a data center they still need to do a cost/benefit tradeoff of “how much faster is this for our shows, can anyone else make use of it, how much power do these draw...”. If anything, the note about more and more software using CUDA is probably as important as “ray tracing is now sufficiently faster” since the lack of reuse has held them back (similar things for video encoding historically: if you’ve got a lot of cpus around, it was historically hard to beat for $/transcode).

replies(2): >>25620996 #>>25621682 #

14. dahart ◴[03 Jan 21 07:02 UTC] No.25620125[source]▶

>>25619363 #

Yes, this example isn’t quite as high as the 2-10x range I claimed, but I still liked it as an example because the CPU is very beefy, and it’s newer and roughly the same list price as the GPU being compared. I like that they compare power consumption too, and ultimately the GPU comes out well ahead. There are lots of other comparisons that show huge x-factors, this one seemed less likely to get called out for cherry picking, and @berkut’s critique of texture memory consumption for large production scenes is fair... we’re not all the way there yet. But, 50% faster is still “worth it”. In the video, Sam mentions that if you compare lower end components on both sides, the x-factor will be higher.

15. fluffy87 ◴[03 Jan 21 08:57 UTC] No.25620580[source]▶

>>25617080 #

There are already GPUs with >90GB RAM? DGX-A100 has a version with 16 A100 GPUs, having each 90 Gb.. that’s 1.4TB of GPU memory on a single node.

16. erosenbe0 ◴[03 Jan 21 10:08 UTC] No.25620838{4}[source]▶

>>25617327 #

Desktop GPUs could have 64GB of GDDR right now but the memory bus width to drive those bits optimally (in primary use case of real-time game rendering, not offline) would up the power and heat dissipation requirements beyond what is currently engineered onto a [desktop] PCIE card.

If 8k gaming becomes a real thing you can expect work to be done towards a solution, but until then not so much.

Edit: added [desktop] preceding PCIE

17. erosenbe0 ◴[03 Jan 21 10:18 UTC] No.25620882{5}[source]▶

>>25617419 #

Hashing and linear algebra kernels get much more speedup on a GPU than a vfx pipeline does. But I am glad to see reports here detailing that the optimization of vfx is progressing.

18. erosenbe0 ◴[03 Jan 21 10:48 UTC] No.25620996{4}[source]▶

>>25620082 #

Is the high priced 256 thread part that interesting for rendering? You can get 4 of the 64 thread parts on separate boards and each one will have its own 8 channel ddr instead of having to share that bandwidth. Total performance will be higher for less or same money. Power budget will be higher but only a couple dollars a day, at most. But I haven't been involved in a cluster for some time, so not really sure what is done these days.

19. shaklee3 ◴[03 Jan 21 13:38 UTC] No.25621682{4}[source]▶

>>25620082 #

The reason I asked is I did a performance trade-off with a v100 and dual epyc rome with 64 cores, and the v100 won handily for my tasks. That obviously won't always be the case, but in terms of threads you're now comparing 256 to 5000+, but obviously not apples to apples.

20. Narann ◴[03 Jan 21 15:54 UTC] No.25622440[source]▶

>>25617017 (TP) #

> There is a reason Pixar and Arnold and Vray and most other major industry renderers are moving to the GPU

The reason is that those softwares need to be sold to many, and a big part of studios are doing advertise and series. GPU rendering is perfect for them as they don't need/can't afford large scale render farms.

About your example, that not honest. It's full of instances and perfect use case for a "Wow" effect but it's not a production shot. Doing a production shot required complexity management on the long run, even for CPU rendering. On this side, GPU is more "constrained" than CPU, management is even more complex.

↑