
Pixar's Render Farm

(twitter.com)
382 points | brundolf | 1 comment
blaisio ◴[] No.25616657[source]
In case someone is curious, the optimization they describe is trivial to do in Kubernetes - just enter a resource request for cpu without adding a limit.
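
For anyone who wants to see what that looks like, here is a minimal sketch: a Pod manifest whose container sets `resources.requests.cpu` and deliberately leaves out `resources.limits`. The function name, pod name, image, and the request of 4 CPUs are all hypothetical; the script only prints JSON, which `kubectl apply -f -` should accept on stdin.

```python
import json


def burstable_pod(name, image, cpu_request):
    """Build a Pod manifest with a CPU request and no CPU limit."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Only a request: the scheduler reserves this much CPU,
                    # and no "limits" key is set, so nothing caps usage here.
                    "resources": {"requests": {"cpu": cpu_request}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names and values, for illustration only.
    pod = burstable_pod("render-task", "example/renderer:latest", "4")
    print(json.dumps(pod, indent=2))
```

The request is what the scheduler uses for placement; with no limit set (and assuming no namespace default imposes one), the container can soak up idle CPU on the node beyond its request.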
1. KaiserPro ◴[] No.25617727[source]
Kubernetes can't do it at this scale "trivially".

Firstly, K8s has no concept of licenses, and it is exceptionally weak on dependencies. A job graph for a VFX job can be well over 100k nodes, something that would crash K8s.

Secondly, Tractor (https://rmanwiki.pixar.com/display/TRA/Tractor+2) is exceptionally fast at dispatching jobs to machines. I suspect it's on the order of 50k a second, if not more.

Thirdly, getting k8s to talk to 25k machines without saturating the network is almost impossible.

Fourthly, it doesn't do too well on a "normal" network. Try getting decent network throughput on one of K8s's batshit networking schemes (each server on a farm will have at a minimum two 10 gig links, more likely two 40 gig).
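
To make the dependency point concrete, here is a toy sketch of what "dispatching a job graph" means: tasks are released in waves once everything upstream of them has finished (Kahn's algorithm). This is not Tractor's actual implementation, and the task names are invented; a real farm scheduler would also have to track licenses, machine slots, and retries, and do it for 100k+ nodes.

```python
def dispatch_waves(deps):
    """Group tasks into waves that can run in parallel.

    deps maps each task to the list of tasks it depends on.
    Every task, including ones with no dependencies, must be a key.
    """
    # Number of unfinished upstream tasks per task.
    remaining = {task: len(set(upstream)) for task, upstream in deps.items()}
    # Reverse index: which tasks are waiting on each task.
    dependents = {task: [] for task in deps}
    for task, upstream in deps.items():
        for up in set(upstream):
            dependents[up].append(task)

    ready = sorted(t for t, n in remaining.items() if n == 0)
    waves = []
    done = 0
    while ready:
        waves.append(ready)
        done += len(ready)
        next_ready = []
        for task in ready:
            for child in dependents[task]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    if done != len(deps):
        raise ValueError("dependency cycle detected")
    return waves


if __name__ == "__main__":
    # Hypothetical four-task shot: simulate, light two frames, composite.
    graph = {
        "sim": [],
        "light_f1": ["sim"],
        "light_f2": ["sim"],
        "comp": ["light_f1", "light_f2"],
    }
    for wave in dispatch_waves(graph):
        print(wave)
```

Each wave is a batch the dispatcher could hand out at once, so the dispatch rate and the size of the graph both matter.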