Why do random forests work? They are self-regularizing adaptive smoothers

Good to see more research exploring the connection between trees, ensembles, and smoothing. Way back in ESL (Hastie, Tibshirani, and Friedman's The Elements of Statistical Learning) there's a section on how gradient boosting with "stumps" (trees with only one split) is equivalent to fitting an additive model (a GAM, technically) that uses step functions as the spline basis, with adaptive knot placement.
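To make the stumps point concrete, here's a rough sketch in plain Python (not ESL's code, and not how any real library implements it). It does least-squares boosting with one-split trees on toy data and then checks that the fitted model splits into one step function per feature. The names `fit_stump`, `boost`, `predict`, and `component`, the toy target, and the learning rate are all my own choices for illustration.

```python
import math
import random

def fit_stump(X, r):
    """Exhaustive least-squares search for the best single split.
    Returns (feature index, threshold, left mean, right mean)."""
    best = None
    n = len(X)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def boost(X, y, rounds=50, lr=0.3):
    """Squared-error boosting: each round fits a stump to the current residuals."""
    stumps = []
    resid = list(y)
    for _ in range(rounds):
        j, t, lm, rm = fit_stump(X, resid)
        stumps.append((j, t, lr * lm, lr * rm))
        resid = [ri - (lr * lm if row[j] <= t else lr * rm)
                 for row, ri in zip(X, resid)]
    return stumps

def predict(stumps, x):
    """Full model: the sum of all stump outputs."""
    return sum(lv if x[j] <= t else rv for j, t, lv, rv in stumps)

def component(stumps, j, xj):
    """f_j(xj): sum of only those stumps that split on feature j.
    This is a step function whose knots are the learned thresholds."""
    return sum(lv if xj <= t else rv for k, t, lv, rv in stumps if k == j)

# Toy data (an assumption for illustration): an additive target in two features.
random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(60)]
y = [math.sin(3 * x0) + x1 ** 2 for x0, x1 in X]

stumps = boost(X, y)

# Additivity check: the prediction equals f_0(x_0) + f_1(x_1) at any point.
x = [0.2, -0.4]
additive_gap = abs(predict(stumps, x)
                   - (component(stumps, 0, x[0]) + component(stumps, 1, x[1])))
```

Since every stump looks at exactly one feature, grouping the stumps by feature gives the additive decomposition for free, and the split thresholds play the role of adaptively placed knots. Deeper trees would introduce interaction terms and break this decomposition.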

I've always thought there should be a deep connection between ReLU neural nets and regularized adaptive smoothers as well, since the ReLU function is itself a spline basis function (a so-called truncated linear spline), and shifted ReLUs, together with a constant and a linear term, span the same function space as B-splines of the same degree (degree 1) on the same knots.
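A quick way to see the ReLU/B-spline link, as a sketch in plain Python: the degree-1 B-spline (the "hat" function) on knots a < b < c can be written as a combination of three shifted ReLUs. The function names and the knot values here are just my picks for illustration.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def hat_from_relu(x, a, b, c):
    """Linear B-spline on knots a < b < c, built from three shifted ReLUs.
    The coefficients are the slope changes at each knot."""
    up = 1.0 / (b - a)      # slope on [a, b]
    down = 1.0 / (c - b)    # magnitude of the slope on [b, c]
    return up * relu(x - a) - (up + down) * relu(x - b) + down * relu(x - c)

def hat_bspline(x, a, b, c):
    """The same hat function written directly as a piecewise-linear function."""
    if a <= x <= b:
        return (x - a) / (b - a)
    if b < x <= c:
        return (c - x) / (c - b)
    return 0.0

# Compare the two forms on a grid, with unevenly spaced knots.
a, b, c = 0.0, 1.0, 3.0
xs = [-1.0 + 0.01 * i for i in range(501)]
max_gap = max(abs(hat_from_relu(x, a, b, c) - hat_bspline(x, a, b, c)) for x in xs)
```

So a one-hidden-layer ReLU net with a single input is a linear spline whose knot locations are set by the learned biases and weights, which is where the "adaptive knot placement" analogy comes from.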