
291 points sebg | 1 comment
youoy No.41890415
Nice article, although I find it a bit overly complex if you are not familiar with both ML and mathematics.

I will leave here a geometrical intuition of why they work in case it can help someone:

To simplify things, I will talk about regression: say we want to predict some value y that we know depends on x, and we have some noisy measurements of y.

    y = f(x)
    y_observed = f(x) + noise

If you want some mental image, think about f(x)=sin(x).

Now, an (overfitted) regression tree in this case is just a step function where the value at x is y_observed. If there is no noise, we know that by taking more measurements we can approximate y with as much precision as we want. But if there is noise, the regression tree will overfit the noisy values, creating some artificial spikes.
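
A minimal sketch of that spiky step function, assuming scikit-learn and a made-up noise level (neither is from the original comment): an unconstrained DecisionTreeRegressor fit to noisy samples of sin(x) ends up with roughly one leaf per noisy point, so its prediction chases the noise.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 2 * np.pi, 200)).reshape(-1, 1)
    y_observed = np.sin(x).ravel() + rng.normal(0, 0.3, 200)  # noisy measurements of sin(x)

    # No depth limit: each leaf fits (almost) a single noisy point, so the
    # fitted step function develops the artificial spikes described above.
    tree = DecisionTreeRegressor(random_state=0).fit(x, y_observed)

    x_grid = np.linspace(0, 2 * np.pi, 500).reshape(-1, 1)
    spiky_prediction = tree.predict(x_grid)  # a very spiky step function around sin(x)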

If you want to avoid this overfitting, you sample the values of X many times, build a regression tree for each sample, and then average them. Every tree will contain its own particular artificial spikes, and since those spikes come from noise, they won't appear in the majority of the other trees. So when you average, the spikes attenuate, creating the smoother behaviour that the article talks about.
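
And a sketch of the averaging step, under the same assumptions (scikit-learn, with bootstrap resampling as one way of "sampling the values of X many times"): each deep tree is spiky in different places, so the average over many trees is much smoother. This is essentially what a bagged ensemble of trees does.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 2 * np.pi, 200)).reshape(-1, 1)
    y_observed = np.sin(x).ravel() + rng.normal(0, 0.3, 200)

    x_grid = np.linspace(0, 2 * np.pi, 500).reshape(-1, 1)

    n_trees = 100
    averaged = np.zeros(len(x_grid))
    for i in range(n_trees):
        # Bootstrap sample: draw n points with replacement.
        idx = rng.integers(0, len(x), len(x))
        tree = DecisionTreeRegressor(random_state=i).fit(x[idx], y_observed[idx])
        averaged += tree.predict(x_grid)
    averaged /= n_trees  # each tree's noise spikes land in different places, so they average out

With n_trees = 1 you get the spiky single-tree curve back; increasing n_trees smooths the average toward sin(x).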

I hope it helps!

replies(5): >>41890442 #>>41892542 #>>41892756 #>>41893558 #>>41893790 #
1. hammock No.41890442
Thanks, that helps. The way I think about your example is that it's like (not the same, obv) taking a bunch of moving averages of different durations at different starting points and throwing those into your regression model along with the actual data.