I wrote a script that animates decision tree regression at increasing depths.
One of the best ways to learn programming is to work on image manipulation projects. Pixel values are naturally represented as 2D arrays, many image processing techniques involve matrix operations, and, of course, you get to see the results.
In the animation above, the pixel values of each of the red, green, and blue channels are predicted separately from the x and y coordinates, using a sequence of decision tree regressors of increasing depth. Decision tree regression is essentially a method of function approximation that produces a piecewise-constant estimate. Because every split compares a single coordinate against a threshold, each leaf covers an axis-aligned rectangle of the image (note the single-color rectangles in the frames of the GIF).
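The original script is not shown, so what follows is only a minimal sketch of the procedure described above, assuming Python with NumPy and scikit-learn's `DecisionTreeRegressor`. The helper names (`make_test_image`, `tree_approximation`, `animation_frames`) are hypothetical, and a synthetic image stands in for a real photo so the example runs on its own. Writing the frames out as a GIF is left out.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def make_test_image(height=32, width=48):
    """Hypothetical stand-in for a real photo: a synthetic RGB image in [0, 1]."""
    ys, xs = np.mgrid[0:height, 0:width]
    red = (xs > width // 2).astype(float)  # a vertical edge
    green = np.sin(xs / 5.0) * np.cos(ys / 7.0) * 0.5 + 0.5  # a smooth pattern
    blue = ((xs - width / 2) ** 2 + (ys - height / 2) ** 2 < 100).astype(float)  # a disk
    return np.stack([red, green, blue], axis=-1)


def tree_approximation(image, max_depth, random_state=0):
    """Approximate an image by regressing each channel separately on (row, col)."""
    height, width, channels = image.shape
    ys, xs = np.mgrid[0:height, 0:width]
    # One row per pixel, using the raw integer coordinates (no normalization).
    coords = np.column_stack([ys.ravel(), xs.ravel()])
    approx = np.empty(image.shape, dtype=float)
    for c in range(channels):
        target = image[:, :, c].ravel()
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=random_state)
        tree.fit(coords, target)
        # Predicting on the training grid gives a piecewise-constant version of the channel.
        approx[:, :, c] = tree.predict(coords).reshape(height, width)
    return approx


def animation_frames(image, depths):
    """One approximation per depth; these could be saved as the frames of a GIF."""
    return [tree_approximation(image, d) for d in depths]


if __name__ == "__main__":
    img = make_test_image()
    depths = list(range(1, 11))
    for depth, frame in zip(depths, animation_frames(img, depths)):
        mse = np.mean((frame - img) ** 2)
        print(f"depth={depth:2d}  mean squared error={mse:.5f}")
```

A tree of depth d has at most 2^d leaves, so each channel of a depth-d frame can take at most 2^d distinct values. With `max_depth=None` the tree keeps splitting until its leaves are pure, which on the training grid simply reproduces the image.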
In some ways, a picture is the perfect sort of "function" for illustrating the strengths of a technique like decision trees. Pixel values are certainly nonlinear in the coordinates. It is hard to think of a natural set of basis functions for a generic image (though such bases may exist, as we will discuss in another essay on image compression), but a locally constant model seems to capture large regions of mostly one color effectively. Decision trees also make their estimates using only ordinal comparisons (is this coordinate above or below a threshold?), so the features do not need to be normalized. This is nice, since normalizing a full coordinate lattice feels somehow distasteful.
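One way to check the claim about normalization is the sketch below, which again assumes scikit-learn and uses a hypothetical single-channel target. It fits the same tree once on raw pixel coordinates and once on min-max scaled coordinates. Min-max scaling is strictly increasing in each coordinate, so it preserves every ordinal comparison the tree can make; both fits should therefore pick the same partitions and return the same predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_predict(coords, target, max_depth=4):
    """Fit a regression tree and predict back on the training coordinates."""
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    tree.fit(coords, target)
    return tree.predict(coords)


def compare_raw_and_scaled(height=32, width=48):
    """Return predictions from raw and from min-max scaled coordinates."""
    ys, xs = np.mgrid[0:height, 0:width]
    raw = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)

    # Hypothetical single-channel "image": a diagonal gradient plus a bright square.
    gradient = (xs + 2 * ys) / (width + 2 * height)
    square = (np.abs(xs - 30) < 6) & (np.abs(ys - 10) < 6)
    target = (gradient + square).ravel()

    # Min-max normalization of each coordinate to [0, 1], a strictly increasing map.
    lo, hi = raw.min(axis=0), raw.max(axis=0)
    scaled = (raw - lo) / (hi - lo)

    return fit_predict(raw, target), fit_predict(scaled, target)


if __name__ == "__main__":
    pred_raw, pred_scaled = compare_raw_and_scaled()
    print("identical predictions:", np.allclose(pred_raw, pred_scaled))
```

Only the learned split thresholds differ between the two fits (they live in different units); the regions, and hence the piecewise-constant estimate, stay the same.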
Of course, single decision tree estimators are prone to overfitting, and in this task we have not made any predictions on data outside the training set. This tool is just meant to help develop intuition for decision tree regression and to make related techniques easier to understand in later posts.