Look at GA^2M (Lou, Caruana, Gehrke). It is an extension of GAM that allows you to create large, complex ensembles without sacrificing GAM's intelligibility.
My first thoughts as well! I'm going to be trying it out in R tonight, but it looks like there are some people working on an implementation in statsmodels and scikit-learn.
In a GAM, you estimate the non-linear functions for all variables in the model simultaneously. Moreover, GAMs allow for smoothing techniques such as regression splines, which lets you cast a GAM as one large penalized GLM. This has ties to Bayesian regression and mixed effects models.
In a GAM, you are not estimating a bunch of individual smoothers in isolation and then throwing them into a model.
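For intuition, here is a minimal numpy sketch of the "one big penalized GLM" view. Everything in it is a toy stand-in: a truncated power basis instead of a proper spline basis, and a plain ridge penalty instead of a derivative penalty. The point is just that both smoothers sit in one design matrix and are estimated in a single penalized fit, not one at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.uniform(-2, 2, n), rng.uniform(-2, 2, n)
y = np.sin(x1) + 0.5 * x2**2 + rng.normal(0, 0.2, n)

def tpf_basis(x, knots):
    # toy truncated power basis of degree 3: x, x^2, x^3, (x - k)_+^3
    cols = [x, x**2, x**3] + [np.clip(x - k, 0, None)**3 for k in knots]
    return np.column_stack(cols)

knots = np.linspace(-1.5, 1.5, 8)
B1, B2 = tpf_basis(x1, knots), tpf_basis(x2, knots)

# one big design matrix: intercept plus both smoothers side by side
X = np.column_stack([np.ones(n), B1, B2])

# ridge-style penalty as a stand-in for the usual smoothness penalty
lam = 1.0
P = lam * np.eye(X.shape[1])
P[0, 0] = 0.0  # don't penalize the intercept

# both smooth functions estimated simultaneously in one penalized solve
beta = np.linalg.solve(X.T @ X + P, X.T @ y)
fitted = X @ beta
```

Real implementations (mgcv, statsmodels) pick the penalty and basis far more carefully, but the structure of the fit is the same.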
I also tried the default setting of gamma=1/(data dimension), as well as many values in between. I also played with the tuning function, but ran out of patience.
> From an accuracy standpoint, GAMs are competitive with popular learning techniques such as Random Forest or SVM.
Would be great to get a reference on this. In [1] the authors compared MARS to RF and SVM on several datasets and it didn't look so good.
Maybe they just got good performance on the one dataset mentioned at the end, or did not optimize the parameters of the competing classifiers. I think it is telling that the SVM performed worse than a linear classifier.
One should not report results for a method they do not understand how to use. The SVM parameters are nonsensical. This would not pass basic peer review.
I also tried polynomial kernels of degree 3 with costs around 0.1, as well as many different gammas for the radial kernel. No luck. As I said, the conversion to probabilities could be the culprit.
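For what it's worth, the kind of grid I searched looks roughly like this in scikit-learn (the data and exact grid values here are placeholders for illustration, not the ones from the post, which used R):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# synthetic stand-in data, just to show the mechanics of the search
X, y = make_classification(n_samples=400, n_features=10, random_state=1)

grid = {
    "kernel": ["rbf", "poly"],
    "degree": [3],                             # only used by the poly kernel
    "C": [0.1, 1, 10],
    "gamma": [1 / X.shape[1], 0.01, 0.1, 1],   # includes the 1/dim default
}
search = GridSearchCV(SVC(), grid, cv=5).fit(X, y)
print(search.best_params_)
```

A grid like this at least rules out the "default parameters only" criticism, though it obviously doesn't guarantee the best achievable SVM.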
However, the predictive performance of the SVM is irrelevant to the main points I am trying to make. In other words, even if the SVM beat the GAM in this single test, that would not invalidate the highlighted benefits of GAM. I would argue that GAM possesses qualities that SVMs do not, and vice versa.
Feel free to suggest different SVM settings, or a better way to convert classifications into a continuous measure, and I will change the content in the comparison table. The data and code can be downloaded here: https://github.com/klarsen1/gampost.
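As one concrete option for the continuous measure: Platt-style scaled probabilities, which the libsvm front ends expose directly (probability=True in scikit-learn, probability = TRUE in e1071). A sketch on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# placeholder data, just to show the mechanics
X, y = make_classification(n_samples=400, n_features=10, random_state=2)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=2)

clf = SVC(kernel="rbf", gamma="scale", probability=True,
          random_state=2).fit(Xtr, ytr)
p = clf.predict_proba(Xte)[:, 1]   # continuous scores, e.g. for an ROC curve

# decision_function gives raw margins, skipping probability calibration
scores = clf.decision_function(Xte)
```

An alternative worth considering is using the raw decision values instead of calibrated probabilities, since the calibration step itself could be where the conversion goes wrong.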
Two things. First, a GAM is a non-linear model and is quite flexible. The degree of non-linearity is controlled by tuning the spline degrees of freedom, and non-linear interactions can be captured by introducing tensor-product splines.
Second, the no free lunch theorems make papers like the one above a lot less telling than you might think. All I really take from them is that RF is a good modeling framework to try, but for individual problems it may be worth trying boosting, an SVM, or an NN model.