gam.selection {mgcv}    R Documentation

Generalized Additive Model Selection

Description

This page is intended to provide some more information on how to select GAMs. Given a model structure specified by a gam model formula, gam() attempts to find the appropriate smoothness for each applicable model term using Generalized Cross Validation (GCV) or an Un-Biased Risk Estimator (UBRE), the latter being used in cases in which the scale parameter is assumed known. GCV and UBRE are covered in Craven and Wahba (1979) and Wahba (1990). Fit method "magic" uses Newton or, failing that, steepest descent updates of the smoothing parameters, and is particularly numerically robust. Fit method "mgcv" alternates grid searches for the correct overall level of smoothness for the whole model, given the relative smoothness of terms, with Newton/steepest descent updates of the relative smoothness of terms, given the overall amount of smoothness.

Automatic smoothness selection is unlikely to be successful with few data, particularly with multiple terms to be selected. The mgcv method can also fail to find the real minimum of the GCV/UBRE score if the model contains many smooth terms that should really be completely smooth, or close to it (e.g. a straight line for a default 1-d smooth). The problem is that in this circumstance the optimal overall smoothness given the relative smoothness of terms may make all terms completely smooth - but this will tend to move the smoothing parameters to a location where the GCV/UBRE score is nearly completely flat with respect to the smoothing parameters so that Newton and steepest descent are both ineffective. These problems can usually be overcome by replacing some completely smooth terms with purely parametric model terms.
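The fix described above can be sketched as follows. This is an illustrative example with made-up data and variable names, not taken from the text: when a smooth's estimated degrees of freedom come out at (or very near) 1, indicating a straight line, the term can be refitted as a purely parametric linear term.

```r
## Illustrative sketch: replacing a 'completely smooth' term with a
## parametric one. Data and names here are assumptions, not from the text.
library(mgcv)
set.seed(1)
n  <- 200
x0 <- runif(n); x1 <- runif(n)
y  <- sin(2 * pi * x0) + 0.5 * x1 + rnorm(n) * 0.3  # x1 enters linearly

b1 <- gam(y ~ s(x0) + s(x1))  # s(x1) is likely to be estimated near edf = 1
summary(b1)                   # check the edf reported for s(x1)

b2 <- gam(y ~ s(x0) + x1)     # refit with x1 as a purely parametric term
summary(b2)
```

With the near-linear smooth removed, the GCV/UBRE score no longer has a flat region in that smoothing parameter's direction, so the Newton/steepest descent updates behave better.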

A good example of where smoothing parameter selection can "fail", but in an unimportant manner, is provided by the rock.gam example in Venables and Ripley. In this case 3 smoothing parameters are to be estimated from 48 data points, which is probably over-ambitious. gam will estimate either 1.4 or 1 degrees of freedom for the smooth of shape, depending on the exact details of model specification (e.g. the k value for each s() term). The lower GCV score is really at 1.4 (and if the other 2 terms are replaced by straight lines this estimate is always returned), but the shape term is in no way significant, and the lowest GCV score is obtained by removing it altogether. The problem here is that the GCV score contains very little information on the optimal degrees of freedom to associate with a term that GCV would suggest should really be dropped.
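The rock example can be reproduced roughly as follows, using the rock dataset shipped with R (the exact Venables and Ripley specification may differ in details such as fitting control settings):

```r
## Approximate reconstruction of the rock.gam example discussed above.
## rock is a standard R dataset with columns area, peri, shape and perm.
library(mgcv)
data(rock)
rock.gam <- gam(log(perm) ~ s(area) + s(peri) + s(shape), data = rock)
summary(rock.gam)  # inspect the edf and approximate p-value for s(shape)

## Replacing the other two terms by straight lines, as discussed above:
rock.gam2 <- gam(log(perm) ~ area + peri + s(shape), data = rock)
summary(rock.gam2)
```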

In general the most logically consistent method to use for deciding which terms to include in the model is to compare GCV/UBRE scores for models with and without the term. More generally the score for the model with a smooth term can be compared to the score for the model with the smooth term replaced by appropriate parametric terms. Candidates for removal can be identified by reference to the approximate p-values provided by summary.gam. Candidates for replacement by parametric terms are smooth terms with estimated degrees of freedom close to their minimum possible.

One appealing approach to model selection is via shrinkage. Smooth classes cs.smooth and tprs.smooth (specified by "cs" and "ts" respectively) have smoothness penalties which include a small shrinkage component, so that for large enough smoothing parameters the smooth becomes identically zero. This allows automatic smoothing parameter selection methods to effectively remove the term from the model altogether. The shrinkage component of the penalty is set at a level that usually makes a negligible contribution to the penalization of the model, only becoming effective when the term is effectively 'completely smooth' according to the conventional penalty.
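A small sketch of shrinkage at work, with illustrative data (the Examples section below gives a fuller demonstration using bs = "ts"):

```r
## With a shrinkage basis ("cs" or "ts"), an irrelevant term can be
## penalized to essentially zero edf. Illustrative data, not from the text.
library(mgcv)
set.seed(3)
n  <- 300
x0 <- runif(n); x1 <- runif(n)
y  <- sin(2 * pi * x0) + rnorm(n) * 0.3  # x1 has no effect on y

b <- gam(y ~ s(x0, bs = "cs") + s(x1, bs = "cs"))
summary(b)  # the edf for s(x1) should be close to zero
```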

Author(s)

Simon N. Wood simon@stats.gla.ac.uk

References

Craven and Wahba (1979) Smoothing Noisy Data with Spline Functions. Numer. Math. 31:377-403

Venables and Ripley (1999) Modern Applied Statistics with S-PLUS

Wahba (1990) Spline Models of Observational Data. SIAM.

Wood, S.N. (2000) Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties. J.R.Statist.Soc.B 62(2):413-428

Wood, S.N. (2003) Thin plate regression splines. J.R.Statist.Soc.B 65(1):95-114

http://www.stats.gla.ac.uk/~simon/

Examples

## an example of GCV based model selection
library(mgcv)
set.seed(0)
n <- 400; sig <- 2
## six covariates, but only x0-x2 actually affect the response
x0 <- runif(n, 0, 1); x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1); x3 <- runif(n, 0, 1)
x4 <- runif(n, 0, 1); x5 <- runif(n, 0, 1)
## true function: smooth effects of x0, x1 and x2 only
f <- 2 * sin(pi * x0)
f <- f + exp(2 * x1) - 3.75887
f <- f + 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 * (1 - x2)^10 - 1.396
e <- rnorm(n, 0, sig)
y <- f + e
## Note the increased gamma parameter below to favour
## slightly smoother models...
b <- gam(y ~ s(x0, bs = "ts") + s(x1, bs = "ts") + s(x2, bs = "ts") +
         s(x3, bs = "ts") + s(x4, bs = "ts") + s(x5, bs = "ts"), gamma = 1.4)
summary(b)  ## the shrinkage smooths of x3-x5 should have near-zero edf
plot(b, pages = 1)


[Package mgcv version 1.1-5 Index]