So far we have mainly focused on linear models.
Previously, we saw that we can improve upon least squares using ridge regression, the lasso, principal components regression, and more.
Through simple and more sophisticated extensions of the linear model, we can relax the linearity assumption while still maintaining as much interpretability as possible.
1 Step Functions
Using polynomial functions of the features as predictors imposes a global structure on the non-linear function of \(X\).
We can instead use step functions to avoid imposing a global structure. We break the range of \(X\) into bins using cutpoints \(c_1 < c_2 < \dots < c_K\), define indicator basis functions \(C_k(X) = I(c_k \le X < c_{k+1})\), and fit a constant in each bin.
For a given value of \(X\), at most one of \(C_1, \dots, C_K\) can be non-zero.
Example: Wage data.
| year | age | maritl | race | education | region | jobclass | health | health_ins | logwage | wage |
|---|---|---|---|---|---|---|---|---|---|---|
| 2006 | 18 | 1. Never Married | 1. White | 1. < HS Grad | 2. Middle Atlantic | 1. Industrial | 1. <=Good | 2. No | 4.318063 | 75.04315 |
| 2004 | 24 | 1. Never Married | 1. White | 4. College Grad | 2. Middle Atlantic | 2. Information | 2. >=Very Good | 2. No | 4.255273 | 70.47602 |
| 2003 | 45 | 2. Married | 1. White | 3. Some College | 2. Middle Atlantic | 1. Industrial | 1. <=Good | 1. Yes | 4.875061 | 130.98218 |
| 2003 | 43 | 2. Married | 3. Asian | 4. College Grad | 2. Middle Atlantic | 2. Information | 2. >=Very Good | 1. Yes | 5.041393 | 154.68529 |
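As a concrete sketch (synthetic data standing in for Wage, with hypothetical cutpoints), a piecewise-constant fit is just least squares on bin indicators:

```python
import numpy as np

# Synthetic stand-in for the Wage data: wage as a function of age.
rng = np.random.default_rng(0)
age = rng.uniform(18, 80, 500)
wage = 50 + 2.0 * age - 0.02 * age**2 + rng.normal(0, 5, 500)

# Cutpoints c_1 < c_2 < c_3 split the range of age into four bins;
# each basis column C_k(X) indicates membership in one bin.
cuts = np.array([33.5, 49.0, 64.5])
bins = np.digitize(age, cuts)                       # 0..3: bin index per point
X = np.column_stack([np.ones(len(age))] +
                    [(bins == k).astype(float) for k in (1, 2, 3)])

# Least squares on the indicators fits the mean wage within each bin.
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)
```

Here `beta[0]` is the mean wage in the leftmost bin, and each `beta[k]` is the offset of bin \(k\) relative to it.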
2 Basis Functions
Polynomial and piecewise-constant regression models are in fact special cases of a basis function approach.
Idea:
Instead of fitting the linear model in \(X\), we fit the model
\[ y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \dots + \beta_K b_K(x_i) + \epsilon_i, \]
where \(b_1(\cdot), \dots, b_K(\cdot)\) are basis functions.
Note that the basis functions are fixed and known.
We can think of this model as a standard linear model with predictors defined by the basis functions and use least squares to estimate the unknown regression coefficients.
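Because the basis functions are fixed, everything reduces to ordinary least squares on a transformed design matrix. A minimal sketch with a cubic polynomial basis on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)

# Fixed, known basis functions: b_1(x) = x, b_2(x) = x^2, b_3(x) = x^3.
B = np.column_stack([np.ones_like(x), x, x**2, x**3])

# The model is linear in the coefficients, so least squares estimates them.
beta, *_ = np.linalg.lstsq(B, y, rcond=None)
y_hat = B @ beta
```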
3 Regression Splines
Regression splines are a very common choice of basis functions because they are quite flexible, but still interpretable. Regression splines build on the polynomial regression and piecewise constant approaches seen previously.
3.1 Piecewise Polynomials
Instead of fitting a high degree polynomial over the entire range of \(X\), piecewise polynomial regression involves fitting separate low-degree polynomials over different regions of \(X\).
For example, a piecewise cubic with no knots is just a standard cubic polynomial.
A piecewise cubic with a single knot at point \(c\) takes the form
\[ y_i = \begin{cases} \beta_{01} + \beta_{11} x_i + \beta_{21} x_i^2 + \beta_{31} x_i^3 + \epsilon_i & \text{if } x_i < c \\ \beta_{02} + \beta_{12} x_i + \beta_{22} x_i^2 + \beta_{32} x_i^3 + \epsilon_i & \text{if } x_i \ge c. \end{cases} \]
Using more knots leads to a more flexible piecewise polynomial.
In general, we place \(L\) knots throughout the range of \(X\) and fit \(L + 1\) polynomial regression models.
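A sketch of an unconstrained piecewise cubic with one knot (synthetic data; the two regions are fit completely separately):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, 300))
y = np.where(x < 5, 0.5 * x, 5 - 0.5 * (x - 5)) + rng.normal(0, 0.2, 300)

c = 5.0  # single knot

def cubic_design(x):
    return np.column_stack([np.ones_like(x), x, x**2, x**3])

# Fit a separate cubic polynomial in each region: 8 coefficients in total.
left = x < c
beta_left, *_ = np.linalg.lstsq(cubic_design(x[left]), y[left], rcond=None)
beta_right, *_ = np.linalg.lstsq(cubic_design(x[~left]), y[~left], rcond=None)

pred = np.where(left, cubic_design(x) @ beta_left, cubic_design(x) @ beta_right)
```

Nothing forces the two fits to agree at \(x = c\), so the curve is generally discontinuous at the knot; this is exactly what the constraints in the next subsection address.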
3.2 Constraints and Splines
To avoid having too much flexibility, we can constrain the piecewise polynomial so that the fitted curve must be continuous.
To go further, we could add two more constraints: the first and second derivatives of the piecewise polynomials must also be continuous at the knots.
In other words, we are requiring the piecewise polynomials to be smooth.
Each constraint that we impose on the piecewise cubic polynomials effectively frees up one degree of freedom, by reducing the complexity of the resulting fit.
The fit with continuity and two smoothness constraints is called a spline.
A degree-\(d\) spline is a piecewise degree-\(d\) polynomial with continuous derivatives up to degree \(d - 1\) at each knot.
3.3 Spline Basis Representation
Fitting the spline regression model is more complex than the piecewise polynomial regression. We need to fit a degree \(d\) piecewise polynomial and also constrain it and its \(d - 1\) derivatives to be continuous at the knots.
The most direct way to represent a cubic spline is to start with the basis for a cubic polynomial (\(x, x^2, x^3\)) and add one truncated power basis function per knot,
\[ h(x, \xi) = (x - \xi)^3_+ = \begin{cases} (x - \xi)^3 & \text{if } x > \xi \\ 0 & \text{otherwise}, \end{cases} \]
where \(\xi\) is the knot.
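A sketch of the truncated power basis for a cubic spline, on synthetic data with hypothetical knot locations:

```python
import numpy as np

def truncated_power_basis(x, knots):
    """Cubic spline design matrix: 1, x, x^2, x^3, plus one truncated
    cubic h(x, xi) = max(x - xi, 0)^3 per knot (K + 4 columns)."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.maximum(x - xi, 0.0) ** 3 for xi in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 400))
y = np.sin(x) + rng.normal(0, 0.2, 400)

knots = [2.5, 5.0, 7.5]
B = truncated_power_basis(x, knots)
beta, *_ = np.linalg.lstsq(B, y, rcond=None)   # plain least squares again
y_hat = B @ beta
```

Each truncated cubic has a continuous value, first derivative, and second derivative at its knot, so any linear combination of these columns is automatically a cubic spline.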
Unfortunately, splines can have high variance at the outer range of the predictors. One solution is to add boundary constraints: a natural spline is a regression spline that is additionally required to be linear beyond the boundary knots.
3.4 Choosing the Knots
When we fit a spline, where should we place the knots? In practice, it is common to place them uniformly, e.g. at quantiles of \(X\), with more knots where the function is expected to vary rapidly.
How many knots should we use? One objective approach is to try different numbers of knots and choose the number that minimizes cross-validated error.
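A sketch of choosing the number of knots by cross-validation, on synthetic data with knots placed at quantiles of \(X\):

```python
import numpy as np

def spline_design(x, knots):
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.maximum(x - xi, 0.0) ** 3 for xi in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 300)
y = np.sin(x) + rng.normal(0, 0.3, 300)

def cv_mse(x, y, n_knots, n_folds=5):
    # Place knots at equally spaced quantiles of x, then average the
    # held-out squared error over the folds.
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    fold = np.arange(len(x)) % n_folds
    errs = []
    for f in range(n_folds):
        tr, te = fold != f, fold == f
        beta, *_ = np.linalg.lstsq(spline_design(x[tr], knots),
                                   y[tr], rcond=None)
        errs.append(np.mean((spline_design(x[te], knots) @ beta - y[te]) ** 2))
    return float(np.mean(errs))

scores = {k: cv_mse(x, y, k) for k in (1, 2, 3, 5, 8)}
best = min(scores, key=scores.get)
```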
3.5 Comparison to Polynomial Regression
Regression splines often give superior results to polynomial regression: they increase flexibility by adding knots while keeping the polynomial degree fixed, which tends to produce more stable estimates than fitting a single high-degree polynomial over the entire range of \(X\).
4 Generalized Additive Models
So far we have talked about flexible ways to predict \(Y\) based on a single predictor \(X\).
Generalized Additive Models (GAMs) provide a general framework for extending a standard linear regression model by allowing non-linear functions of each of the variables while maintaining additivity.
4.1 GAMs for Regression
A natural way to extend the multiple linear regression model to allow for non-linear relationships between each feature and the response is to replace each linear component \(\beta_j x_{ij}\) with a smooth non-linear function \(f_j(x_{ij})\):
\[ y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \dots + f_p(x_{ip}) + \epsilon_i. \]
The beauty of GAMs is that we can use our fitting ideas in this chapter as building blocks for fitting an additive model.
Example: Consider the Wage data.
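Since each \(f_j\) can be represented with a fixed spline basis, the whole additive model is still linear in its coefficients and can be fit with one big least squares regression. A sketch on synthetic data (standing in for the actual Wage variables):

```python
import numpy as np

def spline_basis(x, knots):
    # Truncated power basis without an intercept (added once globally).
    cols = [x, x**2, x**3] + [np.maximum(x - xi, 0.0) ** 3 for xi in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(4)
n = 500
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(-3, 3, n)
y = np.sin(x1) + 0.5 * x2**2 + rng.normal(0, 0.2, n)

# y_i = beta_0 + f_1(x_i1) + f_2(x_i2) + eps_i, with each f_j a cubic
# spline in a single feature; one least squares fit estimates everything.
B = np.column_stack([np.ones(n),
                     spline_basis(x1, [2.5, 5.0, 7.5]),
                     spline_basis(x2, [-1.5, 0.0, 1.5])])
beta, *_ = np.linalg.lstsq(B, y, rcond=None)
mse = np.mean((B @ beta - y) ** 2)
```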
Pros and Cons of GAMs
4.2 GAMs for Classification
GAMs can also be used in situations where \(Y\) is categorical. Recall the logistic regression model:
\[ \log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p. \]
A natural way to extend this model to allow for non-linear relationships is to replace each linear term with a smooth function:
\[ \log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + f_1(X_1) + \dots + f_p(X_p). \]
Example: Consider the Wage data.
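As in the regression case, fixing a spline basis turns the logistic GAM into an ordinary logistic regression in the transformed features. A sketch on synthetic data, fit here by Newton's method (IRLS); the knot locations and small ridge term are illustrative choices, not part of any standard recipe:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(5)
n = 1000
x = rng.uniform(-3, 3, n)
p_true = sigmoid(x**2 - 2.0)        # true log-odds are non-linear in x
y = rng.binomial(1, p_true)

# Fixed cubic spline basis for x, with hypothetical knots at -1 and 1.
B = np.column_stack([np.ones(n), x, x**2, x**3,
                     np.maximum(x + 1.0, 0.0) ** 3,
                     np.maximum(x - 1.0, 0.0) ** 3])

# Newton's method (IRLS) for the logistic log-likelihood.
beta = np.zeros(B.shape[1])
for _ in range(25):
    p_hat = sigmoid(B @ beta)
    W = p_hat * (1.0 - p_hat)                               # IRLS weights
    H = B.T @ (B * W[:, None]) + 1e-6 * np.eye(B.shape[1])  # + tiny ridge
    beta += np.linalg.solve(H, B.T @ (y - p_hat))

acc = np.mean((sigmoid(B @ beta) > 0.5) == y)
```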