We will use the Auto data set in the ISLR package.
library(ISLR)
library(tidyverse)
library(knitr)
library(tidymodels)
head(Auto) %>%
  kable()

| mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin | name |
|---|---|---|---|---|---|---|---|---|
| 18 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 15 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 18 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 16 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 17 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
| 15 | 8 | 429 | 198 | 4341 | 10.0 | 70 | 1 | ford galaxie 500 |
Before we begin, be sure to set the seed for reproducibility.
set.seed(445)

0.1 Validation Set Approach
1. Split the data into 50% training and 50% test data.
2. Fit a linear model of `mpg` on `horsepower` using your training data.
3. Estimate the test error using the test MSE.
4. Repeat steps 2-3 for a quadratic and a cubic model. Which model would you pick?
5. Repeat steps 1-4 after resetting the seed:

set.seed(42)

Did you get the same results? Is this what you expected to happen?
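The steps above can be sketched as follows. This is one possible approach, not the intended solution: it assumes `initial_split()`, `training()`, and `testing()` from the rsample package (part of tidymodels) for the split and base R `lm()` for the fits. The helper name `test_mse` is hypothetical.

```r
library(ISLR)     # provides the Auto data set
library(rsample)  # initial_split(), training(), testing()

set.seed(445)

# Step 1: 50/50 split into training and test sets
auto_split <- initial_split(Auto, prop = 0.5)
auto_train <- training(auto_split)
auto_test  <- testing(auto_split)

# Hypothetical helper: fit a polynomial of the given degree on the
# training data and return the MSE on the held-out test data
test_mse <- function(degree) {
  fit  <- lm(mpg ~ poly(horsepower, degree), data = auto_train)
  pred <- predict(fit, newdata = auto_test)
  mean((auto_test$mpg - pred)^2)
}

# Steps 2-4: linear, quadratic, and cubic models
val_mse <- sapply(1:3, test_mse)
names(val_mse) <- c("linear", "quadratic", "cubic")
val_mse
```

Rerunning the block with `set.seed(42)` draws a different split, so the three estimates will generally change.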
0.2 LOOCV
- Get the estimate of test MSE for the linear model using LOOCV.
- Repeat the previous step for a quadratic and a cubic model. Which model would you pick?
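A minimal sketch of LOOCV, written as an explicit loop in base R so that each leave-one-out fit is visible. The helper name `loocv_mse` is hypothetical, and this is only one way to do it; packaged resampling tools could be used instead.

```r
library(ISLR)  # provides the Auto data set

# Hypothetical helper: LOOCV estimate of test MSE for a polynomial
# model of mpg on horsepower with the given degree
loocv_mse <- function(degree, data = Auto) {
  n <- nrow(data)
  sq_err <- numeric(n)
  for (i in seq_len(n)) {
    # Fit on all rows except i, then predict the held-out row i
    fit  <- lm(mpg ~ poly(horsepower, degree), data = data[-i, ])
    pred <- predict(fit, newdata = data[i, , drop = FALSE])
    sq_err[i] <- (data$mpg[i] - pred)^2
  }
  mean(sq_err)
}

loocv <- sapply(1:3, loocv_mse)
names(loocv) <- c("linear", "quadratic", "cubic")
loocv
```

LOOCV involves no random splitting, so the result does not depend on the seed.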
0.3 k-Fold CV
- Using \(k = 10\)-fold CV, compute the \(k\)-fold CV estimate of the test MSE for polynomial models of order \(i = 1, \dots, 10\). (Hint: you can use the `poly` function in your formula to specify a polynomial model.)
- Plot the estimated test MSE vs. the polynomial order.
- Which of these models would you choose?
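One way to sketch the 10-fold computation and the plot, assuming `vfold_cv()`, `analysis()`, and `assessment()` from rsample and ggplot2 for plotting. The helper `fold_mse` and the data frame `cv_results` are hypothetical names. The CV estimate here is the unweighted mean of the ten fold MSEs, which is a reasonable simplification because the folds are nearly equal in size.

```r
library(ISLR)     # provides the Auto data set
library(rsample)  # vfold_cv(), analysis(), assessment()
library(ggplot2)

set.seed(445)
folds <- vfold_cv(Auto, v = 10)

# Hypothetical helper: MSE on the held-out part of one fold for a
# polynomial model of the given degree
fold_mse <- function(split, degree) {
  fit      <- lm(mpg ~ poly(horsepower, degree), data = analysis(split))
  held_out <- assessment(split)
  mean((held_out$mpg - predict(fit, newdata = held_out))^2)
}

# Average the ten fold MSEs for each polynomial order 1, ..., 10
cv_results <- data.frame(degree = 1:10)
cv_results$cv_mse <- sapply(cv_results$degree, function(d) {
  mean(sapply(folds$splits, fold_mse, degree = d))
})

cv_plot <- ggplot(cv_results, aes(x = degree, y = cv_mse)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:10) +
  labs(x = "Polynomial order", y = "10-fold CV estimate of test MSE")
cv_plot
```

The same folds are reused for every polynomial order, so differences between orders are not caused by different random splits.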
0.4 Bonus
- Write your own \(k\)-fold CV function that computes the CV estimate of the test MSE for a KNN regression model. Your function should take as parameters
  - CV \(k\) value
  - KNN \(K\) value
  - Data
  - A vector of names (character) of predictor columns
  - A character string of the response column
- Use your function to estimate the test MSE using 10-fold CV for KNN models with \(K = 1, 5, 10, 20, 100\) that predict `mpg` from the `horsepower` predictor variable in the `Auto` data set.
- Compare your results to the previous \(k\)-fold CV method.
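A possible shape for such a function, sketched in base R. The names `knn_predict` and `knn_cv` are hypothetical. The sketch assumes unscaled Euclidean distance, breaks distance ties by row order, and requires \(K\) to be no larger than the training size of each fold. A packaged KNN implementation could replace `knn_predict`.

```r
library(ISLR)  # provides the Auto data set

# Hypothetical helper: KNN regression prediction, averaging the
# responses of the K nearest training points (Euclidean distance)
knn_predict <- function(train_x, train_y, test_x, K) {
  apply(test_x, 1, function(x0) {
    d <- sqrt(colSums((t(train_x) - x0)^2))
    mean(train_y[order(d)[seq_len(K)]])
  })
}

# k: number of CV folds; K: number of neighbors; data: a data frame;
# predictors: character vector of column names; response: column name
knn_cv <- function(k, K, data, predictors, response) {
  n <- nrow(data)
  fold_id  <- sample(rep(seq_len(k), length.out = n))  # random fold labels
  fold_mse <- numeric(k)
  for (j in seq_len(k)) {
    test_rows <- fold_id == j
    train_x <- as.matrix(data[!test_rows, predictors, drop = FALSE])
    test_x  <- as.matrix(data[test_rows, predictors, drop = FALSE])
    pred <- knn_predict(train_x, data[[response]][!test_rows], test_x, K)
    fold_mse[j] <- mean((data[[response]][test_rows] - pred)^2)
  }
  mean(fold_mse)  # unweighted mean of the fold MSEs
}

set.seed(445)
K_values <- c(1, 5, 10, 20, 100)
knn_mse <- sapply(K_values, function(K) {
  knn_cv(k = 10, K = K, data = Auto, predictors = "horsepower", response = "mpg")
})
names(knn_mse) <- paste0("K=", K_values)
knn_mse
```

Each call to `knn_cv` draws its own random folds, so the estimates for different \(K\) values come from different partitions of the data. Resetting the seed before each call would put them on the same folds.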