Homework 6 in DSCI445: Statistical Machine Learning @ CSU
Be sure to `set.seed(445)`.
1. We will explore the maximal margin classifier on a toy data set.
| Obs | \(X_1\) | \(X_2\) | \(Y\) |
|-----|---------|---------|-------|
| 1   | 3       | 4       | Red   |
| 2   | 2       | 2       | Red   |
| 3   | 4       | 4       | Red   |
| 4   | 1       | 4       | Red   |
| 5   | 2       | 1       | Blue  |
| 6   | 4       | 3       | Blue  |
| 7   | 4       | 1       | Blue  |
a) Sketch the observations. (One way to draw them in R is sketched after this list.)
b) Sketch the optimal separating hyperplane and provide the equation for this hyperplane.
c) Describe the classification rule for the maximal margin classifier. It should be along the lines of "Classify to Red if \(\beta_0 + \beta_1 X_1 + \beta_2 X_2 > 0\), and classify as Blue otherwise." Provide the values of \(\beta_0\), \(\beta_1\), and \(\beta_2\).
d) On your sketch, indicate the margin for the maximal margin classifier.
e) Indicate the support vectors for the maximal margin classifier.
f) Argue that a slight movement of the seventh observation would not affect the maximal margin hyperplane.
g) Draw an additional observation on the plot so that the two classes are no longer separable by a hyperplane.
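A minimal sketch for entering and plotting the toy data in part a), assuming the `ggplot2` package is installed (a hand sketch on paper is equally acceptable):

```r
library(ggplot2)

# the seven observations from the table above
toy <- data.frame(
  x1 = c(3, 2, 4, 1, 2, 4, 4),
  x2 = c(4, 2, 4, 4, 1, 3, 1),
  y  = c("Red", "Red", "Red", "Red", "Blue", "Blue", "Blue")
)

ggplot(toy, aes(x = x1, y = x2, colour = y)) +
  geom_point(size = 3) +
  scale_colour_identity() +  # use the class labels themselves as colours
  labs(x = expression(X[1]), y = expression(X[2]))
```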
2. We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
a) Generate a data set with two classes separated by a quadratic decision boundary:

```r
set.seed(445)  # per the instruction above
n <- 500
x1 <- runif(n) - 0.5  # uniform on (-0.5, 0.5)
x2 <- runif(n) - 0.5
y <- as.numeric(x1^2 - x2^2 > 0)  # class label from a quadratic boundary
```
b) Plot the observations, colored according to their class labels.
c) Fit a logistic regression model to the data using \(X_1\) and \(X_2\) as predictors.
d) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. What shape is the decision boundary?
e) Now fit a logistic regression model to the data using non-linear functions of \(X_1\) and \(X_2\) as predictors (e.g., \(X_1^2\), \(X_1 \times X_2\), \(\log(X_2)\), etc.).
f) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. What shape is the decision boundary? Repeat a) through e) until you come up with an example in which the predicted class labels are obviously non-linear.
g) Fit a support vector classifier with \(X_1\) and \(X_2\) as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
h) Fit an SVM using a non-linear kernel to the data with \(X_1\) and \(X_2\) as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
i) Comment on your results. (A starting sketch for parts b), e), and h) follows this list.)
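A minimal sketch of how one might approach parts b), e), and h), assuming the `ggplot2` and `e1071` packages are installed; the particular non-linear terms and the radial kernel shown here are illustrative choices, not the only valid ones:

```r
library(ggplot2)
library(e1071)

dat <- data.frame(x1 = x1, x2 = x2, y = as.factor(y))

# b) plot the observations coloured by true class
ggplot(dat, aes(x1, x2, colour = y)) + geom_point()

# e) logistic regression with non-linear transformations of the features
fit_nl <- glm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2),
              data = dat, family = binomial)
pred_nl <- as.numeric(predict(fit_nl, type = "response") > 0.5)
ggplot(dat, aes(x1, x2, colour = factor(pred_nl))) + geom_point()

# h) SVM with a radial (non-linear) kernel
fit_svm <- svm(y ~ x1 + x2, data = dat, kernel = "radial", gamma = 1)
pred_svm <- predict(fit_svm, dat)
ggplot(dat, aes(x1, x2, colour = pred_svm)) + geom_point()
```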
3. In this problem, you will use support vector approaches to predict whether a given car gets high or low gas mileage based on the `Auto` data set in the `ISLR` package.
a) Create a binary variable that takes value 1 for cars with gas mileage above the median and 0 for cars below the median.
b) Fit a support vector classifier to the data with various values of `cost`, in order to predict whether a car gets high or low gas mileage (be sure not to include the original gas mileage variable – no cheating!). Report the cross-validation errors associated with different values of this parameter and comment on your results.
c) Now repeat b) using SVMs with radial and polynomial basis kernels, with different values of `gamma`, `degree`, and `cost`. Report on your results.
d) Make some plots to back up your assertions in b) and c). (A tuning sketch follows this list.)
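A minimal sketch for parts a) and b), assuming the `ISLR` and `e1071` packages are installed; the cost grid and the choice to drop the `name` column are illustrative assumptions:

```r
library(ISLR)
library(e1071)

# a) binary response: 1 if mpg is above the median, 0 otherwise
auto <- Auto
auto$mpg01 <- as.factor(as.numeric(auto$mpg > median(auto$mpg)))
auto$mpg  <- NULL  # drop the original gas mileage variable (no cheating!)
auto$name <- NULL  # drop the car name, a many-level factor with no predictive use

# b) cross-validate a linear support vector classifier over a grid of costs
set.seed(445)
tune_lin <- tune(svm, mpg01 ~ ., data = auto,
                 kernel = "linear",
                 ranges = list(cost = c(0.01, 0.1, 1, 10, 100)))
summary(tune_lin)  # cross-validation error for each value of cost
```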
4. This problem involves the `OJ` data set in the `ISLR` package.
a) Create a training set containing a random sample of 900 observations and a test set containing the remaining observations.
b) Fit a support vector classifier to the training set using `cost = 0.01` with `Purchase` as the response and the other variables as predictors. Use the `summary()` function to produce summary statistics and describe the results obtained.
c) What are the training and test error rates?
d) Use the `tune()` function to select an optimal `cost`. Consider values between \(0.01\) and \(10\).
e) Compute the training and test error rates using this new value for `cost`.
f) Repeat b) through e) using a support vector machine with a radial kernel and the default value for `gamma`.
g) Repeat b) through e) using a support vector machine with a polynomial kernel and `degree = 2`.
h) Which approach gives the best results on this data? (A starting sketch for parts a), b), and d) follows this list.)
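A minimal sketch for parts a), b), and d), assuming the `ISLR` and `e1071` packages are installed; the particular grid of costs within \([0.01, 10]\) is an illustrative choice:

```r
library(ISLR)
library(e1071)

# a) split OJ into a 900-observation training set and a test set
set.seed(445)
train_idx <- sample(nrow(OJ), 900)
oj_train <- OJ[train_idx, ]
oj_test  <- OJ[-train_idx, ]

# b) linear support vector classifier with cost = 0.01
fit <- svm(Purchase ~ ., data = oj_train, kernel = "linear", cost = 0.01)
summary(fit)

# training and test error rates for c)
mean(predict(fit, oj_train) != oj_train$Purchase)
mean(predict(fit, oj_test) != oj_test$Purchase)

# d) tune cost over a grid between 0.01 and 10
tune_out <- tune(svm, Purchase ~ ., data = oj_train, kernel = "linear",
                 ranges = list(cost = c(0.01, 0.05, 0.1, 0.5, 1, 5, 10)))
summary(tune_out)
tune_out$best.parameters$cost
```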