Homework 6 in DSCI445: Statistical Machine Learning @ CSU
Be sure to set.seed(445).
We will explore the maximal margin classifier on a toy data set.
Obs | X_1 | X_2 | Y |
---|---|---|---|
1 | 3 | 4 | Red |
2 | 2 | 2 | Red |
3 | 4 | 4 | Red |
4 | 1 | 4 | Red |
5 | 2 | 1 | Blue |
6 | 4 | 3 | Blue |
7 | 4 | 1 | Blue |
Sketch the observations.
Sketch the optimal separating hyperplane and provide the equation for this hyperplane.
Describe the classification rule for the maximal margin classifier. It should be along the lines of “Classify to Red if \(\beta_0 + \beta_1 X_1 + \beta_2 X_2 > 0\), and classify to Blue otherwise.” Provide the values of \(\beta_0, \beta_1, \beta_2\).
On your sketch, indicate the margin for the maximal margin classifier.
Indicate the support vectors for the maximal margin classifier.
Argue that a slight movement of the seventh observation would not affect the maximal margin hyperplane.
Draw an additional observation on the plot so that the two classes are no longer separable by a hyperplane.
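As a starting point for the sketches, the toy data can be entered and plotted in base R. This is only a minimal sketch: the data frame name toy and the column names x1, x2, y are illustrative choices, not part of the assignment.

```r
# Toy data copied from the table above
toy <- data.frame(
  x1 = c(3, 2, 4, 1, 2, 4, 4),
  x2 = c(4, 2, 4, 4, 1, 3, 1),
  y  = factor(c("Red", "Red", "Red", "Red", "Blue", "Blue", "Blue"))
)

# Scatter plot with each point drawn in its class color
plot(toy$x1, toy$x2,
     col = tolower(as.character(toy$y)), pch = 19,
     xlab = "X1", ylab = "X2", xlim = c(0, 5), ylim = c(0, 5))

# Label each point with its observation number
text(toy$x1, toy$x2, labels = seq_len(nrow(toy)), pos = 3)
```

A candidate separating line can then be overlaid with abline(), passing an intercept and slope of your choosing, to check it against the points by eye.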
We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
n <- 500
x1 <- runif(n) - 0.5
x2 <- runif(n) - 0.5
y <- as.numeric(x1^2 - x2^2 > 0)
Plot the observations, colored according to their class labels.
Fit a logistic regression model to the data using \(X_1\) and \(X_2\) as predictors.
Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. What shape is the decision boundary?
Now fit a logistic regression model to the data using non-linear functions of \(X_1\) and \(X_2\) as predictors (e.g., \(X_1^2, X_1 \times X_2, \log(X_2)\), etc.)
Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. What shape is the decision boundary? Repeat a)-e) until you come up with an example in which the decision boundary implied by the predicted class labels is obviously non-linear.
Fit a support vector classifier with \(X_1\) and \(X_2\) as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
Fit an SVM using a non-linear kernel to the data with \(X_1\) and \(X_2\) as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
Comment on your results.
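The logistic regression steps above might be sketched as follows in base R. The particular non-linear terms (squares and an interaction) are just one possible choice, and the names dat, fit_lin, fit_quad, and the 0.5 probability cutoff are assumptions of this sketch rather than requirements of the problem.

```r
set.seed(445)

# Same simulation as in the problem statement
n  <- 500
x1 <- runif(n) - 0.5
x2 <- runif(n) - 0.5
y  <- as.numeric(x1^2 - x2^2 > 0)
dat <- data.frame(x1 = x1, x2 = x2, y = y)

# Logistic regression on the raw features
fit_lin <- glm(y ~ x1 + x2, data = dat, family = binomial)

# Logistic regression with quadratic and interaction terms (one possible choice).
# The classes are separable in these features, so glm() may warn about
# fitted probabilities of 0 or 1; the warning is suppressed here.
fit_quad <- suppressWarnings(
  glm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2),
      data = dat, family = binomial)
)

# Predicted class labels on the training data, using a 0.5 cutoff
pred_lin  <- as.numeric(predict(fit_lin,  type = "response") > 0.5)
pred_quad <- as.numeric(predict(fit_quad, type = "response") > 0.5)

# Training error rates
err_lin  <- mean(pred_lin  != dat$y)
err_quad <- mean(pred_quad != dat$y)

# Observations colored by predicted class under the non-linear model
plot(dat$x1, dat$x2, col = ifelse(pred_quad == 1, "red", "blue"),
     pch = 19, xlab = "X1", ylab = "X2")
```

Swapping pred_quad for pred_lin in the plot() call shows the boundary of the linear model for comparison.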
In this problem, you will use support vector approaches to predict whether a given car gets high or low gas mileage based on the Auto data set in the ISLR package.
Create a binary variable that takes value 1 for cars with gas mileage above the median and 0 for cars with gas mileage below the median.
Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage (be sure not to include the original gas mileage variable – no cheating!). Report the cross-validation errors associated with different values of this parameter, and comment on your results.
Now repeat (b) using SVMs with radial and polynomial basis kernels, with different values of gamma, degree, and cost. Report on your results.
Make some plots to back up your assertions in b) and c).
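One way to organize the cross-validation over cost is with tune() from the e1071 package, sketched below for the linear kernel. The cost grid, the variable name high_mpg, and the decision to drop the name column along with mpg are assumptions made for illustration; the assignment only requires that mpg itself be excluded.

```r
library(ISLR)    # provides the Auto data set
library(e1071)   # provides svm() and tune()

set.seed(445)

# Binary response: 1 if mpg is above the median, 0 otherwise
auto <- Auto
auto$high_mpg <- factor(as.numeric(auto$mpg > median(auto$mpg)))

# Drop the original mpg column; dropping name is an extra assumption of this sketch
auto <- auto[, !(names(auto) %in% c("mpg", "name"))]

# 10-fold cross-validation (the tune() default) over a hypothetical grid of cost values
tuned <- tune(svm, high_mpg ~ ., data = auto, kernel = "linear",
              ranges = list(cost = c(0.01, 0.1, 1, 10, 100)))

summary(tuned)          # cross-validation error for each cost value
tuned$best.parameters   # cost value with the lowest cross-validation error
```

For the radial and polynomial kernels, the same call can be reused by changing kernel and adding gamma or degree to the ranges list.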
This problem involves the OJ data set in the ISLR package.
Create a training set containing a random sample of 900 observations and a test set containing the remaining observations.
Fit a support vector classifier to the training set using cost = 0.01 with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics and describe the results obtained.
What are the training and test error rates?
Use CV to select an optimal cost. Consider values between \(0.01\) and \(10\).
Compute the training and test error rates using this new value for cost.
Repeat b) through e) using a support vector machine with a radial kernel and the default value for gamma.
Repeat b) through e) using a support vector machine with a polynomial kernel and degree = 2.
Which approach gives the best results on this data?
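The split, fit, and error-rate steps for the linear case could look roughly like the sketch below. The names train_idx, oj_train, oj_test, and the helper err_rate() are hypothetical and introduced only for illustration.

```r
library(ISLR)    # provides the OJ data set
library(e1071)   # provides svm()

set.seed(445)

# Random sample of 900 row indices for training; the rest form the test set
train_idx <- sample(nrow(OJ), 900)
oj_train  <- OJ[train_idx, ]
oj_test   <- OJ[-train_idx, ]

# Support vector classifier (linear kernel) with cost = 0.01
fit <- svm(Purchase ~ ., data = oj_train, kernel = "linear", cost = 0.01)
summary(fit)

# Hypothetical helper: misclassification rate of a fitted model on a data set
err_rate <- function(model, data) {
  mean(predict(model, newdata = data) != data$Purchase)
}

train_err <- err_rate(fit, oj_train)
test_err  <- err_rate(fit, oj_test)
```

The same helper can be applied to models fit with kernel = "radial" or kernel = "polynomial" so that all error rates are computed the same way.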
Be sure to share your server project with the instructor and grader:
Open your hw-6 project on liberator.stat.colostate.edu
Click the drop down on the project (top right side) > Share Project…
Click the drop down and add “dsci445instructors” to your project.
This is how you receive points for reproducibility on your homework!