```{r setup, message = FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(broom)
# install.packages("leaps") # if necessary
library(leaps)
library(ISLR)
```

## ISLR Lab 6.5.1

**Instructions**: First complete Section 6.5.1 from ISLR in the space saved for it below. This should be relatively quick since you can just copy/paste.

#### Save original outcome

```{r}
# Don't change this
Hitters <- na.omit(Hitters)
original_Salary <- Hitters$Salary
head(Hitters)
```

#### Simulated outcome

```{r}
X <- Hitters |>
  select(-Salary) |>
  select(where(is.numeric)) |>
  as.matrix()
true_beta <- rep(0, ncol(X))
# Set some nonzero coefficients
true_beta[c(2:5, 10)] <- c(2, 1, -1, 4, 1)
true_beta
```

```{r}
set.seed(1) # change this to any other number
sigma <- 3 # change noise level
Hitters$Salary <- 100 + # intercept term
  X %*% true_beta +
  sigma * rnorm(nrow(X))
```

#### Step 1: Analyze the original `Hitters` data

```{r temporary}
# remove this after finishing 6.5.1
Hitters$Salary <- original_Salary
```

Insert code chunks and complete the Lab 6.5.1 here

```{r}
# Begin 6.5.1
# regfit.full <- regsubsets(Salary ~ ., Hitters)
# etc
```

#### Step 2: Interpret the results

**Question**: Should we expect to get the same models selected by best subsets, forward selection, and backward selection? Why or why not?

**Question**: Change the `set.seed(1)` in the part about using a validation set approach to use a different number. Does anyone get different "best" models than the ones in the book?

#### Step 3: Repeat with simulated `Salary` variable

1. Now go back and remove the code chunk called `temporary`.
2. Change the data generating process by choosing which predictor variables have nonzero coefficients and changing the values of the coefficients.
3. Re-run the variable selection code and see if any of the methods choose the right variables.
4. Repeat steps 2 and 3, trying several different values for `set.seed()`, `true_beta`, and `sigma`.
   Try to guess how the results will change and then check your guesses against the actual output. Take note of anything that seems sufficiently interesting.
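
To check whether a method "chose the right variables" in step 3, it can help to compare the selected predictor names against the predictors whose true coefficients are nonzero. The chunk below is only a sketch of one way to do that: `compare_support()` is a hypothetical helper (it is not part of the lab, ISLR, or `leaps`), and the toy names and coefficient values are made up for illustration.

```{r}
# Hypothetical helper (not from the lab or from leaps): compare a set of
# selected predictor names with the predictors that truly have nonzero
# coefficients. Assumes `true_beta` is a *named* numeric vector.
compare_support <- function(selected, true_beta) {
  true_vars <- names(true_beta)[true_beta != 0]
  list(
    correct  = intersect(selected, true_vars), # true predictors that were selected
    missed   = setdiff(true_vars, selected),   # true predictors that were left out
    spurious = setdiff(selected, true_vars),   # selected predictors whose true coefficient is 0
    exact    = setequal(selected, true_vars)   # TRUE only when the two sets match
  )
}

# Toy illustration with made-up names and values (not lab output)
toy_beta <- c(x1 = 0, x2 = 2, x3 = 1, x4 = 0, x5 = -1)
toy_selected <- c("x2", "x3", "x4")
result <- compare_support(toy_selected, toy_beta)
str(result)
```

To use something like this with the lab objects, one would first name the coefficient vector, for example `names(true_beta) <- colnames(X)`, and take the selected names from a fitted `regsubsets` object with `names(coef(regfit.full, id = k))[-1]`, which drops the intercept (assuming `regfit.full` is fit as in the ISLR lab and `k` is the model size of interest). Because `X` holds only the numeric predictors, any selected factor dummy such as `LeagueN` would be reported as spurious.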