```{r setup, message = FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(broom)
# install.packages("leaps") # if necessary
library(leaps)
library(ISLR)
```
## ISLR Lab 6.5.1
**Instructions**: First complete Section 6.5.1 from ISLR in the space reserved for it below. This should be relatively quick, since you can copy and paste the code from the book.
#### Save original outcome
```{r}
# Don't change this
Hitters <- na.omit(Hitters)
original_Salary <- Hitters$Salary
head(Hitters)
```
#### Simulated outcome
```{r}
X <- Hitters |>
  select(-Salary) |>
  select(where(is.numeric)) |>
  as.matrix()
true_beta <- rep(0, ncol(X))
# Set some nonzero coefficients
true_beta[c(2:5, 10)] <- c(2, 1, -1, 4, 1)
true_beta
```
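When checking later whether a selection method found the right variables, it may help to have the truly nonzero predictors by name rather than by position. A minimal sketch, assuming the same numeric predictor matrix and coefficient vector as above; the `_demo` names are introduced here only for illustration, so that nothing used elsewhere in the lab is overwritten:

```{r}
library(ISLR)

# Rebuild the numeric predictor matrix under a separate (illustrative) name
hitters_demo <- na.omit(ISLR::Hitters)
is_num <- vapply(hitters_demo, is.numeric, logical(1))
X_demo <- as.matrix(hitters_demo[, is_num & names(hitters_demo) != "Salary"])

# Same sparse coefficient vector as in the chunk above
beta_demo <- rep(0, ncol(X_demo))
beta_demo[c(2:5, 10)] <- c(2, 1, -1, 4, 1)

# Names of the predictors that actually enter the simulated outcome
true_vars_demo <- colnames(X_demo)[beta_demo != 0]
true_vars_demo
```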
```{r}
set.seed(1) # change this to any other number
sigma <- 3 # change noise level
# as.numeric() keeps Salary a plain vector rather than a one-column matrix
Hitters$Salary <- as.numeric(100 + # intercept term
  X %*% true_beta + sigma * rnorm(nrow(X)))
```
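How hard the selection problem is depends on how large the noise is relative to the signal `X %*% true_beta`. One rough way to gauge this before changing `sigma` is to compare the variance of the signal with the noise variance. The sketch below does this for a few noise levels; the grid of `sigma` values and the `_demo` names are assumptions made for illustration and are not part of the lab.

```{r}
library(ISLR)

# Same numeric predictors and coefficients as above, under illustrative names
hitters_demo <- na.omit(ISLR::Hitters)
is_num <- vapply(hitters_demo, is.numeric, logical(1))
X_demo <- as.matrix(hitters_demo[, is_num & names(hitters_demo) != "Salary"])
beta_demo <- rep(0, ncol(X_demo))
beta_demo[c(2:5, 10)] <- c(2, 1, -1, 4, 1)

# Noise-free part of the simulated outcome, as a plain numeric vector
signal_demo <- as.numeric(X_demo %*% beta_demo)

# Ratio of signal variance to noise variance for a few (arbitrary) noise levels
sigma_grid <- c(3, 30, 300)
snr_demo <- var(signal_demo) / sigma_grid^2
data.frame(sigma = sigma_grid, snr = snr_demo)
```

A smaller ratio means the true coefficients are harder to distinguish from noise, so the selected models are more likely to vary from seed to seed.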
#### Step 1: Analyze the original `Hitters` data
```{r temporary}
# remove this after finishing 6.5.1
Hitters$Salary <- original_Salary
```
Insert code chunks and complete the Lab 6.5.1 here
```{r}
# Begin 6.5.1
# regfit.full <- regsubsets(Salary ~ ., Hitters) # etc
```
#### Step 2: Interpret the results
**Question**: Should we expect to get the same models selected by best subsets, forward selection, and backward selection? Why or why not?
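
One way to check an answer to this question empirically is to fit all three search strategies and compare, size by size, the sets of variables they select. This is only a sketch on the original `Hitters` data; the helper `vars_at_size()` and the `_demo` objects are hypothetical names introduced here.

```{r}
library(leaps)
library(ISLR)

hitters_demo <- na.omit(ISLR::Hitters)

# Fit the three search strategies over the same candidate predictors
methods_demo <- c("exhaustive", "forward", "backward")
fits_demo <- lapply(methods_demo, function(m) {
  regsubsets(Salary ~ ., data = hitters_demo, nvmax = 19, method = m)
})
names(fits_demo) <- methods_demo

# Variables in the model of a given size, excluding the intercept
vars_at_size <- function(fit, k) sort(names(coef(fit, id = k))[-1])

# For each size, do all three methods select the same set of variables?
same_model_demo <- vapply(1:19, function(k) {
  sets <- lapply(fits_demo, vars_at_size, k = k)
  identical(sets[[1]], sets[[2]]) && identical(sets[[2]], sets[[3]])
}, logical(1))

# Model sizes at which at least one method disagrees with the others
which(!same_model_demo)
```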
**Question**: In the part of the lab that uses the validation set approach, change `set.seed(1)` to a different number. Does anyone get a different "best" model than the one in the book?
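
To see how much the validation set result depends on the random split, the lab's validation code can be wrapped in a function and run for several seeds. The following is a sketch along the lines of the book's approach, not a replacement for working through it; the function name `best_size_for_seed()` and the seeds `1:5` are arbitrary choices made here.

```{r}
library(leaps)
library(ISLR)

hitters_demo <- na.omit(ISLR::Hitters)

# Size of the model with the lowest validation MSE for one random split
best_size_for_seed <- function(seed, data = hitters_demo, nvmax = 19) {
  set.seed(seed)
  train <- sample(c(TRUE, FALSE), nrow(data), replace = TRUE)
  fit <- regsubsets(Salary ~ ., data = data[train, ], nvmax = nvmax)
  test_mat <- model.matrix(Salary ~ ., data = data[!train, ])
  val_errors <- vapply(seq_len(nvmax), function(k) {
    coef_k <- coef(fit, id = k)
    # regsubsets has no predict() method, so multiply out by hand
    pred <- test_mat[, names(coef_k), drop = FALSE] %*% coef_k
    mean((data$Salary[!train] - pred)^2)
  }, numeric(1))
  which.min(val_errors)
}

# Hypothetical set of seeds; any integers would do
sizes_demo <- vapply(1:5, best_size_for_seed, integer(1))
sizes_demo
```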
#### Step 3: Repeat with simulated `Salary` variable
1. Now go back and remove the code chunk called `temporary`, so that `Salary` holds the simulated outcome instead of the original one.
2. Change the data generating process by choosing which predictor variables have nonzero coefficients and changing the values of those coefficients.
3. Re-run the variable selection code and see whether any of the methods select exactly the variables with nonzero coefficients in `true_beta`.
4. Repeat steps 2 and 3, trying several different values for `set.seed()`, `true_beta`, and `sigma`. Try to guess how the results will change and then check your guesses against the actual output. Take note of anything that seems sufficiently interesting.
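
Repeating steps 2 and 3 by hand can get tedious, so it may be convenient to bundle the simulation and the check into one function. The sketch below is one possible way to do this: `check_recovery()` is a hypothetical helper, and choosing the model size by minimum BIC is an assumption made here (the lab also discusses other criteria). It returns, for each search method, whether the chosen model contains exactly the truly nonzero variables.

```{r}
library(leaps)
library(ISLR)

# Simulate Salary from a sparse linear model, then report for each search
# method whether the BIC-selected model matches the true set of variables
check_recovery <- function(seed, sigma, nonzero_idx, nonzero_val) {
  dat <- na.omit(ISLR::Hitters)
  is_num <- vapply(dat, is.numeric, logical(1))
  X <- as.matrix(dat[, is_num & names(dat) != "Salary"])
  beta <- rep(0, ncol(X))
  beta[nonzero_idx] <- nonzero_val
  set.seed(seed)
  dat$Salary <- as.numeric(100 + X %*% beta + sigma * rnorm(nrow(X)))
  truth <- sort(colnames(X)[beta != 0])
  vapply(c("exhaustive", "forward", "backward"), function(m) {
    fit <- regsubsets(Salary ~ ., data = dat, nvmax = 19, method = m)
    k <- which.min(summary(fit)$bic) # assumption: pick the size by BIC
    identical(sort(names(coef(fit, id = k))[-1]), truth)
  }, logical(1))
}

# Same coefficients as in the simulation above, at two noise levels
idx_demo <- c(2:5, 10)
val_demo <- c(2, 1, -1, 4, 1)
check_recovery(seed = 1, sigma = 3, nonzero_idx = idx_demo, nonzero_val = val_demo)
check_recovery(seed = 1, sigma = 300, nonzero_idx = idx_demo, nonzero_val = val_demo)
```

Because the function builds its own copy of the data, it does not modify `Hitters`, `X`, or `true_beta` in the rest of the document.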