Ridge regression with glmnet

Use the glmnet package to fit a ridge regression model on the same data as in the previous part. Hint: read about the alpha input to the glmnet function in the documentation.

# install.packages("glmnet") # if necessary
library(glmnet)
set.seed(1)
n <- 100
p <- 1000
X <- matrix(rnorm(n*p), nrow = n)
beta <- rpois(p, lambda = 1)
y <- X %*% beta + rnorm(n)

model_ridge <- glmnet(X, y, intercept = FALSE, alpha = 0) # alpha = 0 gives the ridge (L2) penalty

What does plotting the model object show?

plot(model_ridge, xvar = "lambda")
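With xvar = "lambda", the plot shows each coefficient's profile as a function of log(lambda): all of the coefficients shrink smoothly toward zero as the penalty grows, and (unlike the lasso) none of them are set exactly to zero.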

Try the functions in the broom package on the model object.
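A minimal sketch, assuming the broom package is installed; broom provides tidy() and glance() methods for glmnet fits:

library(broom)
# Coefficient estimates for each lambda along the regularization path
tidy(model_ridge)
# A short one-row summary of the fitted path
glance(model_ridge)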

# Ridge coefficients at a very small lambda (drop the intercept row)
beta_ridge <- coef(model_ridge, s = 0.000001)[-1,]

Early stopping

Copy/paste your gradient descent (or stochastic gradient descent) code for ordinary linear regression here and use it to explore regularization via early stopping.

Consider generating data with p larger than n (as in the setup above, where p = 1000 and n = 100), so that the least squares estimator is not uniquely defined.

# Gradient of the least squares loss (no penalty term; regularization comes from stopping early)
least_squares_gradient <- function(x, y, beta) {
  -2 * t(x) %*% (y - x %*% beta)
}

# Least squares loss
least_squares_loss <- function(x, y, beta) {
  sum((y - x %*% beta)^2)
}

# Initialize at beta = 0 and take one small gradient step
beta0 <- rep(0, p)
previous_loss <- least_squares_loss(X, y, beta0)
grad0 <- least_squares_gradient(X, y, beta0)
beta1 <- beta0 - 0.0001 * grad0
next_loss <- least_squares_loss(X, y, beta1)
previous_beta <- beta1
steps <- 1

# Take normalized gradient steps with a geometrically decaying step size;
# stop early once the change in loss falls below a small tolerance
while (abs(previous_loss - next_loss) > 0.0001) {
  gradn <- least_squares_gradient(X, y, previous_beta)
  next_beta <- previous_beta - (0.99)^steps * gradn / sqrt(sum(gradn^2))
  if (steps %% 10 == 0) print(previous_loss) # report the loss every 10 steps
  steps <- steps + 1
  previous_beta <- next_beta
  previous_loss <- next_loss
  next_loss <- least_squares_loss(X, y, next_beta)
}
## [1] 11407.88
## [1] 266.3689
## [1] 251.4202
## [1] 211.0913
## [1] 174.0072
## [1] 143.4069
## [1] 118.2636
## [1] 97.60763
## [1] 80.63234
## [1] 66.67552
## [1] 55.19451
## [1] 45.74477
## [1] 37.96209
## [1] 31.54806
## [1] 26.25808
## [1] 21.89167
## [1] 18.28446
## [1] 15.30162
## [1] 12.83258
## [1] 10.78657
## [1] 9.089118
## [1] 7.679036
## [1] 6.506077
## [1] 5.528941
## [1] 4.713668
## [1] 4.03232
## [1] 3.461901
## [1] 2.983466
## [1] 2.581404
## [1] 2.242834
## [1] 1.957124
## [1] 1.71549
## [1] 1.510665
## [1] 1.336636
## [1] 1.188415
## [1] 1.061865
## [1] 0.9535496
## [1] 0.8598109
## [1] 0.3225783
## [1] 0.2788244
## [1] 0.2419991
## [1] 0.2109399
## [1] 0.1846868
## [1] 0.1624458
## [1] 0.1435598
## [1] 0.1274844
## [1] 0.1137679
## [1] 0.102035
## [1] 0.09197368
## [1] 0.03787949
## [1] 0.03290828
## [1] 0.02871489
## [1] 0.02516848
## [1] 0.02216239
## [1] 0.01960832
## [1] 0.01743307
## [1] 0.01557591
## [1] 0.01398638
## [1] 0.01248505
## [1] 0.005769321
## [1] 0.005057686
## [1] 0.004454588
## [1] 0.003942118
## [1] 0.003505609
## [1] 0.003132888
## [1] 0.002813835
## [1] 0.001188293
## [1] 0.001027777
## [1] 0.0008936624
## [1] 0.0007804425
## [1] 0.0006846499
## [1] 0.0006034161
## [1] 0.0005343659
## [1] 0.0004755301
## [1] 0.0004252743
## [1] 0.0003822402
## [1] 0.0001480029
# Mean squared difference from the true coefficients
mean((previous_beta - beta)^2)
## [1] 1.694714
# Mean squared difference from the near-zero-lambda ridge coefficients
mean((previous_beta - beta_ridge)^2)
## [1] 0.01523264
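The early-stopped iterate ends up far closer to the near-zero-lambda ridge solution than to the true coefficients: with p > n and a zero starting value, gradient descent on the least squares loss heads toward the minimum-norm interpolating fit, which is also what ridge approaches as lambda tends to zero.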

Cross-validation

Write a function that randomly splits the data into subsets and implements k-fold cross-validation. You can use the glmnet function and broom functions, but do not use cv.glmnet or any other package that implements cross-validation for you. The point is to manually iterate over the folds, computing the RSS on each held-out subset and averaging the results.
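One way to set this up, as a sketch (the function name k_fold_rss, its arguments, and the choice to reuse glmnet's default lambda sequence are illustrative, not required by the exercise):

k_fold_rss <- function(X, y, k = 10, lambdas = NULL) {
  n <- nrow(X)
  # Randomly assign each observation to one of k folds
  folds <- sample(rep(1:k, length.out = n))
  # Use glmnet's default lambda sequence unless one is supplied
  if (is.null(lambdas)) {
    lambdas <- glmnet(X, y, alpha = 0, intercept = FALSE)$lambda
  }
  rss <- matrix(NA, nrow = k, ncol = length(lambdas))
  for (fold in 1:k) {
    test <- folds == fold
    # Fit ridge on the training folds only
    fit <- glmnet(X[!test, , drop = FALSE], y[!test],
                  alpha = 0, intercept = FALSE, lambda = lambdas)
    # Predict on the held-out fold at every lambda and record the RSS
    preds <- predict(fit, newx = X[test, , drop = FALSE], s = lambdas)
    rss[fold, ] <- colSums((y[test] - preds)^2)
  }
  # Average the held-out RSS over folds for each lambda
  data.frame(lambda = lambdas, cv_rss = colMeans(rss))
}

cv_results <- k_fold_rss(X, y, k = 5)
# Lambda with the smallest average held-out RSS
cv_results$lambda[which.min(cv_results$cv_rss)]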