Statistics for Machine Learning

Example of lasso regression machine learning model

Lasso regression is a close cousin of ridge regression in which the penalty is applied to the absolute values of the coefficients (an L1 penalty) rather than to their squares (the L2 penalty used by ridge). As a result, lasso can shrink the coefficients of insignificant variables exactly to zero, eliminating those variables and producing a much more compact model than the full OLS fit.
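To see why the absolute-value penalty produces exact zeros while the squared penalty does not, consider the special case of an orthonormal design matrix (an assumption made only for this illustration). There both problems have closed-form solutions: lasso soft-thresholds each OLS coefficient, while ridge divides every coefficient by the same factor. The following NumPy sketch uses hypothetical OLS estimates and hypothetical helper names (`lasso_orthonormal`, `ridge_orthonormal`):

```python
import numpy as np

# Assumption: orthonormal design (X^T X = I), so both penalized problems
# can be solved directly from the OLS coefficients.

def lasso_orthonormal(b_ols, lam):
    """Minimizer of 0.5*||y - Xb||^2 + lam*||b||_1: soft-thresholding."""
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

def ridge_orthonormal(b_ols, lam):
    """Minimizer of 0.5*||y - Xb||^2 + 0.5*lam*||b||_2^2: uniform shrinkage."""
    return b_ols / (1.0 + lam)

b_ols = np.array([3.0, -1.5, 0.4, -0.2])  # hypothetical OLS estimates
lam = 0.5                                  # hypothetical penalty strength

b_lasso = lasso_orthonormal(b_ols, lam)
b_ridge = ridge_orthonormal(b_ols, lam)
print("lasso:", b_lasso)
print("ridge:", b_ridge)
print("exact zeros -> lasso:", int(np.sum(b_lasso == 0)), "ridge:", int(np.sum(b_ridge == 0)))
```

Coefficients whose OLS magnitude falls below the threshold `lam` become exactly zero under lasso, whereas ridge only scales them down and never removes a variable.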

The following implementation is similar to the ridge regression one, except that the penalty is applied to the absolute values of the coefficients:

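>>> from sklearn.linear_model import Lasso

>>> # x_train, y_train, x_test and y_test come from the same data preparation used for ridge regression
>>> alphas = [1e-4,1e-3,1e-2,0.1,0.5,1.0,5.0,10.0]   # candidate penalty strengths
>>> initrsq = 0
>>> print ("\nLasso Regression: Best Parameters\n")

>>> for alph in alphas:
...     lasso_reg = Lasso(alpha=alph)
...     lasso_reg.fit(x_train,y_train)
...     tr_rsqrd = lasso_reg.score(x_train,y_train)   # R-squared on the training data
...     ts_rsqrd = lasso_reg.score(x_test,y_test)     # R-squared on the test data
...     # Print only when the test R-squared beats the best value seen so far
...     if ts_rsqrd > initrsq:
...         print ("Lambda: ",alph,"Train R-Squared value:",round(tr_rsqrd,5),"Test R-squared value:",round(ts_rsqrd,5))
...         initrsq = ts_rsqrd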

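The best parameters found by this search are shown in the following screenshot. The next snippet compares the ridge and lasso coefficient values at alpha = 0.001: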

>>> from sklearn.linear_model import Ridge

>>> # Fit ridge regression with alpha = 0.001 and print one coefficient per feature
>>> ridge_reg = Ridge(alpha=0.001)
>>> ridge_reg.fit(x_train,y_train)
>>> print ("\nRidge Regression coefficient values of Alpha = 0.001\n")
>>> for colnm, coef in zip(all_colnms, ridge_reg.coef_):
...     print (colnm,": ",coef)

>>> # Fit lasso regression with the same alpha for comparison
>>> lasso_reg = Lasso(alpha=0.001)
>>> lasso_reg.fit(x_train,y_train)
>>> print ("\nLasso Regression coefficient values of Alpha = 0.001\n")
>>> for colnm, coef in zip(all_colnms, lasso_reg.coef_):
...     print (colnm,": ",coef)

The following results show the coefficient values of both methods. Lasso regression sets the coefficient of density to exactly 0, whereas ridge regression estimates it as -5.5672; moreover, none of the ridge regression coefficients is zero.
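A zeroed coefficient such as density is characteristic of the L1 penalty. As a minimal sketch (not scikit-learn's own implementation), the following NumPy code fits lasso by cyclic coordinate descent with soft-thresholding on the objective documented for scikit-learn's `Lasso`, (1/(2n)) * ||y - Xw - b||^2 + alpha * ||w||_1, and compares it with the closed-form ridge solution for ||y - Xw - b||^2 + alpha * ||w||^2. The data are synthetic (two strongly correlated predictors, one more relevant predictor, and two irrelevant ones), not the wine data, and the helper names `lasso_cd` and `ridge_closed_form` are hypothetical:

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=1000, tol=1e-8):
    """Cyclic coordinate descent for (1/(2n))*||y - Xw - b||^2 + alpha*||w||_1."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean        # centering leaves the intercept unpenalized
    col_sq = (Xc ** 2).sum(axis=0) / n     # (1/n)*||x_j||^2 for each column
    w = np.zeros(p)
    r = yc.copy()                          # residual yc - Xc @ w (w starts at zero)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # correlation of column j with the residual that excludes w_j's contribution
            rho = Xc[:, j] @ r / n + col_sq[j] * w[j]
            # soft-thresholding: |rho| <= alpha gives an exact zero
            w_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            if w_new != w[j]:
                r -= Xc[:, j] * (w_new - w[j])   # keep the residual in sync with w
                max_change = max(max_change, abs(w_new - w[j]))
                w[j] = w_new
        if max_change < tol:
            break
    return w, y_mean - x_mean @ w

def ridge_closed_form(X, y, alpha):
    """Minimizer of ||y - Xw - b||^2 + alpha*||w||^2 with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

# Synthetic data (assumption): x1 is nearly a copy of x0; x3 and x4 do not affect y
rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(size=n)
x1 = x0 + 0.3 * rng.normal(size=n)
x2 = rng.normal(size=n)
x3, x4 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([x0, x1, x2, x3, x4])
y = 2.0 * x0 + 1.0 * x2 + 0.5 * rng.normal(size=n)

w_lasso, _ = lasso_cd(X, y, alpha=0.2)
w_ridge, _ = ridge_closed_form(X, y, alpha=0.2)
print("lasso:", np.round(w_lasso, 4))
print("ridge:", np.round(w_ridge, 4))

# Above alpha_max = max_j |x_j^T (y - mean(y))| / n, every lasso coefficient is zero
alpha_max = np.max(np.abs((X - X.mean(axis=0)).T @ (y - y.mean()))) / n
w_null, _ = lasso_cd(X, y, alpha=1.01 * alpha_max)
print("lasso above alpha_max:", w_null)
```

On data like this, the lasso coefficients of the irrelevant predictors are typically exactly zero, while every ridge coefficient stays nonzero, mirroring the density comparison above.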

The R code for lasso regression on the wine quality data is as follows:

# The data processing steps are the same as for ridge regression; only the section of code below changes

# Lasso Regression (glmnet with alpha = 1 applies the pure L1 penalty)
library(glmnet)
print(paste("Lasso Regression"))
lambdas = c(1e-4,1e-3,1e-2,0.1,0.5,1.0,5.0,10.0)
initrsq = 0
for (lmbd in lambdas){
  lasso_fit = glmnet(x_train,y_train,alpha = 1,lambda = lmbd)
  pred_y = predict(lasso_fit,x_test)
  # Test R-squared = 1 - SS_residual / SS_total
  R2 <- 1 - (sum((test_data[,yvar]-pred_y )^2)/sum((test_data[,yvar]-mean(test_data[,yvar]))^2))

  # Print only when the test R-squared beats the best value seen so far
  if (R2 > initrsq){
    print(paste("Lambda:",lmbd,"Test R-squared :",round(R2,4)))
    initrsq = R2
  }
}
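The two libraries name their parameters differently. In glmnet, `alpha` is the elastic-net mixing parameter (`alpha = 1` gives lasso, `alpha = 0` gives ridge) and `lambda` is the penalty strength, whereas the `alpha` of scikit-learn's `Lasso` plays the role of glmnet's `lambda`. The following Python sketch writes out both documented objectives (the function names and the random data are hypothetical) to show that they coincide when glmnet's `alpha = 1`:

```python
import numpy as np

def glmnet_gaussian_objective(X, y, b0, beta, lam, alpha):
    """Elastic-net objective as documented for glmnet's gaussian family:
    (1/(2n))*RSS + lam*((1 - alpha)/2*||beta||_2^2 + alpha*||beta||_1)."""
    n = len(y)
    rss = np.sum((y - b0 - X @ beta) ** 2)
    penalty = lam * ((1 - alpha) / 2 * np.sum(beta ** 2) + alpha * np.sum(np.abs(beta)))
    return rss / (2 * n) + penalty

def sklearn_lasso_objective(X, y, b0, w, alpha):
    """Objective documented for sklearn.linear_model.Lasso (intercept unpenalized):
    (1/(2n))*RSS + alpha*||w||_1."""
    n = len(y)
    return np.sum((y - b0 - X @ w) ** 2) / (2 * n) + alpha * np.sum(np.abs(w))

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
beta = np.array([0.5, -0.2, 0.0])  # arbitrary coefficients to evaluate
b0 = 0.1
lam = 0.01

# glmnet(alpha = 1, lambda = lam) versus sklearn Lasso(alpha = lam)
f_glm = glmnet_gaussian_objective(X, y, b0, beta, lam, alpha=1.0)
f_sk = sklearn_lasso_objective(X, y, b0, beta, alpha=lam)
print(f_glm, f_sk)
```

Even with matching objectives, fitted coefficients from the two libraries may still differ, for example because glmnet standardizes the predictors by default (`standardize = TRUE`).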