Date of Graduation

Spring 2019

Degree

Master of Science in Mathematics

Department

Mathematics

Committee Chair

George Mathew

Abstract

An important problem in data science and statistical learning is to predict an outcome based on data collected on several predictor variables. This is generally known as a regression problem. In big data studies, the regression model often depends on a large number of predictor variables, and the data scientist faces the difficult task of determining the most appropriate set of predictor variables to employ in the model. In this thesis we adopt a technique that constrains the coefficient estimates, which in effect shrinks them towards zero. Ridge regression and the lasso are two well-known methods for shrinking the coefficients towards zero, and both are investigated in this thesis. The two techniques are compared by analyzing a real data set for a regression model with a large collection of predictor variables.
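
For orientation, the two shrinkage methods named above are usually written as penalized least-squares problems. The block below is a sketch in standard textbook notation, which is an assumption here and may differ from the notation used in the thesis: $n$ observations, $p$ predictors, response $y_i$, predictor values $x_{ij}$, coefficients $\beta_j$, and a tuning parameter $\lambda \ge 0$.

```latex
% Ridge regression: squared (L2) penalty on the coefficients
\hat{\beta}^{\text{ridge}}
  = \arg\min_{\beta_0,\,\beta}
    \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \beta_j^{2}

% Lasso: absolute-value (L1) penalty on the coefficients
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta_0,\,\beta}
    \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

With $\lambda = 0$ both reduce to ordinary least squares, and larger $\lambda$ shrinks the estimates further towards zero. The lasso penalty can set some coefficients exactly to zero, whereas the ridge penalty generally does not.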

Keywords

ridge regression, lasso, cross validation, mean square error, Akaike information criterion, Bayesian information criterion

Subject Categories

Mathematics

Copyright

© Dalip Kumar

Open Access

Included in

Mathematics Commons
