Date of Graduation

Fall 2024

Degree

Master of Science in Computer Science

Department

Computer Science

Committee Chair

Belkhouche Mohammed

Abstract

Predicting the outcome of a team game is challenging because sports data are complex and highly dynamic. This thesis uses feature engineering and genetic algorithms to predict the winner and the score of sports events, combining machine learning algorithms with feature engineering techniques on datasets drawn from several sports disciplines. Five machine learning models are applied: classification and regression trees (CART), random forest (RF), stochastic gradient boosting (SGB), eXtreme gradient boosting (XGBoost), and extreme learning machine (ELM), with feature selection driven by genetic algorithms and autoregressive weights. In addition, Inverse Distance Weighting (IDW) is used to emphasize the impact of recent game performances. The thesis shows that integrating machine learning with carefully engineered features can achieve high prediction accuracy. The framework is designed to be robust across many sports datasets, making it a versatile tool for predicting results in different team sports. Three experiments were conducted using basketball, soccer, and American football datasets. The results show that the proposed approach significantly improves prediction accuracy, both in predicting the winner of a match and in estimating its final score, when compared with other studies that used the same data.
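
As a rough illustration of how inverse distance weighting can emphasize recent games, the sketch below weights each past game by the inverse of its distance in games from the upcoming match. The function names, the `power` parameter, and the sample scores are assumptions made for this example; the abstract does not state the exact weighting formula used in the thesis.

```python
def idw_weights(n_games, power=1.0):
    """Return normalized weights for n_games past games, ordered oldest to newest.

    Assumption for illustration: the most recent game is at distance 1 from the
    upcoming match, the one before it at distance 2, and so on.
    """
    if n_games < 1:
        raise ValueError("n_games must be at least 1")
    distances = range(n_games, 0, -1)            # oldest game has the largest distance
    raw = [1.0 / d ** power for d in distances]  # inverse distance: closer means heavier
    total = sum(raw)
    return [w / total for w in raw]              # normalize so the weights sum to 1


def idw_average(values, power=1.0):
    """Weighted average of a per-game statistic, favoring recent games."""
    weights = idw_weights(len(values), power)
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical points scored in a team's last four games, oldest first.
points = [98, 105, 110, 120]
print(idw_average(points))         # pulled toward the most recent scores
print(sum(points) / len(points))   # plain mean, for comparison
```

A larger `power` concentrates more weight on the latest games, while `power=0` reduces to the plain mean.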

Keywords

feature engineering, genetic algorithm, machine learning, inverse distance weighting, team sports

Subject Categories

Applied Statistics | Categorical Data Analysis | Databases and Information Systems | Data Science | Numerical Analysis and Computation | Probability | Sports Management | Statistical Models

Copyright

© Vitor S. Freitas

Open Access
