Comparative Analysis of Feature Selection Methods to Identify Biomarkers in a Stroke-Related Dataset

Abstract

This paper applies machine learning feature selection techniques to the REGARDS stroke-related dataset to identify health-related biomarkers. It presents a data-driven methodological framework for evaluating multiple feature selection methods. In applying the framework, three classifiers (logistic regression, random forest, and naïve Bayes) are each combined with two wrapper methods, and their performance is evaluated on diverse classification targets such as Current Smoker, Current Alcohol Use, and Deceased. Across the three classifiers, performance was similar both in the area under the ROC curve (ROC AUC) and in the features selected; however, significant differences were observed in running time. The selected features were further evaluated by the accuracy of a prediction model built with a multi-layer perceptron (MLP) classifier.
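The abstract does not name the two wrappers or give implementation details. As a minimal sketch only, assuming greedy sequential forward selection as the wrapper, Gaussian naïve Bayes as the wrapped classifier, and held-out ROC AUC as the selection criterion, the pure-Python example below shows the core idea of a wrapper: each candidate feature subset is scored by actually training and evaluating the classifier on it. The synthetic data and every name here (`make_data`, `forward_select`, `fit_gaussian_nb`, and so on) are hypothetical illustrations, not the REGARDS data or the authors' code.

```python
import math
import random

def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def fit_gaussian_nb(X, y, features):
    """Fit per-class priors and per-feature Gaussian (mean, variance) on the chosen features."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for j in features:
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9  # avoid zero variance
            stats.append((mu, var))
        model[c] = (math.log(len(rows) / len(X)), stats)
    return model

def nb_score(model, x, features):
    """Log-odds of class 1 versus class 0; higher means more likely positive."""
    logp = {}
    for c, (log_prior, stats) in model.items():
        total = log_prior
        for j, (mu, var) in zip(features, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
        logp[c] = total
    return logp[1] - logp[0]

def forward_select(X_tr, y_tr, X_va, y_va, max_features):
    """Hypothetical wrapper: greedily add the feature that most improves validation AUC."""
    selected, best_auc = [], 0.5  # 0.5 = AUC of an uninformative scorer
    remaining = list(range(len(X_tr[0])))
    while remaining and len(selected) < max_features:
        trial = []
        for j in remaining:
            feats = selected + [j]
            model = fit_gaussian_nb(X_tr, y_tr, feats)  # retrain classifier per candidate subset
            scores = [nb_score(model, x, feats) for x in X_va]
            trial.append((roc_auc(y_va, scores), j))
        auc, j = max(trial)
        if auc <= best_auc:
            break  # stop when no candidate improves validation AUC
        selected.append(j)
        remaining.remove(j)
        best_auc = auc
    return selected, best_auc

def make_data(n, seed=0):
    """Synthetic stand-in data: features 0 and 1 carry signal, features 2-5 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 1.5 if label else 0.0
        row = [rng.gauss(shift, 1.0), rng.gauss(-shift, 1.0)]
        row += [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(row)
        y.append(label)
    return X, y

X, y = make_data(400)
X_tr, y_tr, X_va, y_va = X[:300], y[:300], X[300:], y[300:]
selected, auc = forward_select(X_tr, y_tr, X_va, y_va, max_features=3)
print("selected features:", selected, "validation AUC:", round(auc, 3))
```

On this synthetic data the two informative features should be picked first. Replacing the `fit_gaussian_nb`/`nb_score` pair with another classifier, and timing each wrapper run (for example with `time.perf_counter`), is one way to set up the kind of classifier and running-time comparison the abstract describes.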

Department(s)

Engineering Program

Document Type

Conference Proceeding

DOI

https://doi.org/10.1109/CIBCB.2019.8791457

Keywords

classification, feature selection, machine learning

Publication Date

7-1-2019
