Comparative Analysis of Feature Selection Methods to Identify Biomarkers in a Stroke-Related Dataset
This paper applies machine learning feature selection techniques to the REGARDS stroke-related dataset to identify health-related biomarkers. A data-driven methodological framework is presented for evaluating multiple feature selection methods. In applying the framework, three classifiers (logistic regression, random forest, and naïve Bayes) are used in conjunction with two wrapper methods, and their performance is evaluated on diverse classification targets such as Current Smoker, Current Alcohol Use, and Deceased. The three classifiers performed similarly, both in the area under the ROC curve (ROC AUC) and in the features they selected. However, significant differences were observed in running time. The selected features were further evaluated by the accuracy of a prediction model built with a multi-layer perceptron (MLP) classifier.
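The abstract does not name the two wrappers, so the following is only a minimal sketch of the general idea of wrapper-based selection scored by ROC AUC, not the authors' pipeline. It assumes greedy forward selection as an example wrapper, pairs it with a small hand-written Gaussian naïve Bayes model (one of the three classifier families named above), and runs on synthetic data rather than REGARDS. The names `TinyGaussianNB`, `forward_select`, and `make_data` are hypothetical, and the MLP evaluation step is not shown.

```python
import math
import random
from statistics import mean, pvariance


def roc_auc(y_true, scores):
    """ROC AUC as the Mann-Whitney statistic: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


class TinyGaussianNB:
    """Hypothetical minimal Gaussian naive Bayes for binary labels 0/1."""

    def fit(self, X, y):
        self.stats = {}
        for c in (0, 1):
            rows = [x for x, t in zip(X, y) if t == c]
            prior = len(rows) / len(X)
            # Per-feature mean and variance; a tiny constant avoids division by zero.
            params = [(mean(col), pvariance(col) + 1e-9) for col in zip(*rows)]
            self.stats[c] = (prior, params)
        return self

    def _log_joint(self, x, c):
        prior, params = self.stats[c]
        lp = math.log(prior)
        for v, (mu, var) in zip(x, params):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return lp

    def score(self, x):
        # Log-odds of class 1 vs class 0; monotone in P(y=1|x), so valid for AUC.
        return self._log_joint(x, 1) - self._log_joint(x, 0)


def subset(X, cols):
    return [[row[j] for j in cols] for row in X]


def holdout_auc(X_tr, y_tr, X_te, y_te, cols):
    model = TinyGaussianNB().fit(subset(X_tr, cols), y_tr)
    return roc_auc(y_te, [model.score(x) for x in subset(X_te, cols)])


def forward_select(X_tr, y_tr, X_te, y_te, max_features):
    """Greedy forward wrapper: add the feature that most improves holdout AUC."""
    selected, best_auc = [], 0.5
    remaining = list(range(len(X_tr[0])))
    while remaining and len(selected) < max_features:
        auc, j = max((holdout_auc(X_tr, y_tr, X_te, y_te, selected + [j]), j)
                     for j in remaining)
        if auc <= best_auc:  # stop once no candidate improves the score
            break
        selected.append(j)
        remaining.remove(j)
        best_auc = auc
    return selected, best_auc


def make_data(n, seed=0):
    """Synthetic stand-in data: features 0-1 carry signal, features 2-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label, 1.0),  # strongly informative
                  rng.gauss(0.5 * label, 1.0),  # weakly informative
                  rng.gauss(0.0, 1.0),          # noise
                  rng.gauss(0.0, 1.0)])         # noise
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_data(400)
    cols, auc = forward_select(X[:300], y[:300], X[300:], y[300:], max_features=3)
    print("selected features:", cols, "holdout AUC: %.3f" % auc)
```

Swapping `TinyGaussianNB` for a logistic regression or random forest model changes only the scoring step, which is where the running-time differences reported above would show up: a wrapper refits the classifier once per candidate feature per step. The sketch selects and reports AUC on the same holdout split for brevity, which is optimistic; a careful study would use cross-validation for selection and a separate test set for reporting.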
classification, feature selection, machine learning
Clifford, Thomas, Justin Bruce, Tayo Obafemi-Ajayi, and John Matta. "Comparative analysis of feature selection methods to identify biomarkers in a stroke-related dataset." In 2019 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pp. 1-8. IEEE, 2019.