Structured Iterative Hard Thresholding for Categorical and Mixed Data Types
In many applications, data is of mixed type, i.e. a combination of nominal (categorical) and numerical features. A common practice for working with categorical features is to use an encoding method that transforms the discrete values into a numeric representation. However, a numeric representation often neglects the innate structure of categorical features, potentially degrading the performance of learning algorithms. Relying on the numeric representation can also limit interpretation of the learned model, such as identifying the most discriminative categorical features or filtering out irrelevant attributes. In this work, we extend the iterative hard thresholding (IHT) algorithm to account for the structure of categorical features. The proposed structured hard thresholding algorithm is evaluated empirically on both real and synthetic data sets and compared with the original hard thresholding algorithm, LASSO, and Random Forest. The results demonstrate improved performance over the original IHT.
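To make the idea concrete, the following is a minimal sketch of IHT for least squares with two interchangeable thresholding operators. It is not the authors' algorithm: the abstract does not specify the structured operator, so the group-wise variant below assumes that each categorical feature is one-hot encoded into a block of columns and that whole blocks are kept or discarded by their Euclidean norm. The names `hard_threshold`, `group_hard_threshold`, `iht`, and the `groups` argument are hypothetical and chosen for illustration only.

```python
import numpy as np

def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w and zero the rest."""
    out = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[::-1][:k]
    out[keep] = w[keep]
    return out

def group_hard_threshold(w, groups, k):
    """Keep the k groups with the largest Euclidean norm and zero the rest.

    Assumption: `groups` is a list of index lists, one per original feature
    (a one-hot block for a categorical feature, a single index for a
    numerical feature).
    """
    norms = np.array([np.linalg.norm(w[g]) for g in groups])
    out = np.zeros_like(w)
    for i in np.argsort(norms)[::-1][:k]:
        out[groups[i]] = w[groups[i]]
    return out

def iht(X, y, project, n_iter=200):
    """Gradient step on 0.5 * ||y - Xw||^2 followed by a thresholding step."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / (largest singular value)^2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = project(w + step * X.T @ (y - X @ w))
    return w

# Toy usage: columns 0-2 form one categorical block, columns 3 and 4 are numerical.
groups = [[0, 1, 2], [3], [4]]
X = np.eye(5)
y = np.array([1.0, 1.0, 1.0, 0.0, 2.5])
w_plain = iht(X, y, lambda w: hard_threshold(w, 2))
w_group = iht(X, y, lambda w: group_hard_threshold(w, groups, 2))
```

In this toy case the entry-wise operator keeps two individual coefficients and so splits the categorical block, whereas the group-wise operator keeps or drops the block as a unit, which is the kind of feature-level selection the abstract argues for.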
categorical data types, sparse linear model, thresholding, feature selection
Nguyen, Thy, and Tayo Obafemi-Ajayi. "Structured Iterative Hard Thresholding for Categorical and Mixed Data Types." In 2019 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 2541-2547. IEEE, 2019.