Structured Iterative Hard Thresholding for Categorical and Mixed Data Types

Abstract

In many applications, data exists in a mixed data type format, i.e., a combination of nominal (categorical) and numerical features. A common practice for working with categorical features is to use an encoding method to transform the discrete values into a numeric representation. However, a numeric representation often neglects the innate structure of categorical features, potentially degrading the performance of learning algorithms. Relying on the numeric representation can also limit interpretation of the learned model, such as finding the most discriminative categorical features or filtering irrelevant attributes. In this work, we extend the iterative hard thresholding (IHT) algorithm to quantify the structure of categorical features. The proposed structured hard thresholding algorithm is evaluated empirically on both real and synthetic data sets, in comparison with the original hard thresholding algorithm, LASSO, and Random Forest. The results demonstrate improved performance over the original IHT.
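
The abstract does not state how the structure of categorical features enters the thresholding step. As a rough illustration only, the sketch below assumes each categorical feature is one-hot encoded into a block of columns and swaps in group-wise hard thresholding for a least-squares objective: whole blocks are kept or zeroed together, so a categorical feature is selected or discarded as a unit. The function names (group_hard_threshold, group_iht), the groups argument, and the 1/L step size are hypothetical choices made for this sketch and are not taken from the paper.

```python
import numpy as np

def group_hard_threshold(w, groups, k):
    """Keep the k groups of w with the largest Euclidean norm; zero the rest.

    groups is a list of index lists, one per feature (assumed layout:
    one block of one-hot columns per categorical feature).
    """
    norms = np.array([np.linalg.norm(w[g]) for g in groups])
    keep = np.argsort(norms)[-k:]          # indices of the k largest group norms
    out = np.zeros_like(w)
    for i in keep:
        out[groups[i]] = w[groups[i]]
    return out

def group_iht(X, y, groups, k, n_iter=200):
    """Group-structured IHT for least squares (illustrative assumption)."""
    d = X.shape[1]
    # Step size 1/L, where L is the squared spectral norm of X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d, dtype=float)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)           # gradient of 0.5 * ||Xw - y||^2
        w = group_hard_threshold(w - step * grad, groups, k)
    return w

# Toy usage: three features, each encoded as a block of two columns.
groups = [[0, 1], [2, 3], [4, 5]]
X = np.eye(6)
y = np.array([0.0, 0.0, 2.0, -1.0, 0.0, 0.0])
w_hat = group_iht(X, y, groups, k=1)
```

In the toy usage the design matrix is the identity, so the iteration simply retains the single block that carries the signal. On real data, recovery depends on the conditioning of X and on the chosen k.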

Department(s)

Engineering Program

Document Type

Conference Proceeding

DOI

https://doi.org/10.1109/SSCI44817.2019.9002948

Keywords

categorical data types; sparse linear model; thresholding; feature selection

Publication Date

12-1-2019
