Towards Explainability of Dimension Reduction Plots of Unsupervised Learning Model Outcomes

Abstract

Dimension reduction methods are used to visualize the output of unsupervised learning models applied to complex data. These techniques improve interpretability by transforming a high-dimensional space into a lower-dimensional one (usually 2D or 3D). The results are typically viewed as 2D scatter plots, and class centroids may be added to increase interpretability. Although useful, the relationship of these class centroids to the underlying feature space remains opaque. The innovative aspect of this work is to create a strong link between the dimension-reduced space and the underlying high-dimensional feature space by adding selected feature centroids to the 2D scatter plots, so that the centers of the classes and of the features are visualized together on the same plot. Since classes are often imbalanced, we provide a method to balance class sizes. We present an automated framework that performs a grid search to find the optimal dimension reduction parameters, balances the class sizes, uses an ensemble approach to find the most important features, and adds class centroids and selected feature centroids to 2D dimension-reduced plots. This is especially useful when applied to complex, feature-rich biomedical data, as the feature centroids added to the 2D scatter plots serve as landmarks for the previously featureless dimension-reduced space. The utility of this approach is demonstrated by its application to seven classes of neurogenetic diseases with 31 defining phenotypic features.
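The idea of placing class centroids and feature centroids on the same 2D plot can be illustrated with a minimal sketch. The abstract does not define how a feature centroid is computed, so the code below assumes one plausible reading: a feature centroid is the mean 2D position of the samples, weighted by each sample's value for that feature (for a binary phenotypic feature, the mean position of the samples that have it). The 2D coordinates are assumed to come from any dimension reduction method already applied; the function names, the toy data, and the feature names `f1` and `f2` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def class_centroids(points, labels):
    """Return the mean 2D position of the points in each class."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_x, sum_y, count]
    for (x, y), label in zip(points, labels):
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def feature_centroids(points, features, names):
    """Return a feature-weighted mean 2D position for each named feature.

    ASSUMPTION: this weighting scheme is an illustrative choice, not
    necessarily the definition used in the paper.
    """
    result = {}
    for j, name in enumerate(names):
        total = sum(row[j] for row in features)
        if total == 0:
            continue  # feature absent from every sample, so no centroid
        cx = sum(p[0] * row[j] for p, row in zip(points, features)) / total
        cy = sum(p[1] * row[j] for p, row in zip(points, features)) / total
        result[name] = (cx, cy)
    return result


# Toy data: four samples already embedded in 2D, two classes, two binary features.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (12.0, 10.0)]
labels = ["A", "A", "B", "B"]
names = ["f1", "f2"]  # hypothetical feature names
features = [[1, 0], [1, 0], [0, 1], [1, 1]]

cls = class_centroids(points, labels)
feat = feature_centroids(points, features, names)
```

In this toy case `f2` occurs only in class B, so its centroid coincides with the class B centroid, while `f1` occurs in both classes and lands between them. Under the stated assumption, this is the sense in which feature centroids could act as landmarks: a feature plotted near a class centroid is concentrated in that class.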

Department(s)

Cooperative Engineering Program

Document Type

Conference Proceeding

DOI

10.1109/CIBCB58642.2024.10702157

Publication Date

1-1-2024

Journal Title

21st IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2024
