Statistical Comparative Analysis and Evaluation of Validation Indices for Clustering Optimization
Abstract
Clustering is a relevant exploratory tool for a broad range of machine learning applications because it aids the identification of meaningful subgroups. For a given clustering algorithm, multiple partitions of the same data set can be obtained by varying the algorithm's parameters. Because no labeled data are available a priori, internal validation indices provide a means to objectively evaluate how well the groupings obtained from a clustering configuration partition the data. This work presents a rigorous statistical evaluation framework that analyzes the performance of internal validation indices based on their correlation with external indices. A synthetic data generator that captures a wide range of data complexity is also proposed. The evaluation is conducted on a varied set of synthetic data types and real data sets to investigate the performance of the indices.
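The core idea of scoring an internal index by how well it tracks an external index can be illustrated with a small, self-contained sketch. The abstract does not state which indices or which correlation measure the framework uses, so everything below is an assumption for illustration only: silhouette width stands in for the internal index, the Rand index for the external index, and Spearman rank correlation for the statistical comparison. The function names (`silhouette`, `rand_index`, `spearman`, `evaluate`) and the toy data are hypothetical and are not taken from the paper.

```python
"""Hypothetical sketch (not the paper's framework): rank-correlate an internal
index with an external index across candidate partitions of the same data."""
from itertools import combinations
from math import dist


def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree (same vs. different)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


def ranks(values):
    """1-based ranks, averaging tied values."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def evaluate(points, truth, candidates):
    """Score every candidate partition with both indices, then correlate the scores."""
    internal = [silhouette(points, c) for c in candidates]
    external = [rand_index(truth, c) for c in candidates]
    return internal, external, spearman(internal, external)


# Hypothetical toy data: three well-separated groups of three 2-D points.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0), (0, 10), (0, 11), (1, 10)]
TRUTH = [0, 0, 0, 1, 1, 1, 2, 2, 2]
CANDIDATES = [
    TRUTH,                          # correct partition
    [0, 0, 0, 1, 1, 1, 1, 1, 1],    # two groups merged
    [0, 0, 0, 1, 1, 1, 2, 2, 3],    # one point split off
    [0, 1, 2, 0, 1, 2, 0, 1, 2],    # partition unrelated to the structure
]

if __name__ == "__main__":
    internal, external, rho = evaluate(POINTS, TRUTH, CANDIDATES)
    for s, ri in zip(internal, external):
        print(f"silhouette={s:.3f}  rand={ri:.3f}")
    print(f"Spearman rho = {rho:.3f}")
```

On this toy data both indices rank the four candidate partitions in the same order, so the rank correlation is 1. A real study would compute such correlations over many clustering configurations and data sets, and could substitute other internal indices, external indices, or correlation measures.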
Department(s)
Engineering Program
Document Type
Conference Proceeding
DOI
https://doi.org/10.1109/SSCI47803.2020.9308412
Keywords
clustering, statistical analysis, validation indices
Publication Date
12-1-2020
Recommended Citation
Nguyen, Thy, Jason Viehman, Dacosta Yeboah, Gayla R. Olbricht, and Tayo Obafemi-Ajayi. "Statistical Comparative Analysis and Evaluation of Validation Indices for Clustering Optimization." In 2020 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 3081-3090. IEEE, 2020.
Journal Title
2020 IEEE Symposium Series on Computational Intelligence, SSCI 2020