Random Subspace Projection for Predicting Biogeographical Ancestry
Estimating human biogeographical ancestry from genomic information is an important problem with applications in population stratification, admixture mapping, forensic ancestry inference, and healthcare. Various studies have proposed panels of ancestry-informative single nucleotide polymorphisms (SNPs) for distinguishing between widely separated continental populations. There has been limited investigation into identifying SNP panels for sub-continental ancestry prediction, especially given the difficulty of identifying SNP markers that distinguish closely related sub-populations, for instance, those within a single continent. In this study, we propose an ancestry-informative SNP selection algorithm that exploits random subspace projection with supervised learning. The proposed approach identifies small panels of useful SNPs for sub-continental ancestry classification. We show sub-continental classification results for all five continents in our dataset.
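To make the general idea of random-subspace SNP selection concrete, the sketch below shows one generic way it could be set up. It is not the algorithm from the paper, whose details the abstract does not give. The assumptions here are all illustrative: genotypes are encoded as 0/1/2 allele counts, a simple nearest-centroid classifier stands in for the supervised learner, each SNP is scored by the mean held-out accuracy of the random subsets that contain it, and the function names and parameters (random_subspace_snp_scores, select_panel, n_subspaces, subspace_size) are hypothetical.

```python
import numpy as np


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Held-out accuracy of a nearest-centroid classifier (illustrative stand-in)."""
    classes = np.unique(y_train)
    # One centroid per class present in the training split.
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance of each test sample to each centroid.
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predictions = classes[dists.argmin(axis=1)]
    return float((predictions == y_test).mean())


def random_subspace_snp_scores(X, y, n_subspaces=300, subspace_size=5,
                               test_fraction=0.3, seed=0):
    """Score each SNP by the mean accuracy of the random subspaces containing it.

    X: (n_samples, n_snps) genotype matrix, assumed coded as 0/1/2.
    y: (n_samples,) population labels.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_snps = X.shape

    # Single random train/test split (an assumption; not stratified).
    order = rng.permutation(n_samples)
    n_test = max(1, int(round(test_fraction * n_samples)))
    test_idx, train_idx = order[:n_test], order[n_test:]

    score_sum = np.zeros(n_snps)
    times_drawn = np.zeros(n_snps)
    for _ in range(n_subspaces):
        # Project onto a random subset of SNP columns (the "random subspace").
        cols = rng.choice(n_snps, size=subspace_size, replace=False)
        acc = nearest_centroid_accuracy(
            X[np.ix_(train_idx, cols)], y[train_idx],
            X[np.ix_(test_idx, cols)], y[test_idx],
        )
        score_sum[cols] += acc
        times_drawn[cols] += 1

    # SNPs that were never drawn keep a score of 0.
    return score_sum / np.maximum(times_drawn, 1)


def select_panel(scores, panel_size):
    """Indices of the panel_size highest-scoring SNPs, best first."""
    return np.argsort(scores)[::-1][:panel_size]
```

Under these assumptions, a small panel would be obtained by calling select_panel(random_subspace_snp_scores(X, y), panel_size) on a labeled genotype matrix. The classifier, the scoring rule, and the split strategy are all interchangeable design choices in this sketch.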
Ancestry Classification, DNA, Random Subspace Projection, Single Chromosome, SNP, SNP Selection
Toma, Tanjin, Tayo Olufemi-Ajayi, Jeremy Dawson, and Donald Adjeroh. "Random subspace projection for predicting biogeographical ancestry." In 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1719-1725. IEEE, 2018.