Prediction of protein structure classes by incorporating different protein descriptors into general Chou's pseudo amino acid composition
Successful protein structure identification enables researchers to estimate the biological functions of proteins, yet it remains a challenging problem. The most common method for determining an unknown protein's structural class is to perform expensive and time-consuming manual experiments. Because of the availability of amino-acid sequences generated in the post-genomic age, it is possible to predict an unknown protein's structural class using machine learning methods, given the protein's amino-acid sequence and/or its secondary structural elements. Following recent research in this area, we propose a new machine learning system that combines several protein descriptors extracted from different protein representations, such as the position-specific scoring matrix (PSSM), the amino-acid sequence, and secondary structural sequences. The prediction engine of our system is an ensemble of support vector machines (SVMs), where each SVM is trained on a different descriptor. The outputs of the SVMs are combined by the sum rule. Our final ensemble produces a success rate that is substantially better than previously reported results on three well-established datasets.
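The sum-rule fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each SVM has already produced one score per structural class for a given protein. The class list, the function names `sum_rule` and `predict`, and the score values are hypothetical placeholders chosen for illustration, not taken from the paper, and no SVM is trained here.

```python
from typing import List, Sequence

# Assumed label set for illustration; the abstract does not list the classes.
CLASSES = ["all-alpha", "all-beta", "alpha/beta", "alpha+beta"]


def sum_rule(scores_per_classifier: Sequence[Sequence[float]]) -> List[float]:
    """Fuse classifiers by adding their per-class scores (the sum rule)."""
    n_classes = len(scores_per_classifier[0])
    if any(len(s) != n_classes for s in scores_per_classifier):
        raise ValueError("every classifier must score the same classes")
    return [sum(s[k] for s in scores_per_classifier) for k in range(n_classes)]


def predict(scores_per_classifier: Sequence[Sequence[float]]) -> str:
    """Return the class whose fused (summed) score is largest."""
    fused = sum_rule(scores_per_classifier)
    best = max(range(len(fused)), key=fused.__getitem__)
    return CLASSES[best]


# Hypothetical per-class scores from three SVMs, each assumed to be trained
# on a different descriptor (PSSM, amino-acid sequence, secondary structure).
pssm_scores = [0.10, 0.60, 0.20, 0.10]
sequence_scores = [0.40, 0.30, 0.20, 0.10]
secondary_scores = [0.20, 0.50, 0.20, 0.10]

label = predict([pssm_scores, sequence_scores, secondary_scores])
```

In this sketch the sequence-based scores alone would favour the first class, but the summed scores favour the second, which shows how the sum rule lets agreement among descriptors outweigh a single dissenting classifier. The scores being added should be on a comparable scale across classifiers, for example calibrated class probabilities.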
Information Technology and Cybersecurity
Protein structure class, Protein descriptors, Machine learning, Ensemble of classifiers, Support vector machines
Nanni, Loris, Sheryl Brahnam, and Alessandra Lumini. "Prediction of protein structure classes by incorporating different protein descriptors into general Chou's pseudo amino acid composition." Journal of Theoretical Biology 360 (2014): 109-116.
Journal of Theoretical Biology