Disulfide connectivity prediction by expanding template proteins and encoding global protein secondary structure
Predicting disulfide connectivity from protein sequence helps determine three-dimensional protein structure. Methods for predicting disulfide connectivity generally fall into two categories: pair-wise and pattern-wise. Previously, prediction accuracy reached at most 52%, because (1) pair-wise methods focus mainly on each individual bond and neglect the global influence among the disulfide bonds in a protein; and (2) pattern-wise methods compare only proteins with the same number of bonds, which exacerbates the shortage of template proteins. Recently, accuracy has been improved to 70% by a pair-wise method that uses a state-of-the-art SVM technique. We generalize pattern-wise methods by developing a method that, under certain conditions, allows test proteins to be compared with template proteins that have a different number of disulfide bonds. In addition, we propose a global descriptor to encode secondary structure. Embedding this descriptor into our method, we obtain a prediction accuracy of 70% in 4-fold cross-validation on SP39, which matches the best result among all methods and, in turn, indicates that the disulfide bond pattern is highly related to protein secondary structure.
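To make the notion of a connectivity pattern concrete, the following minimal sketch enumerates every way of pairing the bonded cysteines of a protein. It illustrates only the size of the search space, not the prediction method described here. The function names and the cysteine positions in the usage note are hypothetical. For n disulfide bonds there are (2n-1)!! = 1 x 3 x ... x (2n-1) possible patterns.

```python
from typing import Iterator, List, Tuple

# A connectivity pattern is a tuple of (cysteine, cysteine) position pairs.
Pattern = Tuple[Tuple[int, int], ...]


def connectivity_patterns(cysteines: List[int]) -> Iterator[Pattern]:
    """Yield every way to pair up an even number of bonded cysteines.

    Illustrative only: this enumerates candidate patterns and does not
    score or predict them.
    """
    if not cysteines:
        yield ()
        return
    first, rest = cysteines[0], cysteines[1:]
    # Pair the first cysteine with each possible partner, then recurse
    # on the cysteines that are still unpaired.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in connectivity_patterns(remaining):
            yield ((first, partner),) + tail


def pattern_count(n_bonds: int) -> int:
    """Closed-form count (2n-1)!! of patterns for n disulfide bonds."""
    count = 1
    for k in range(1, 2 * n_bonds, 2):
        count *= k
    return count
```

For example, a protein with bonded cysteines at the hypothetical positions 5, 12, 30, and 41 (two bonds) has `pattern_count(2) == 3` candidate patterns, and `list(connectivity_patterns([5, 12, 30, 41]))` lists them. The count grows quickly with the number of bonds (15 for three bonds, 105 for four), which is one reason the number of available template proteins per pattern matters.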