Mining Calcium-binding Sites From Protein Structure Graphs
Identifying protein calcium-binding sites is an important problem in proteomics. To this end, we construct a graph that contains only oxygen atoms to represent partial protein structures. In this graph, each vertex represents an oxygen atom, and an edge joins two vertices whenever the corresponding atoms lie within a simple contact-distance threshold of each other. Applying a clique-finding algorithm to a set of graphs representing a group of calcium-binding proteins, we obtain several hundred oxygen clique-clusters of size four that may surround calcium-binding sites. We then use geometric and chemical properties of the four cospherical vertices to exclude some of these clique-clusters. Finally, we use Support Vector Machines (SVM) for binary classification, with vertex-atom coordinates as input variables, to distinguish calcium-binding clique-clusters from non-calcium-binding ones. The results show a site selectivity of 80% at a site sensitivity of 91%. This new protein graph mining and geometric classification model can be used for rapid, automated annotation of one protein function: calcium binding.
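The graph-construction and clique-finding steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `build_oxygen_graph` and `cliques_of_size_four` are hypothetical, oxygen coordinates are assumed to be given as (x, y, z) tuples in ångströms, and the distance threshold is left as a parameter because its value is not stated here. Only size-four cliques are enumerated, each reported once as a sorted index tuple.

```python
import itertools
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float, float]


def build_oxygen_graph(coords: Sequence[Point], threshold: float) -> Dict[int, Set[int]]:
    """Hypothetical helper: vertex i is adjacent to j iff their distance is <= threshold."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(coords))}
    for i, j in itertools.combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def cliques_of_size_four(adj: Dict[int, Set[int]]) -> List[Tuple[int, int, int, int]]:
    """Hypothetical helper: list every 4-clique once as (a, b, c, d) with a < b < c < d."""
    found = []
    for a in adj:
        higher_a = {v for v in adj[a] if v > a}
        for b in higher_a:
            # Candidates adjacent to both a and b, with index above b.
            common_ab = {v for v in higher_a & adj[b] if v > b}
            for c in common_ab:
                for d in common_ab & adj[c]:
                    if d > c:
                        found.append((a, b, c, d))
    return sorted(found)


if __name__ == "__main__":
    # Four mutually close oxygens plus one distant oxygen (made-up coordinates).
    oxygens = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
               (0.0, 0.0, 1.0), (10.0, 10.0, 10.0)]
    graph = build_oxygen_graph(oxygens, threshold=1.5)
    print(cliques_of_size_four(graph))  # [(0, 1, 2, 3)]
```

Extending only with higher-indexed neighbors avoids reporting the same clique in several vertex orders. For large structures, a general maximal-clique routine such as Bron–Kerbosch would be the usual alternative.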
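Four non-coplanar points always determine a unique sphere, so a geometric filter on a size-four clique-cluster can be phrased in terms of that sphere. The sketch below computes the circumsphere of four oxygen atoms and keeps a cluster only if its radius falls inside a window. It is an assumption-laden illustration, not the filter used in this work: `circumsphere` and `passes_radius_filter` are hypothetical names, and the radius window is a free parameter whose values here are illustrative only. The center follows from requiring equal squared distances to all four points, which gives the linear system $2(\mathbf{p}_i - \mathbf{p}_0)\cdot\mathbf{c} = \lVert\mathbf{p}_i\rVert^2 - \lVert\mathbf{p}_0\rVert^2$ for $i = 1, 2, 3$, solved here by Cramer's rule.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumsphere(pts: Sequence[Point], eps: float = 1e-9) -> Optional[Tuple[Point, float]]:
    """Return (center, radius) of the sphere through four points, or None if coplanar."""
    p0 = pts[0]
    sq0 = sum(x * x for x in p0)
    rows, rhs = [], []
    for p in pts[1:4]:
        rows.append([2.0 * (p[k] - p0[k]) for k in range(3)])
        rhs.append(sum(x * x for x in p) - sq0)
    det = _det3(rows)
    if abs(det) < eps:
        return None  # Degenerate (coplanar) quadruple: no unique sphere.
    center = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the right-hand side.
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        center.append(_det3(m) / det)
    return (center[0], center[1], center[2]), math.dist(center, p0)


def passes_radius_filter(pts: Sequence[Point], min_radius: float, max_radius: float) -> bool:
    """Hypothetical filter: keep a 4-atom cluster if its circumradius lies in the window."""
    sphere = circumsphere(pts)
    if sphere is None:
        return False
    return min_radius <= sphere[1] <= max_radius


if __name__ == "__main__":
    # Made-up cluster on a sphere of radius 2.4 centred at (2, 3, 4).
    cluster = [(4.4, 3.0, 4.0), (-0.4, 3.0, 4.0), (2.0, 5.4, 4.0), (2.0, 3.0, 6.4)]
    print(circumsphere(cluster))
    print(passes_radius_filter(cluster, 2.0, 3.0))  # True
```

A window around typical Ca–O coordination distances (roughly 2.4 Å) would be one natural choice, but any real cut-off, and any chemical criteria such as residue types, would have to come from the study's own data.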