Protein Linear Indices in Bioinformatics Studies: 1. Prediction of Protein Stability Effects of a Complete Set of Alanine Substitutions in Arc Repressor
1  Applied Chemistry Research Center and Department of Drug Design, Chemical Bioactive Center. Central University of Las Villas, Santa Clara, 54830, Villa Clara, Cuba
2  Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatic Research (CAMD-BIR Unit), Department of Pharmacy, Faculty of Chemistry-Pharmacy and Department of Drug Design, Chemical Bioactive Center. Central University of Las Villas, Santa Clar
3  Institut Universitari de Ciència Molecular, Universitat de València, Edifici d'Instituts de Paterna, P.O. Box 22085, 46071 Valencia, Spain
4  Department of Microbiology, Chemical Bioactive Center. Central University of Las Villas, Santa Clara, 54830, Villa Clara, Cuba
5  Faculty of Informatics. University of Cienfuegos, 55500, Cienfuegos, Cuba
6  INIFTA, División Química Teórica, Suc.4, C.C. 16, La Plata 1900, Buenos Aires, Argentina

Abstract: A novel approach to bio-macromolecular design from a linear algebra point of view is introduced. A protein's total (whole-protein) and local (one or more amino acids) linear indices are a new set of bio-macromolecular descriptors relevant to protein QSAR/QSPR studies. These amino-acid-level biochemical descriptors are based on the calculation of linear maps on $\Re^n$ [$f_k(x_{mi}): \Re^n \to \Re^n$] in the canonical basis. They are calculated from the $k$th power of the α-carbon atom adjacency matrix of the macromolecular pseudograph. Total linear indices are linear functionals on $\Re^n$; that is, the $k$th total linear index is a linear map from $\Re^n$ to the scalars $\Re$ [$f_k(x_m): \Re^n \to \Re$]. Thus, the $k$th total linear index is calculated by summing the $k$th amino-acid linear indices of all amino acids in the protein molecule. A study of the protein stability effects for a complete set of alanine substitutions in Arc repressor illustrates this approach. A quantitative model was obtained that discriminates alanine mutants with near wild-type stability from those with reduced stability in a training series. This model correctly classified 97.56% (40/41) of the proteins in the training set and 91.67% (11/12) of those in the test set. It shows a high Matthews correlation coefficient for the training set (MCC = 0.952) and an MCC of 0.837 for the external prediction set. Additionally, canonical regression analysis corroborated the statistical quality of the classification model ($R_{canc}$ = 0.824). This analysis was also used to compute biological stability canonical scores for each Arc alanine mutant. In addition, a piecewise linear regression model compared favorably with a linear regression model in predicting the melting temperature ($t_m$) of the Arc alanine mutants. The linear model explains almost 81% of the variance in the experimental $t_m$ ($R$ = 0.90, $s$ = 4.29), and the leave-one-out (LOO) PRESS statistics support its predictive ability ($q^2$ = 0.72, $s_{cv}$ = 4.79).
Moreover, the TOMOCOMD-CAMPS method produced a piecewise linear regression ($R$ = 0.97) between protein backbone descriptors and $t_m$ values for the alanine mutants of Arc repressor. A break-point value of 51.87 °C separated two clusters of mutants and coincided perfectly with the experimental scale. For this reason, the linear discriminant analysis and piecewise models can be used in combination to classify and predict the stability of the mutant Arc homodimers. These models also permitted interpretation of the driving forces of the folding process, indicating that topologic/topographic interactions of the protein backbone control the stability profile of wild-type Arc and its alanine mutants.
Keywords: Protein Stability, Arc Repressor, Alanine-Substitution Mutant, TOMOCOMD-CAMPS Software, Protein Linear Indices, QSAR
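
The abstract describes the local linear indices as the kth power of the α-carbon adjacency matrix applied to a vector of amino-acid properties, and the total index as the sum of the local ones. A minimal Python sketch of that calculation might look as follows. It is not the TOMOCOMD-CAMPS implementation: the function names, the four-residue adjacency matrix, and the property values are hypothetical placeholders chosen only to illustrate the arithmetic, and the real pseudograph may also contain loops or non-covalent contacts that this toy chain omits.

```python
import numpy as np


def local_linear_indices(adjacency, properties, k):
    """Return the k-th order local linear index of every residue.

    Computed as (M^k) x, where M is the adjacency matrix and x is the
    vector of amino-acid property values (both supplied by the caller).
    """
    m = np.asarray(adjacency, dtype=float)
    x = np.asarray(properties, dtype=float)
    return np.linalg.matrix_power(m, k) @ x


def total_linear_index(adjacency, properties, k):
    """Return the k-th order total linear index: the sum of the local indices."""
    return float(local_linear_indices(adjacency, properties, k).sum())


# Hypothetical 4-residue chain in which consecutive alpha-carbons are adjacent.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# Placeholder property values, one per residue (not real amino-acid data).
properties = np.array([1.0, 2.0, 3.0, 4.0])

for k in range(3):
    local = local_linear_indices(adjacency, properties, k)
    total = total_linear_index(adjacency, properties, k)
    print(f"k={k}: local={local}, total={total}")
```

For k = 0 the matrix power is the identity, so the local indices reduce to the property values themselves; higher orders progressively mix in the values of residues that are further away along the graph.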