Protein Linear Indices in Bioinformatics Studies: 1. Prediction of Protein Stability Effects of a Complete Set of Alanine Substitutions in Arc Repressor
Published:
30 November 2006
by MDPI
in The 10th International Electronic Conference on Synthetic Organic Chemistry
session Bioorganic Chemistry and Natural Products
Abstract: A novel approach to bio-macromolecular design from a linear algebra point of view is introduced. A protein's total (whole-protein) and local (one or more amino acids) linear indices are a new set of bio-macromolecular descriptors of relevance to protein QSAR/QSPR studies. These amino-acid-level biochemical descriptors are based on the calculation of linear maps on $\mathbb{R}^n$ [$f_k(x_{mi}): \mathbb{R}^n \to \mathbb{R}^n$] in the canonical basis. They are calculated from the kth power of the α-carbon atom adjacency matrix of the macromolecular pseudograph. Total linear indices are linear functionals on $\mathbb{R}^n$. That is, the kth total linear indices are linear maps from $\mathbb{R}^n$ to the scalars $\mathbb{R}$ [$f_k(x_m): \mathbb{R}^n \to \mathbb{R}$]. Thus, the kth total linear indices are calculated by summing the amino-acid linear indices of all amino acids in the protein molecule. A study of the protein stability effects for a complete set of alanine substitutions in Arc repressor illustrates this approach. A quantitative model that discriminates alanine mutants of near wild-type stability from reduced-stability ones in a training series was obtained. This model correctly classified 97.56% (40/41) and 91.67% (11/12) of the proteins in the training and test sets, respectively. It shows a high Matthews correlation coefficient (MCC = 0.952) for the training set and an MCC of 0.837 for the external prediction set. Additionally, canonical regression analysis corroborated the statistical quality of the classification model ($R_{canc}$ = 0.824). This analysis was also used to compute biological stability canonical scores for each Arc alanine mutant. In addition, a linear piecewise regression model compared favorably with the linear regression model in predicting the melting temperature ($t_m$) of the Arc alanine mutants. The linear model explains almost 81% of the variance of the experimental $t_m$ (R = 0.90 and s = 4.29), and the leave-one-out (LOO) PRESS statistics evidenced its predictive ability ($q^2$ = 0.72 and $s_{cv}$ = 4.79).
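The index definitions above can be illustrated with a minimal sketch. It assumes that the kth local index of each amino acid is the corresponding entry of the kth power of the α-carbon adjacency matrix applied to a vector of amino-acid property values, and that the kth total index is the sum of those local values. The function names, the four-residue toy chain, and the property values are hypothetical and are not taken from TOMOCOMD-CAMPS; a real pseudograph may also carry loops or non-covalent contacts that this toy matrix omits.

```python
import numpy as np


def local_linear_indices(adjacency, properties, k):
    """kth local indices: (adjacency ** k) applied to the property vector.

    Assumption: one property value per amino acid, expressed in the
    canonical basis; k = 0 returns the property vector itself.
    """
    adjacency_k = np.linalg.matrix_power(np.asarray(adjacency), k)
    return adjacency_k @ np.asarray(properties, dtype=float)


def total_linear_index(adjacency, properties, k):
    """kth total index: sum of the kth local indices over all amino acids."""
    return float(local_linear_indices(adjacency, properties, k).sum())


# Hypothetical 4-residue chain: consecutive alpha-carbons are adjacent.
toy_adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# Made-up amino-acid property values, for illustration only.
toy_properties = np.array([1.0, 2.0, 3.0, 4.0])

for order in range(3):
    local = local_linear_indices(toy_adjacency, toy_properties, order)
    total = total_linear_index(toy_adjacency, toy_properties, order)
    print(order, local, total)
```

A local index for a fragment of several amino acids would, under the same assumption, be the sum of the entries belonging to that fragment rather than to the whole chain.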
Moreover, the TOMOCOMD-CAMPS method produced a linear piecewise regression (R = 0.97) between protein backbone descriptors and $t_m$ values for the alanine mutants of Arc repressor. A break-point value of 51.87 °C characterized two clusters of mutants and coincided perfectly with the experimental scale. For this reason, the linear discriminant analysis and piecewise models can be used in combination to classify and predict the stability of the mutant Arc homodimers. These models also permitted the interpretation of the driving forces of the folding process, indicating that topologic/topographic interactions of the protein backbone control the stability profile of wild-type Arc and its alanine mutants.
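The piecewise model with a break point can be sketched as follows. The abstract does not say how the fit was carried out, so this sketch assumes one common form of piecewise linear regression: observations are split by whether the response lies at or below the break point, and a separate least-squares line is fitted to each group. The function name and the toy data are hypothetical; only the break-point value of 51.87 comes from the text.

```python
import numpy as np


def fit_piecewise_by_response(X, y, breakpoint):
    """Fit one least-squares linear model per side of a response break point.

    Assumption: cases with y <= breakpoint and cases with y > breakpoint
    each get their own intercept and slopes. Returns a dict of coefficient
    vectors ordered as [intercept, slope_1, ...].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column

    fits = {}
    for name, mask in (("low", y <= breakpoint), ("high", y > breakpoint)):
        coef, *_ = np.linalg.lstsq(design[mask], y[mask], rcond=None)
        fits[name] = coef
    return fits


# Made-up descriptor values and melting temperatures, for illustration only.
x_toy = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y_toy = np.array([40.0, 42.0, 44.0, 46.0, 54.0, 55.0, 56.0, 57.0])

toy_fits = fit_piecewise_by_response(x_toy, y_toy, breakpoint=51.87)
print(toy_fits["low"], toy_fits["high"])
```

Because the split is made on the response, a model of this kind needs a separate classifier to decide which branch applies to a new mutant, which is consistent with using the discriminant and piecewise models in combination.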
Keywords: Protein Stability, Arc Repressor, Alanine-Substitution Mutant, TOMOCOMD-CAMPS Software, Protein Linear Indices, QSAR