TOMOCOMD-CAMPS and Protein Bilinear Indices: Novel Bio-Macromolecular Descriptors for Protein Research. I. Predicting Protein Stability Effects of a Complete Set of Alanine Substitutions in Arc Repressor
1  Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatic Research (CAMD-BIR Unit), Faculty of Chemistry-Pharmacy, Central University of Las Villas, Santa Clara, 54830, Villa Clara, Cuba
2  Institut Universitari de Ciència Molecular, Universitat de València, Edifici d'Instituts de Paterna, Poligon la Coma s/n, E-46071 Valencia, Spain
3  Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Spain
4  Department of Physiology, Medical School “Faustino Pérez Hernández”, Km # 3 Circumvallation, Sancti-Spíritus, Cuba

Abstract: A new set of amino-acid-based bio-macromolecular descriptors built on bilinear maps is presented. This approach to bio-macromolecular design, grounded in linear algebra, is relevant to protein QSAR/QSPR studies. The descriptors are based on the computation of bilinear maps on $\mathbb{R}^n$, $b_{mk}(\bar{x}_m, \bar{y}_m): \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}$, in the canonical basis. Protein bilinear indices are calculated from the kth powers of the non-stochastic and stochastic graph-theoretic electronic-contact matrices, $\mathbf{M}_m^k$ and ${}^{s}\mathbf{M}_m^k$, respectively; that is, the kth non-stochastic and stochastic protein bilinear indices are obtained by using $\mathbf{M}_m^k$ and ${}^{s}\mathbf{M}_m^k$ as the matrix operators of bilinear transformations. Biochemical information is encoded by using different pairwise combinations of amino-acid properties as weightings (z-values, side-chain isotropic surface area (ISA), amino-acid atomic charges (ECI) and hydropathy index (Kyte-Doolittle scale; HPI)). Quantitative models that discriminate alanine mutants with near-wild-type stability from those with reduced stability were obtained for the training and test series. The non-stochastic and stochastic equations correctly classified 100% (41/41) and 97.56% (40/41) of the proteins in the training set, respectively, and both models correctly classified 91.67% of the test set. To predict the melting temperature ($t_m$) of the Arc alanine mutants, linear regression models were developed. The linear model based on non-stochastic bilinear indices explains almost 84% of the variance of the experimental $t_m$ (R = 0.91 and s = 4.50 °C), whereas the equation based on stochastic bilinear indices describes 81% of the $t_m$ variance (R = 0.90 and s = 5.01 °C). Leave-one-out (LOO) PRESS statistics showed high predictive ability for both models ($q^2$ = 0.73 and $s_{cv}$ = 4.50 °C for the non-stochastic, and $q^2$ = 0.64 and $s_{cv}$ = 5.01 °C for the stochastic bilinear indices).
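The bilinear-index computation summarized in the abstract can be illustrated with a short sketch. It is a minimal sketch under stated assumptions, not the TOMOCOMD-CAMPS implementation: it assumes the kth non-stochastic index is the bilinear form $b_{mk}(\bar{x}_m, \bar{y}_m) = \bar{x}_m^{T}\mathbf{M}_m^{k}\bar{y}_m$, that the stochastic matrix is obtained by dividing each row of $\mathbf{M}_m^k$ by its row sum, and that the electronic-contact matrix reduces to a plain backbone chain in which each residue is linked only to its sequence neighbors. The helper names (`backbone_matrix`, `bilinear_indices`), the toy peptide and the `CHARGE` vector (a crude net-charge stand-in, not the paper's ECI scale) are hypothetical; only the Kyte-Doolittle values are a published scale.

```python
import numpy as np

# Kyte-Doolittle hydropathy index (HPI), one of the weightings named in the abstract.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

# Hypothetical second weighting: approximate net side-chain charge at neutral pH.
# This is NOT the ECI atomic-charge scale used in the paper.
CHARGE = {aa: 0.0 for aa in KD}
CHARGE.update({"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0})


def backbone_matrix(n):
    """Hypothetical contact matrix: residue i is linked to residues i-1 and i+1."""
    m = np.zeros((n, n))
    idx = np.arange(n - 1)
    m[idx, idx + 1] = 1.0
    m[idx + 1, idx] = 1.0
    return m


def bilinear_indices(seq, prop_x, prop_y, max_order=3):
    """Return non-stochastic and stochastic bilinear indices for orders 0..max_order."""
    if len(seq) < 2:
        raise ValueError("need at least two residues so every row of M^k is nonzero")
    x = np.array([prop_x[aa] for aa in seq])  # first property vector
    y = np.array([prop_y[aa] for aa in seq])  # second property vector
    m = backbone_matrix(len(seq))
    non_stochastic, stochastic = [], []
    mk = np.eye(len(seq))  # M^0 is the identity
    for k in range(max_order + 1):
        if k > 0:
            mk = mk @ m  # kth power of the contact matrix
        # Stochastic matrix: each row of M^k divided by that row's sum.
        smk = mk / mk.sum(axis=1, keepdims=True)
        non_stochastic.append(float(x @ mk @ y))
        stochastic.append(float(x @ smk @ y))
    return non_stochastic, stochastic
```

For the toy peptide "ACDKLV", calling `bilinear_indices("ACDKLV", KD, CHARGE, max_order=1)` gives order-1 indices of about 1.7 (non-stochastic) and 0.85 (stochastic). Because $\mathbf{M}_m^k$ is symmetric but its row-normalized counterpart generally is not, swapping the two property vectors leaves the non-stochastic index unchanged but can change the stochastic one.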
Moreover, non-stochastic and stochastic protein bilinear indices produced piecewise linear regressions (R = 0.95 and 0.96, respectively) between protein-backbone descriptors and $t_m$ values for the alanine mutants of Arc repressor. Both models yielded the same break-point, 51.87 °C, which separated the mutants into two clusters and coincided perfectly with the experimental scale. Therefore, the linear discriminant analysis and piecewise models can be used in combination to classify the mutant Arc homodimers and predict their stability. The protein bilinear index models compared favorably with several previously reported bio-macromolecular descriptors. These models also allowed interpretation of the driving forces of the folding process, indicating that topological/topographical interactions along the protein backbone control the stability profile of wild-type Arc and its alanine mutants.
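A break-point expressed in °C suggests that the piecewise model splits the mutants by their $t_m$ value and fits a separate straight line to each cluster. The sketch below illustrates that idea under this assumption; it is not the authors' fitting procedure. The function names `fit_piecewise` and `best_breakpoint`, the grid search over observed response values, and the `min_size` parameter are hypothetical, and the numbers in the usage note are synthetic, not Arc measurements.

```python
import numpy as np


def fit_piecewise(x, y, breakpoint):
    """Fit y = b0 + b1*x separately for y <= breakpoint and y > breakpoint.

    Returns [(b0_low, b1_low), (b0_high, b1_high)] and the overall correlation R.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_hat = np.empty_like(y)
    coefs = []
    for mask in (y <= breakpoint, y > breakpoint):
        slope, intercept = np.polyfit(x[mask], y[mask], 1)  # ordinary least squares line
        y_hat[mask] = intercept + slope * x[mask]
        coefs.append((intercept, slope))
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r = float(np.sqrt(max(0.0, 1.0 - ss_res / ss_tot)))
    return coefs, r


def best_breakpoint(x, y, min_size=3):
    """Try each observed y value as break-point; keep the split with the highest R.

    Returns (breakpoint, coefs, R), or None if no split leaves min_size points per side.
    """
    y = np.asarray(y, dtype=float)
    best = None
    for bp in np.unique(y):
        n_low = int(np.sum(y <= bp))
        if n_low < min_size or len(y) - n_low < min_size:
            continue  # each segment needs enough points for a meaningful line fit
        coefs, r = fit_piecewise(x, y, bp)
        if best is None or r > best[2]:
            best = (float(bp), coefs, r)
    return best
```

With synthetic data x = [1, 2, 3, 4, 5, 6] and y = [42, 44, 46, 54, 55, 56], `best_breakpoint(x, y)` selects 46.0 and both segments fit exactly (R ≈ 1). Because the split is defined on the response itself, a new mutant cannot be assigned to a segment from its descriptors alone; a classifier such as the discriminant models described above would have to supply the cluster first, which is consistent with using the two kinds of model in combination.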
Keywords: Protein Stability, Arc Repressor, Alanine-Substitution Mutant, TOMOCOMD-CAMPS Software, Bilinear Indices, QSAR, Linear Discriminant Analysis, Linear Multiple Regression, Piecewise Linear Regression