Co-evolution importance on binding Hot-Spot prediction methods

¹ CNC - Center for Neuroscience and Cell Biology; Rua Larga, FMUC, Polo I, 1ºandar, Universidade de Coimbra, 3004-517; Coimbra, Portugal
² Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, Estrada Nacional 10 (ao km 139,7), 2695-066 Bobadela LRS, Portugal
³ Department of Genetics and Genomics and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
⁴ Department of Mathematics, Faculdade de Ciencias, Universidade do Porto, Portugal
⁵ Bijvoet Center for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Utrecht 3584CH, The Netherlands

Published: 21 January 2017 by MDPI in MOL2NET'16, Conference on Molecular, Biomed., Comput. & Network Science and Engineering, 2nd ed. congress USEDAT-02: USA-Europe Data Analysis Training Program Workshop, Cambridge, UK-Bilbao, Spain-Miami, USA, 2016

https://doi.org/10.3390/mol2net-02-03889

Abstract:

Protein-protein interactions (PPIs) are essential to the majority of biological processes, so understanding them is vital for developing new therapies and techniques in life sciences research. Among the residues that make up a typical protein-protein interface, Hot-Spots (HS) are the most important because of their highly stabilizing nature. Experimental HS detection, however, is time-consuming and expensive, which has prompted the development of computational approaches that ensure both speed and precision. Evolution plays a major role in refining protein structure and PPIs, so incorporating evolutionary data into a predictive model may lead to better performance. With this in mind, and taking into account the data already available from alanine scanning mutagenesis studies and protein structures, we incorporated several structure-based (i.e. solvent accessible surface area-related values), sequence-based (i.e. position-specific scoring matrix), and evolutionary-based (i.e. InterEVScore and CoeViz) features into a predictive machine-learning classification model. We considered six different pre-processing conditions, involving Principal Component Analysis (PCA) and z-scoring (scaling) with normal, up-, and down-sampling of the minority and majority classes. Our results point towards overall better scores when more evolutionary features are used, in particular EVFold scores.
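As a rough illustration of two of the pre-processing steps named in the abstract, the sketch below applies z-scoring and then normal, up-, or down-sampling to a toy binary data set. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `z_score` and `resample`, the toy feature values, and the 1/0 label encoding (1 for a Hot-Spot, 0 for any other interface residue) are all hypothetical, and PCA is left out to keep the example dependency-free.

```python
import random
from statistics import mean, pstdev


def z_score(rows):
    """Scale every feature column to zero mean and unit (population) variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns are left unscaled
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def resample(rows, labels, mode, seed=0):
    """Rebalance a two-class data set.

    "normal": keep the original class distribution
    "up":     add minority-class rows, drawn with replacement, until classes match
    "down":   keep a random subset of majority-class rows, as large as the minority
    """
    if mode == "normal":
        return list(rows), list(labels)
    if mode not in ("up", "down"):
        raise ValueError(f"unknown sampling mode: {mode!r}")
    rng = random.Random(seed)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    # Assumes exactly two classes; the smaller group is the minority class.
    minority, majority = sorted(groups, key=lambda k: len(groups[k]))
    if mode == "up":
        extra = len(groups[majority]) - len(groups[minority])
        groups[minority] = groups[minority] + rng.choices(groups[minority], k=extra)
    else:
        groups[majority] = rng.sample(groups[majority], len(groups[minority]))
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(members)
        out_labels.extend([label] * len(members))
    return out_rows, out_labels


# Toy data (hypothetical values): two features per residue, 2 positives, 4 negatives.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]]
y = [1, 0, 0, 0, 0, 1]

for mode in ("normal", "up", "down"):
    Xs, ys = resample(z_score(X), y, mode)
    print(mode, len(Xs), ys.count(1), ys.count(0))
```

In a real evaluation the scaling parameters would normally be estimated on the training split only, and resampling would be applied to training data but not to the held-out test set, so that the reported scores reflect the original class imbalance.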

Keywords: Protein-protein interactions; Principal Component Analysis (PCA); Protein-protein interface; Hot-Spots (HS)
