Alignment-free Prediction of Ribonucleases using a Computational Chemistry approach: Comparison with HMM model and Isolation from Schizosaccharomyces pombe, Prediction, and Experimental assay of a new sequence
* 1, 2 , 1, 3, 4 , 5 , 6 , 2 , 1 , 7
1  Dipartimento Farmaco Chimico Tecnologico, Università degli Studi di Cagliari, 09124, Italy
2  CAP, Faculty of Chemistry and Pharmacy, IBP, and CBQ, UCLV, Santa Clara, 54830, Cuba
3  Unit for Bioinformatics & Connectivity Analysis (UBICA), Institute of Industrial Pharmacy, Faculty of Pharmacy, USC, Santiago de Compostela, 15782, Spain
4  Department of Organic Chemistry, Faculty of Pharmacy, USC, Santiago de Compostela, 15782, Spain
5  CINVESTAV–LANGEBIO, Irapuato, Guanajuato, 36500, México
6  Caribbean vitroplants, Santo Domingo, 1464, Dominican Republic
7  Vascular Biology Institute, School of Medicine, University of Miami, Florida, 33136, USA

Abstract: The study of type III RNases constitutes an important area in molecular biology. The pac1+ gene is known to encode a particular RNase III that shares low amino acid similarity with the products of other genes despite having double-stranded ribonuclease activity. Bioinformatics methods based on sequence alignment may fail when the amino acid identity between the query sequence and others with similar function is low (remote homologues), or when no similar sequence is recorded in the database. Quantitative Structure-Activity Relationships (QSAR) applied to protein sequences may allow an alignment-independent prediction of protein function. These sequence QSAR-like methods often use 1D numerical sequence parameters as input to seek sequence-function relationships. However, first representing the sequence in 2D may uncover useful higher-order information. In the work described here, we calculated for the first time the Spectral Moments of a Markov Matrix (MMM) associated with a 2D-HP-map of a protein sequence. We used MMM values to characterize numerically 81 sequences of type III RNases and 133 proteins of a control group. We subsequently developed one MMM-QSAR model and one classic Hidden Markov Model (HMM) based on the same data. Without using alignment, the MMM-QSAR discriminated RNases from other proteins with an accuracy of 97.35%, a result as good as that obtained with known HMM techniques. We also report for the first time the isolation of a new Pac1 protein (DQ647826) from Schizosaccharomyces pombe, strain 428-4-1. The MMM-QSAR model predicts the new RNase III with the same accuracy as other classical alignment methods. Experimental assay of this protein confirms the predicted activity. The present results suggest that MMM-QSAR models may be used for protein function annotation without sequence alignment, with the same accuracy as classic HMM models.
Keywords: Spectral graph theory / Hidden Markov Model / Ribonucleases / Pac1 / Protein 2D representations
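
As an illustration of the descriptors named in the abstract, the following is a minimal sketch of how spectral moments of a Markov matrix could be computed from a 2D lattice map of a sequence. It is not the authors' implementation. The four-class residue grouping, the direction assigned to each class, the node weights (number of residues landing on a lattice point), and the inclusion of a self-transition are all assumptions made for this example; the abstract specifies none of them. The only part taken as given is the definition of the k-th spectral moment as the trace of the k-th power of the matrix.

```python
# Assumed residue classes and lattice moves (hypothetical; not from the abstract).
MOVES = {}
for residues, step in (
    ("AVLIMFWPGC", (0, 1)),   # non-polar: one step up
    ("STYNQ", (0, -1)),       # polar: one step down
    ("DE", (-1, 0)),          # acidic: one step left
    ("KRH", (1, 0)),          # basic: one step right
):
    for r in residues:
        MOVES[r] = step


def hp_map(sequence):
    """Walk the sequence on a 2D lattice; residues that land on the
    same point are merged into one node. Returns node weights and edges."""
    x = y = 0
    index = {}       # lattice point -> node id
    weights = []     # number of residues per node (assumed weighting)
    edges = set()    # undirected edges stored as ordered pairs in both directions
    prev = None
    for residue in sequence:
        dx, dy = MOVES[residue]   # raises KeyError for non-standard residues
        x, y = x + dx, y + dy
        node = index.setdefault((x, y), len(index))
        if node == len(weights):  # first visit to this lattice point
            weights.append(0)
        weights[node] += 1
        if prev is not None:
            edges.add((prev, node))
            edges.add((node, prev))
        prev = node
    return weights, edges


def markov_matrix(weights, edges):
    """Row-stochastic matrix: from node i, move to itself or to an adjacent
    node j with probability proportional to the weight of j (assumed rule)."""
    n = len(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        reachable = [i] + [j for j in range(n) if (i, j) in edges]
        total = sum(weights[j] for j in reachable)
        for j in reachable:
            matrix[i][j] = weights[j] / total
    return matrix


def spectral_moments(matrix, order):
    """Return [Tr(M^1), ..., Tr(M^order)]."""
    n = len(matrix)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    moments = []
    for _ in range(order):
        power = [
            [sum(power[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        moments.append(sum(power[i][i] for i in range(n)))
    return moments


weights, edges = hp_map("AKDE")
moments = spectral_moments(markov_matrix(weights, edges), 5)
```

In a QSAR-style workflow, a vector such as `moments` would be computed for every sequence and passed to a classifier in place of an alignment score. The choice of classifier is outside the scope of this sketch.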