Surveying Alignment-free Features for Ortholog Detection in Related Yeast Proteomes by using Supervised Big Data Classifiers
1  Departamento de Ciencias de la Computación, Universidad Central “Marta Abreu” de Las Villas (UCLV), Santa Clara, 54830, Cuba
2  Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071 Granada, Spain
3  CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal
4  Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
5  Centro de Bioactivos Químicos, Universidad Central “Marta Abreu” de Las Villas (UCLV), Santa Clara, 54830, Cuba
6  CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal.


Methods for pairwise ortholog detection (POD) usually rely on alignment-based (AB) similarity measures. However, AB algorithms are still limited in POD since they may fail in the presence of certain evolutionary and genetic events. In this sense, POD is an open field in bioinformatics demanding either constant improvements in existing methods or new algorithms that scale effectively to Big Data.

In a previous paper, we developed a Big Data supervised POD approach considering several AB pairwise gene features and the low ortholog pair ratios found between two proteomes (Galpert, del Río et al. 2015). Although our supervised POD models achieved higher sensitivity than classical POD methodologies when comparatively evaluated on the Saccharomycete yeast benchmark dataset (Salichos and Rokas 2011), they were implemented in the MapReduce framework and tested on a single yeast genome pair.

In (Galpert, Fernández et al. 2018), we propose some improvements to our supervised POD approach by i) surveying the incorporation of alignment-free pairwise similarity measures, ii) evaluating other classifiers under the Big Data Spark platform, and iii) extending the test set to other related Saccharomycete yeast proteomes.
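To make the idea of an alignment-free pairwise similarity measure concrete, the following is a minimal sketch that compares two protein sequences by the cosine similarity of their k-mer count vectors, without computing any alignment. The abstract does not state which alignment-free measures were surveyed, so the choice of k-mer cosine similarity, the function names, and the default `k=2` are all illustrative assumptions rather than the authors' method.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int = 2) -> Counter:
    """Count overlapping k-mers in a protein sequence (hypothetical helper)."""
    seq = seq.upper()
    # A sequence shorter than k yields an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse k-mer count vectors."""
    # Counter returns 0 for missing keys, so unshared k-mers contribute nothing.
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def alignment_free_similarity(seq1: str, seq2: str, k: int = 2) -> float:
    """Illustrative alignment-free similarity for one protein pair.

    Assumption: k-mer cosine similarity stands in for whichever
    alignment-free measures the cited work actually used.
    """
    return cosine_similarity(kmer_counts(seq1, k), kmer_counts(seq2, k))
```

In a supervised setting, a score of this kind would be one feature of a gene pair, alongside any alignment-based features, in the vector passed to a classifier.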

Keywords: Pairwise ortholog detection; Alignment-free similarity measures; Big data; Supervised classification; Yeast