
[e002] Pairwise Ortholog Detection in Related Yeast Species by Using Big Data Supervised Classifications

1 Departamento de Ciencias de la Computación, Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara, 54830, Cuba
2 Dept. of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), University of Granada, Granada
3 Centro de Bioactivos Químicos, Universidad Central "Marta Abreu" de Las Villas (UCLV), Santa Clara, 54830, Cuba
4 CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal
5 Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
* Author to whom correspondence should be addressed.
2 December 2015

Abstract

Orthology detection still requires algorithms that scale more effectively. Different unsupervised algorithms have combined alignment, synteny, evolutionary distances and protein interactions to improve effectiveness, while many available databases focus on the scaling problem. In this paper, a set of gene pair features based on similarity measures, such as alignment scores, sequence length, gene membership in conserved regions and physicochemical profiles, is combined in a supervised Pairwise Ortholog Detection (POD) approach. The aim is to improve effectiveness given the low ratio of orthologs to all possible pairwise comparisons between two genomes. In this POD scenario, big data supervised classifiers that manage the imbalance between the ortholog and non-ortholog pair classes allow for an effective scaling solution that is built from two genomes and extended to other genome pairs.
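The class imbalance described above can be illustrated with a minimal single-machine sketch of random oversampling. This is not the MapReduce implementation used in the paper; the function name `random_oversample`, the toy feature tuples and the 1:9 class ratio are assumptions made purely for illustration.

```python
import random
from collections import Counter

def random_oversample(pairs, labels, seed=0):
    """Duplicate randomly chosen minority-class instances until all classes are equal in size."""
    rng = random.Random(seed)
    # Group the feature vectors by class label.
    by_class = {}
    for x, y in zip(pairs, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(items) for items in by_class.values())
    out_pairs, out_labels = [], []
    for y, items in by_class.items():
        # Sample with replacement to fill the gap up to the majority size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_pairs.append(x)
            out_labels.append(y)
    return out_pairs, out_labels

# Hypothetical gene pair features: (alignment score, length difference).
toy_pairs = [(0.95, 3)] + [(0.10 + 0.01 * i, 50 + i) for i in range(9)]
toy_labels = [1] + [0] * 9  # 1 = ortholog pair, 0 = non-ortholog pair

balanced_pairs, balanced_labels = random_oversample(toy_pairs, toy_labels)
print(Counter(toy_labels), Counter(balanced_labels))
```

Oversampling leaves the majority class untouched and only repeats existing ortholog pairs, so no synthetic feature values are introduced.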

The supervised approach for POD was compared with the Reciprocal Best Hits (RBH), Reciprocal Smallest Distance (RSD) and OMA (a Comprehensive, Automated Project for the Identification of Orthologs from Complete Genome Data) algorithms, using the yeast genome pairs (i) Saccharomyces cerevisiae - Kluyveromyces lactis, (ii) Saccharomyces cerevisiae - Candida glabrata and (iii) Saccharomyces cerevisiae - Schizosaccharomyces pombe as benchmark datasets. Four datasets, derived from each genome pair comparison with different alignment settings, were used. Because of the large number of instances (gene pairs) and the data imbalance, building and testing the supervised model was only possible with big data supervised classifiers that manage imbalance. Evaluation metrics that take low ortholog ratios into account were applied. In terms of effectiveness, MapReduce Random Oversampling combined with Spark Support Vector Machines outperformed RBH, RSD and OMA, probably because it considers gene pair features beyond alignment similarities and benefits from advances in big data supervised classification.
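For readers unfamiliar with the RBH baseline, the following is a minimal sketch of the general idea: two genes are called orthologs when each is the other's best-scoring hit. The score dictionaries, gene identifiers and function names are hypothetical, and real pipelines derive the scores from sequence alignment searches rather than from hand-written values.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene.

    scores: dict mapping (query, target) tuples to an alignment score.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}

def reciprocal_best_hits(scores_ab, scores_ba):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    ab = best_hits(scores_ab)
    ba = best_hits(scores_ba)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)

# Hypothetical alignment scores between genome A and genome B genes.
scores_ab = {("a1", "b1"): 90, ("a1", "b2"): 40, ("a2", "b2"): 70, ("a3", "b2"): 75}
scores_ba = {("b1", "a1"): 88, ("b2", "a3"): 75, ("b2", "a2"): 70}

rbh_pairs = reciprocal_best_hits(scores_ab, scores_ba)
print(rbh_pairs)  # [('a1', 'b1'), ('a3', 'b2')]
```

Here `a2` is excluded because the best hit of `b2` is `a3`, which shows how RBH relies on alignment similarity alone, in contrast to the additional gene pair features used by the supervised approach.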

Keywords

ortholog detection; big data supervised classification; similarity measures

Cite this article as

Galpert Cañizares, D.; del Río García, S.; Herrera, F.; Ancede Gallardo, E.; Antunes, A.; Agüero-Chapin, G. Pairwise Ortholog Detection in Related Yeast Species by Using Big Data Supervised Classifications. In Proceedings of the MOL2NET, 5–15 December 2015; Sciforum Electronic Conference Series, Vol. 1, 2015, e002; doi:10.3390/MOL2NET-1-e002

Author biographies

Francisco Herrera
Full Professor and Head of Research Group SCI2S (Soft Computing and Intelligent Information Systems) at the University of Granada, Spain
Guillermin Agüero-Chapin
1998: University Degree in Pharmacy at the Universidad Central de Las Villas, Santa Clara, Cuba. 2009: MSc degree in General Biochemistry at the Medicine School “Dr Zerafin Ruiz de Zarate Ruiz”, Villa Clara, Cuba. 2013: PhD in Biology, awarded with honours by the Universidade do Porto, Portugal. Current position: Postdoctoral Researcher at CIIMAR, Universidade do Porto, Portugal.
