Methods for pairwise ortholog detection (POD) usually rely on alignment-based (AB) similarity measures. However, AB algorithms are still limited in POD, since they may fail in the presence of certain evolutionary and genetic events. POD therefore remains an open field in bioinformatics, demanding either constant improvements to existing methods or new algorithms that scale effectively to deal with Big Data.
In a previous paper, we developed a Big Data supervised POD approach that considers several AB pairwise gene features and the low ortholog pair ratios found between two proteomes (Galpert, del Río et al. 2015). Although our supervised POD models achieved higher sensitivity than classical POD methodologies when comparatively evaluated on the Saccharomycete yeast benchmark dataset (Salichos and Rokas 2011), they were implemented in the MapReduce framework and tested on only a single yeast genome pair.
In Galpert, Fernández et al. (2018) (https://doi.org/10.1186/s12859-018-2148-8), we propose some improvements to our supervised POD approach by i) surveying the incorporation of alignment-free pairwise similarity measures, ii) evaluating other classifiers under the Big Data Spark platform, and iii) extending the test set to other related Saccharomycete yeast proteomes.
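To illustrate what an alignment-free pairwise similarity measure can look like, the following minimal sketch compares two protein sequences by the cosine similarity of their k-mer count vectors. It is only an illustrative example of the general idea and is not necessarily one of the measures evaluated in the paper; the function names `kmer_counts` and `kmer_cosine_similarity`, the choice of k = 3, and the toy sequences are assumptions made here.

```python
from collections import Counter
from math import sqrt


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers (words of length k) in a sequence.

    Hypothetical helper for illustration; sequences shorter than k
    yield an empty Counter.
    """
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_cosine_similarity(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Cosine similarity between the k-mer count vectors of two sequences.

    No alignment is computed: each sequence is reduced to a bag of k-mers,
    and the two bags are compared as vectors. Returns a value in [0, 1].
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)

    # Only k-mers present in both sequences contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[kmer] * counts_b[kmer] for kmer in shared)

    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # At least one sequence is too short to contain any k-mer.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy sequences (made up for this example) differing in the last residue.
    print(kmer_cosine_similarity("MKTAYIAKQR", "MKTAYIAKQL"))
```

In a supervised setting such as the one described above, a score of this kind could serve as one additional feature per gene pair alongside the alignment-based features, although how the features are actually combined is defined in the cited paper rather than here.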