SeqDivA: Sequence Diversity Analysis Tool for Detecting the Twilight Zone of Alignment Algorithms
* 1, 2 , 3
1  CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal.
2  Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
3  Facultad de Ciencias Exactas, Universidad Andrés Bello, República 275, Santiago, Chile

https://doi.org/10.3390/mol2net-06-06883 (registering DOI)
Abstract:

To our knowledge, neither the literature nor scientific forums describe any software that can explore the diversity of a database or a sequence subset by applying the similarity measures reported to delimit the twilight zone according to all previously mentioned thresholds. So far, to retrieve similarity measures such as identity, similarity and scores from an all-vs-all pairwise sequence comparison, users must first run software such as needle (global alignment), water (local alignment), blast (local alignment) or even multiple sequence alignment (MSA) tools (http://imed.med.ucm.es/Tools/sias.html), and then parse the results into an n×n matrix. However, going through all these steps to obtain the final similarity matrix requires programming skills.
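To illustrate what such an all-vs-all comparison involves, the following is a minimal sketch in pure Python. It is not the SeqDivA implementation and does not call needle, water or blast: it uses a simple Needleman-Wunsch global alignment with hypothetical scoring values (`match`, `mismatch`, `gap`) and assumes that identity is the number of matching columns divided by the alignment length. The function names are illustrative only.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Percent identity of one optimal Needleman-Wunsch global alignment.

    The scoring values are illustrative assumptions, not needle defaults.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting identical columns and total columns.
    i, j, matches, length = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return 100.0 * matches / length if length else 0.0


def identity_matrix(seqs):
    """Build the symmetric n x n percent-identity matrix for a list of sequences."""
    n = len(seqs)
    matrix = [[100.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = global_identity(seqs[i], seqs[j])
    return matrix


if __name__ == "__main__":
    for row in identity_matrix(["ACGT", "ACGA", "ACG"]):
        print(row)
```

Only the upper triangle is computed and then mirrored, which halves the number of alignments; for real data a dedicated aligner would replace `global_identity`.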

Here, we present SeqDivA, a Python-based tool with a friendly GUI that allows non-expert users to run alignment algorithms (water, needle and blast) to compare protein, DNA and RNA sequences all vs all. SeqDivA provides similarity, identity and bit-score matrices to explore the diversity/homology of the sequences, enabling the delimitation of the twilight zone. The resulting matrices are visualized as dot plot-like graphs representing pairwise similarity measures (identities, similarities and bit-scores). SeqDivA also allows redundancy reduction by exploring amino acid identities from global alignments, and it can be connected to the output of software that simulates related sequences with a known evolutionary history, e.g. ROSE [1] and INDELible [2], in order to obtain subsets of homologous sequences at different identity or bit-score ranges. The software can be freely downloaded at https://github.com/eancedeg/SeqDivA. It was published as part of the paper available at https://doi.org/10.3390/biom10010026
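As a rough illustration of identity-based redundancy reduction, the sketch below applies a greedy filter to a precomputed percent-identity matrix. This is an assumption about how such a filter might work, not SeqDivA's actual procedure; the function name `reduce_redundancy` and the `max_identity` cutoff of 90% are hypothetical.

```python
def reduce_redundancy(names, identity, max_identity=90.0):
    """Greedily keep sequences whose identity to every kept sequence
    does not exceed max_identity (a hypothetical cutoff, in percent).

    names    -- list of sequence identifiers
    identity -- n x n matrix of pairwise percent identities, same order as names
    """
    kept = []  # indices of the sequences retained so far
    for i in range(len(names)):
        if all(identity[i][k] <= max_identity for k in kept):
            kept.append(i)
    return [names[i] for i in kept]


if __name__ == "__main__":
    names = ["seqA", "seqB", "seqC"]
    identity = [
        [100.0, 95.0, 40.0],
        [95.0, 100.0, 42.0],
        [40.0, 42.0, 100.0],
    ]
    # seqB is 95% identical to seqA, so it is dropped at a 90% cutoff.
    print(reduce_redundancy(names, identity))
```

The result depends on input order, since the first member of each redundant group is the one retained; sorting the sequences beforehand (for example by length) is one way to make that choice deliberate.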

[1] Stoye, J., D. Evers, and F. Meyer, Rose: generating sequence families. Bioinformatics, 1998. 14(2): p. 157-163.

[2] Fletcher, W. and Z. Yang, INDELible: a flexible simulator of biological sequence evolution. Mol Biol Evol, 2009. 26(8): p. 1879-1888.

Keywords: alignment algorithm; twilight zone; sequence diversity
Comments on this paper
Humbert G. Díaz
Sequence Alignment & Machine Learning
Dear Dr Aguero and Ancede, thank you for your contribution. What are, in your opinion, the future directions for the twilight zone of alignment algorithms? Is there a possibility of future advances by applying machine learning and/or artificial intelligence techniques?

Guillermin Agüero-Chapin
Twilight Zone and Machine Learning Models
Dear Prof. Gonzalez-Díaz
An accurate definition of the twilight zone for alignment algorithms will be an important step towards applying alternative methodologies for homology detection. In this sense, machine learning-based models are useful tools for remote homology detection in the twilight zone, where alignment-based algorithms start failing. Artificial intelligence will play a crucial role in addressing many bioinformatics problems not solved by the standard methodologies.
Thanks for your comments.

Humbert G. Díaz
Start up
Thank you very much for supporting the mol2net conference. You are also invited to participate in the mol2net'2021 edition, now open: https://mol2net-07.sciforum.net/


Is there a market niche for a spin-off or start-up launching software based on this kind of model?

Have you ever considered becoming an entrepreneur?
Guillermin Agüero-Chapin
Hi Prof. Humb. Glez-Díaz

Thanks for your interest in the software.

Currently, the software is publicly available for academic purposes. In the future we will probably incorporate some improvements to handle genomic and transcriptomic data (big data); its use would then merit a step forward, namely incorporating it into a pipeline that analyzes genomic, transcriptomic and proteomic data. We would like to found a small enterprise :)
