SeqDivA: Sequence Diversity Analysis Tool for Detecting the Twilight Zone of Alignment Algorithms

Guillermin Agüero-Chapin; Evys Ancede Gallardo

Guillermin Agüero-Chapin *,¹,², Evys Ancede Gallardo

¹ CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal.
² Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
³ Facultad de Ciencias Exactas, Universidad Andrés Bello, República 275, Santiago, Chile

Published: 20 July 2020 by MDPI in MOL2NET'20, Conference on Molecular, Biomed., Comput. & Network Science and Engineering, 6th ed. congress CHEMBIOMOL-06: Chem. Biol. & Med. Chem. Workshop, Bilbao-Rostock, Germany-Galveston, Texas, USA, 2020

https://doi.org/10.3390/mol2net-06-06883 (registering DOI)

Abstract:

A search of the literature and scientific forums turns up no software that can explore the diversity of a database or a sequence subset by applying the similarity measures reported to delimit the twilight zone according to all previously mentioned thresholds. So far, to retrieve several similarity measures such as identity, similarity and scores in an all-vs-all pairwise sequence comparison, users must first run software such as needle (global alignment), water (local alignment), blast (local alignment) or even multiple sequence alignment (MSA) tools (http://imed.med.ucm.es/Tools/sias.html), and then parse the results to present them in an n×n matrix. However, going through all these steps to reach the final similarity matrix requires programming skills.
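As a rough illustration of the all-vs-all step described above, the sketch below builds an n×n percent-identity matrix in pure Python. It is not SeqDivA's code and does not call needle, water or blast: it substitutes a minimal Needleman-Wunsch global alignment with an assumed match/mismatch/linear-gap scoring, and the function names `global_identity` and `identity_matrix` are hypothetical.

```python
from itertools import combinations


def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity over the columns of one optimal global alignment.

    Scoring values are illustrative assumptions, not EMBOSS needle defaults.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting columns and identical positions.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


def identity_matrix(seqs):
    """Symmetric n x n matrix of pairwise percent identities."""
    n = len(seqs)
    matrix = [[100.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = global_identity(seqs[i], seqs[j])
    return matrix


if __name__ == "__main__":
    for row in identity_matrix(["ACGT", "ACGA", "AAAA"]):
        print(row)
```

Only the upper triangle is computed and then mirrored, so n sequences need n(n-1)/2 alignments. For real data, a dedicated aligner with a substitution matrix and affine gap penalties would replace the toy scoring used here.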

Here, we present SeqDivA, a Python-based tool with a friendly GUI that allows non-expert users to run alignment algorithms (water, needle and blast) to compare protein, DNA and RNA sequences all-vs-all. SeqDivA provides similarity, identity and bit-score matrices to explore the diversity/homology of the sequences, enabling the delimitation of the twilight zone. The resulting matrices are visualized as dot plot-like graphs representing the pairwise similarity measures (identity, similarity and bit-score). SeqDivA also allows redundancy reduction by exploring amino acid identities from global alignments, and it can be connected to the output of software that simulates related sequences with a known evolutionary history, such as ROSE [1] and INDELible [2], in order to obtain subsets of homologous sequences in different identity or bit-score ranges. The software can be freely downloaded at https://github.com/eancedeg/SeqDivA. It was published as part of the paper available at https://doi.org/10.3390/biom10010026.
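The redundancy reduction and range-based subsetting mentioned above can be sketched as follows, starting from an already computed percent-identity matrix. This is a minimal illustration under stated assumptions, not SeqDivA's implementation: the greedy filter, the function names, the example matrix and the 90% cutoff are all hypothetical, and the 20-35% window is only a commonly cited range for the protein twilight zone, not a value taken from this abstract.

```python
def reduce_redundancy(identity, cutoff):
    """Greedily keep sequence indices so that no kept pair has identity >= cutoff."""
    kept = []
    for i in range(len(identity)):
        # Keep sequence i only if it is below the cutoff against every kept one.
        if all(identity[i][k] < cutoff for k in kept):
            kept.append(i)
    return kept


def pairs_in_range(identity, low, high):
    """Return pairs (i, j) with i < j whose identity lies in [low, high)."""
    n = len(identity)
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if low <= identity[i][j] < high]


# Hypothetical identity matrix (percent) for three sequences.
example = [
    [100.0, 95.0, 30.0],
    [95.0, 100.0, 28.0],
    [30.0, 28.0, 100.0],
]

non_redundant = reduce_redundancy(example, cutoff=90.0)      # [0, 2]
twilight_pairs = pairs_in_range(example, low=20.0, high=35.0)  # [(0, 2), (1, 2)]
```

The greedy filter depends on input order: the first member of each highly similar group is the one retained. Clustering-based approaches make a different trade-off and may keep a different representative.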

[1] Stoye, J., D. Evers, and F. Meyer, Rose: generating sequence families. Bioinformatics, 1998. 14(2): p. 157-163.

[2] Fletcher, W. and Z. Yang, INDELible: a flexible simulator of biological sequence evolution. Mol Biol Evol, 2009. 26(8): p. 1879-1888.

Keywords: alignment algorithm; twilight zone; sequence diversity

Comments on this paper

Humbert G. Díaz

3 January 2021

Sequence Alignment & Machine Learning

Dear Dr Agüero and Dr Ancede, thank you for your contribution. What are, in your opinion, the future directions for the twilight zone of alignment algorithms? Is there a possibility of future advances by applying machine learning and/or artificial intelligence techniques?

Guillermin Agüero-Chapin

4 January 2021

Twilight Zone and Machine Learning Models

Dear Prof. Gonzalez-Díaz,
An accurate definition of the twilight zone for alignment algorithms will be an important step towards applying alternative methodologies for homology detection. In this sense, machine learning-based models are useful tools for remote homology detection in the twilight zone, where alignment-based algorithms start failing. Artificial intelligence will play a crucial role in addressing many bioinformatics problems not solved by the standard methodologies.
Thanks for your comments.

Humbert G. Díaz

30 January 2021

Start-up

Thank you very much for supporting the mol2net conference. You are also invited to participate in the mol2net'2021 edition, now open: https://mol2net-07.sciforum.net/

Is there a market niche for a spin-off or start-up launching software based on this kind of model?

Have you ever considered becoming an entrepreneur?

Guillermin Agüero-Chapin

1 February 2021

Hi Prof. Humb. Glez-Díaz

Thanks for your interest in the software.

Currently, the software is publicly available for academic purposes. In the future we could probably incorporate some improvements to handle genomic and transcriptomic data (big data); its use would then merit a step forward, with the tool incorporated into a pipeline analyzing genomic/transcriptomic and proteomic data. We would like to found a small enterprise :)

Guillermin Agüero-Chapin

Evys Ancede Gallardo