Comprehensive Workflow for Bacterial Genome Analysis: From SRA Raw Reads to Reference Mapping and Annotation
Department of Biotechnology and Genetic Engineering, Islamic University, Kushtia, Bangladesh
Academic Editor: Oswaldo Palenzuela

Published: 05 February 2026 by MDPI in The 1st International Online Conference on Biology, session Infection Biology
Abstract:

Introduction
Bacterial genomics has been transformed by next-generation sequencing (NGS), which enables faster, larger-scale, and more cost-effective analyses. However, the complexity of the computational tools and multi-step processing pipelines involved often poses difficulties for researchers with limited bioinformatics expertise. To address this, we created a modular bioinformatics pipeline optimized for Illumina sequencing data from bacterial genomes. The workflow integrates the essential steps, including quality control, trimming, read mapping, variant calling, de novo assembly, annotation, and visualization, into an automated, reproducible, and user-friendly pipeline.
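One way to realize a modular, restartable pipeline like this is a make-style runner in which each step declares its input and output files and is skipped when its outputs are already newer than its inputs. The Python sketch below illustrates that idea under this assumption only; the `Stage` and `run_pipeline` names and the example file paths are hypothetical and are not taken from the authors' repository, which is built from shell scripts.

```python
"""Minimal sketch of a modular, restartable pipeline runner (hypothetical names)."""
import subprocess
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Stage:
    name: str
    command: list[str]  # argv passed to subprocess.run (no shell involved)
    inputs: list[Path] = field(default_factory=list)
    outputs: list[Path] = field(default_factory=list)

    def is_up_to_date(self) -> bool:
        """True when all outputs exist and none is older than the newest input."""
        if not self.outputs or not all(p.exists() for p in self.outputs):
            return False
        # Missing inputs are ignored here; the tool itself would report them.
        newest_input = max(
            (p.stat().st_mtime for p in self.inputs if p.exists()), default=0.0
        )
        oldest_output = min(p.stat().st_mtime for p in self.outputs)
        return oldest_output >= newest_input


def run_pipeline(stages: list[Stage], dry_run: bool = False) -> list[str]:
    """Run stages in order; return the names of stages that ran (or would run)."""
    executed = []
    for stage in stages:
        if stage.is_up_to_date():
            print(f"[skip] {stage.name}: outputs are up to date")
            continue
        print(f"[run]  {stage.name}: {' '.join(stage.command)}")
        if not dry_run:
            # check=True stops the whole pipeline at the first failing stage.
            subprocess.run(stage.command, check=True)
        executed.append(stage.name)
    return executed


if __name__ == "__main__":
    # Hypothetical single-stage example; dry_run=True only prints the command.
    reads = [Path("raw/sample_1.fastq"), Path("raw/sample_2.fastq")]
    fastqc = Stage(
        "fastqc",
        ["fastqc", "-o", "qc", *map(str, reads)],
        inputs=reads,
        outputs=[Path("qc/sample_1_fastqc.html")],
    )
    run_pipeline([fastqc], dry_run=True)
```

Because each stage is self-describing, a stage can be added, removed, or swapped for another tool without touching the others, and an interrupted run resumes from the first stage whose outputs are missing or stale.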

Methods
Raw sequencing reads of Pseudomonas aeruginosa strain 2025SY-00129 (SRA accession SRR33893847) were retrieved from the NCBI Sequence Read Archive (SRA) using the SRA Toolkit. Read quality was assessed with FastQC and MultiQC, and reads were trimmed with Trimmomatic. Reads were mapped with BWA to the Pseudomonas aeruginosa strain 2507 reference genome, and variants were called with FreeBayes. A de novo assembly was generated with SPAdes, evaluated with QUAST, and annotated with Prokka. All steps were run through automated shell scripts in Conda environments to ensure reproducibility.
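The Methods steps can be written as an ordered list of command lines. The Python sketch below only builds argument lists and executes nothing. The directory layout, the reference path `reference/PA_2507.fasta`, the adapter file `TruSeq3-PE.fa`, the Trimmomatic settings (the paired-end example values from the Trimmomatic manual), and haploid variant calling in FreeBayes (`-p 1`) are all assumptions for illustration, not the parameters of the published workflow.

```python
"""Hedged sketch: the Methods steps as ordered argv lists (nothing is executed)."""
from pathlib import Path
from typing import NamedTuple, Optional


class Step(NamedTuple):
    name: str
    argv: list[str]
    stdout: Optional[Path] = None  # bwa mem and freebayes write results to stdout


def build_steps(srr: str, reference: Path, threads: int = 4) -> list[Step]:
    t, ref = str(threads), str(reference)
    raw, qc, trim = Path("raw"), Path("qc"), Path("trimmed")  # assumed layout
    mapdir, var, asm = Path("mapping"), Path("variants"), Path("assembly")
    r1, r2 = raw / f"{srr}_1.fastq", raw / f"{srr}_2.fastq"
    p1, u1 = trim / f"{srr}_1P.fastq", trim / f"{srr}_1U.fastq"
    p2, u2 = trim / f"{srr}_2P.fastq", trim / f"{srr}_2U.fastq"
    sam, bam = mapdir / f"{srr}.sam", mapdir / f"{srr}.sorted.bam"
    contigs = asm / "contigs.fasta"
    return [
        Step("download", ["fasterq-dump", "--split-files", "-O", str(raw), srr]),
        Step("fastqc", ["fastqc", "-t", t, "-o", str(qc), str(r1), str(r2)]),
        Step("multiqc", ["multiqc", str(qc), "-o", str(qc)]),
        Step("trim", ["trimmomatic", "PE", "-threads", t, "-phred33",
                      str(r1), str(r2), str(p1), str(u1), str(p2), str(u2),
                      # Example settings from the Trimmomatic manual (assumed here)
                      "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10", "LEADING:3",
                      "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36"]),
        Step("bwa_index", ["bwa", "index", ref]),
        Step("bwa_mem", ["bwa", "mem", "-t", t, ref, str(p1), str(p2)], stdout=sam),
        Step("sort", ["samtools", "sort", "-@", t, "-o", str(bam), str(sam)]),
        Step("bam_index", ["samtools", "index", str(bam)]),
        # Index the reference FASTA (.fai) used by FreeBayes.
        Step("faidx", ["samtools", "faidx", ref]),
        # Haploid calling (-p 1) is an assumption suited to bacterial genomes.
        Step("freebayes", ["freebayes", "-f", ref, "-p", "1", str(bam)],
             stdout=var / f"{srr}.vcf"),
        Step("spades", ["spades.py", "-1", str(p1), "-2", str(p2),
                        "-t", t, "-o", str(asm)]),
        Step("quast", ["quast.py", str(contigs), "-r", ref, "-o", "quast"]),
        Step("prokka", ["prokka", "--outdir", "annotation", "--prefix", srr,
                        "--kingdom", "Bacteria", "--genus", "Pseudomonas",
                        "--species", "aeruginosa", "--cpus", t, str(contigs)]),
    ]


if __name__ == "__main__":
    for step in build_steps("SRR33893847", Path("reference/PA_2507.fasta")):
        redirect = f" > {step.stdout}" if step.stdout else ""
        print(f"{step.name}: {' '.join(step.argv)}{redirect}")
```

Executing a step would then amount to `subprocess.run(step.argv, check=True)`, with `stdout` pointed at an opened file when `step.stdout` is set. Output directories that a tool does not create itself (FastQC and Trimmomatic, for example, expect them to exist) would need to be created first.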

Results
The workflow produced a high-quality Pseudomonas aeruginosa draft genome of approximately 6.93 Mb in 193 contigs (N50 = 298,141 bp; GC content = 66.11%; average coverage depth = 29×), assembly statistics consistent with a largely complete genome. Annotation identified 6,379 coding sequences, including 2,767 hypothetical proteins, as well as 68 tRNAs, one tmRNA, and one CRISPR array, giving a detailed view of the genome's architecture. The workflow also generated alignment summaries, variant density plots, and coverage maps, demonstrating its efficiency, robustness, and reproducibility for bacterial genome analysis.
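Two of the assembly statistics quoted above can be computed directly from a contig FASTA file. N50 is the contig length L such that contigs of length at least L together contain at least half of the assembled bases; GC content is taken here as the fraction of G and C among unambiguous A/C/G/T bases. The Python sketch below is an illustrative re-implementation, not QUAST's code. QUAST applies its own conventions (by default it ignores contigs shorter than 500 bp, for instance), so values can differ slightly.

```python
"""Hedged sketch: N50 and GC content from contig sequences."""
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs >= L together hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def gc_percent(sequences: Iterable[str]) -> float:
    """GC% over A/C/G/T only; N and other IUPAC ambiguity codes are ignored."""
    gc = acgt = 0
    for seq in sequences:
        s = seq.upper()
        g_c = s.count("G") + s.count("C")
        gc += g_c
        acgt += g_c + s.count("A") + s.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


if __name__ == "__main__":
    fasta = ">c1\nACGTACGTGG\n>c2\nGGCC\n>c3\nAT".splitlines()
    contigs = dict(read_fasta(fasta))
    lengths = [len(s) for s in contigs.values()]
    print(n50(lengths), round(gc_percent(contigs.values()), 2))  # prints "10 62.5"
```

In the toy example the total length is 16 bp, so the first contig (10 bp) already covers half of the assembly and is therefore the N50; 10 of the 16 bases are G or C, giving 62.5% GC.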

Conclusion
This modular workflow offers a reproducible approach to bacterial genome analysis using open-source tools. It generates accurate genome assemblies and annotations, providing a solid basis for downstream analyses of bacterial genomes. The complete workflow and documentation are available at https://github.com/mdarsikdar/bacterial-upstream-analysis.


Keywords: NGS; Genomics; Workflow; Bacteria

 
 