Data compression is an important resource for coping with ever-growing file sizes. Many genome analysis projects are currently under way, particularly since the advent of the Human Genome Project. Given the large volume of information generated by genetic sequencing, and the need to process and analyze it, new technologies and algorithms must be developed continually. This work proposes and evaluates an implementation of several data compression algorithms in the Python programming language, combined with the use of threads. Genomic data from the public NCBI (National Center for Biotechnology Information) database were used as a case study. The results show a compression ratio of approximately 40% in file size after applying the LZW (Lempel-Ziv-Welch) algorithm, a considerably better result than that obtained with the BW (Burrows-Wheeler) algorithm.
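To make the approach concrete, the following is a minimal sketch of textbook LZW compression applied to several sequences through a thread pool. It is not the authors' implementation: the function names (`lzw_compress`, `lzw_decompress`, `compress_many`), the `workers` parameter, and the use of in-memory strings instead of NCBI files are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def lzw_compress(text):
    """Return the list of LZW codes for a string of single-byte characters."""
    # Assumption: the initial dictionary holds one entry per byte value.
    table = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            # Keep extending the match while it is already in the dictionary.
            current = candidate
        else:
            # Emit the longest known match and register the new phrase.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original string from a list of LZW codes."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Special case: the code refers to the phrase being defined now.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)


def compress_many(sequences, workers=4):
    """Compress each sequence in its own task (hypothetical helper)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the sequences.
        return list(pool.map(lzw_compress, sequences))


if __name__ == "__main__":
    sequences = ["ACGT" * 50, "GATTACA" * 30]
    for seq, codes in zip(sequences, compress_many(sequences)):
        assert lzw_decompress(codes) == seq
        print(len(seq), "characters ->", len(codes), "codes")
```

Note that in standard CPython the global interpreter lock limits how much a pure-Python loop such as this one can gain from threads; in practice threads are likely to help most where reading and writing large genome files overlaps with computation.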
Data compression with Python: application of different algorithms with the use of threads in genome files
Published: 01 December 2017 by MDPI in MOL2NET'17, Conference on Molecular, Biomed., Comput. & Network Science and Engineering, 3rd ed. congress USEDAT-03: USA-EU Data Analysis Training Prog. Work., Cambridge, UK-Bilbao, Spain-Duluth, USA, 2017
Keywords: Data compression, Python, Genome files