Entropy Based Computational Identification of Genomic Markers for Human Papillomavirus Detection and Genotyping
1  Department of Biology, Center for Biological and Health Sciences, Federal University of Sergipe, Brazil


Papillomaviruses are circular double-stranded DNA viruses that specifically infect the cutaneous and mucosal epithelia of mammals, reptiles and birds, causing asymptomatic infections as well as benign and malignant lesions. The discovery of new viral types in the Papillomaviridae family is highly relevant because different types have different pathological characteristics. The classification of papillomaviruses is based on L1 gene sequence identity. However, several studies on Human papillomavirus (HPV) diversity use only a 450 bp fragment of the L1 gene to classify novel HPV types, subtypes, and variants. It has been observed that this L1 fragment is not appropriate for detection and genotyping, based on molecular biology methods and on the topological and statistical aspects of the phylogenetic trees. The identification of novel genomic markers is therefore relevant to the development of more effective diagnostic methods. The aim of this study was to develop and apply a novel entropy-based computational tool to identify phylogenetically informative genomic regions that could be used as markers for the detection and genotyping of HPV. To develop the method, a comparative analysis was performed to assess the genetic variability of L1 gene sequences from the Alphapapillomavirus, Betapapillomavirus and Gammapapillomavirus genera. Shannon entropy was calculated for each site in the L1 sequence alignment. Informative sites were identified using a cutoff of 1.0 bits of information. Phylogenetic trees were constructed from those informative sites with the maximum likelihood method, and the tree topologies and bootstrap values were compared. Once the markers were identified, the method uses the entropy measure to determine the best genomic regions in which to place degenerate primers. The locations of the forward and reverse primers are established around the selected region, sorted by their entropy values.
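The per-site entropy step described above can be illustrated with a minimal Python sketch. It is not the authors' tool. The function names `site_entropy` and `informative_sites`, the choice to ignore gaps and ambiguous bases, and the reading of the 1.0-bit cutoff as "greater than or equal to" are all assumptions made for illustration.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy, in bits, of one alignment column.

    Assumption: only A, C, G and T are counted; gaps and ambiguity
    codes are ignored rather than treated as a fifth state.
    """
    bases = [b.upper() for b in column if b.upper() in "ACGT"]
    total = len(bases)
    if total == 0:
        return 0.0
    counts = Counter(bases)
    # H = sum over bases of p * log2(1 / p), with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def informative_sites(alignment, cutoff=1.0):
    """Return 0-based column indices whose entropy reaches the cutoff.

    `alignment` is a list of equal-length aligned sequences.
    Assumption: a site is informative when entropy >= cutoff.
    """
    return [
        i
        for i, column in enumerate(zip(*alignment))
        if site_entropy(column) >= cutoff
    ]


if __name__ == "__main__":
    # Toy alignment, not real HPV data.
    toy = ["ACGA", "ACTC", "AGTG", "ATAT"]
    print(informative_sites(toy))
```

With four nucleotides the entropy of a site ranges from 0 bits (fully conserved) to 2 bits (all four bases equally frequent), so a 1.0-bit cutoff selects sites at least as variable as an even split between two bases.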
The results showed that it was possible to identify regions in the HPV genome that provide robust phylogenetic topologies with good statistical support. Simulations showed that the primers were capable of detecting several HPV types. To confirm their efficacy, the primers were tested experimentally, and they successfully detected HPV DNA. The entropy measure thus proved to be a good approach for identifying phylogenetically informative genomic regions, which is important for correctly positioning novel HPV types in a phylogenetic tree and hence for genotyping these viruses. In addition, the entropy-based method could efficiently design degenerate primers able to amplify phylogenetically informative regions, increasing the sensitivity and specificity of HPV diagnosis.
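One way to picture the primer-placement idea is the Python sketch below. It ranks candidate windows in a flanking region by mean per-site entropy and writes the chosen window as a degenerate sequence using standard IUPAC nucleotide codes. This is only an assumed reading of "sorted by their entropy values", not the authors' implementation. The names `rank_windows` and `degenerate_consensus`, the fixed window width, and the use of mean entropy as the ranking score are hypothetical.

```python
import math
from collections import Counter

# Standard IUPAC nucleotide codes, keyed by the set of bases seen at a site.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def column_entropy(column):
    """Shannon entropy (bits) of one column, ignoring non-ACGT symbols."""
    bases = [b for b in column if b in "ACGT"]
    total = len(bases)
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n)
               for n in Counter(bases).values())


def rank_windows(alignment, start, end, width):
    """Rank windows of `width` columns inside [start, end), lowest mean entropy first.

    Returns a sorted list of (mean_entropy, window_start) tuples.
    Assumption: conserved (low-entropy) windows make better primer sites.
    """
    entropies = [column_entropy(col) for col in zip(*alignment)]
    ranked = []
    for i in range(start, end - width + 1):
        mean_h = sum(entropies[i:i + width]) / width
        ranked.append((mean_h, i))
    return sorted(ranked)


def degenerate_consensus(alignment, start, width):
    """Spell the window as a degenerate sequence with IUPAC codes."""
    columns = list(zip(*alignment))[start:start + width]
    # Columns with no A/C/G/T at all (e.g. all gaps) fall back to "N".
    return "".join(
        IUPAC.get(frozenset(b for b in col if b in "ACGT"), "N")
        for col in columns
    )


if __name__ == "__main__":
    # Toy alignment, not real HPV data.
    toy = ["ACGTAC", "ACGTAT", "ATGTAG", "AGGTAA"]
    best_entropy, best_start = rank_windows(toy, 0, 6, 3)[0]
    print(best_start, degenerate_consensus(toy, best_start, 3))
```

A reverse primer taken from a downstream window would additionally need to be reverse-complemented, which this sketch omits.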

Keywords: Human papillomavirus; molecular detection; entropy