A proposed new measure to verify the general version of Chargaff 2nd rule.
* 1 , 1 , 2 , 3
1  Department of Bioinformatics, University of Talca, 2 Norte, 685, Talca, Chile
2  Department of Computer Science, University of Chile, Beauchef 850, Santiago, Chile
3  Institute of Mathematics and Physics, University of Talca, Av. Lircay, s/n, Talca, Chile


In the 1940s, Erwin Chargaff was the first to observe the parity between adenines (A) and thymines (T), and between cytosines (C) and guanines (G), in the DNA molecule. In the 1960s, Chargaff found a second parity rule, this time within a single strand of DNA: the amounts of A and T, and the amounts of C and G, are similar. The first rule is explained by the complementary nature of the double-stranded DNA helix. For the 2nd rule, however, a biological explanation has remained a mystery. In the last 40 years, a generalization of the second rule has been proposed that explains the 2nd rule as a particular case. This generalization states that, for any given k-mer and its reverse complement (RC), the number of occurrences of each is similar in a single strand of DNA. Two measures have been proposed to test the generalized Chargaff’s 2nd rule (gC2r); both include an artifact related to genome length. This has led their authors to think that there is a minimum genome length and a maximum k-mer size for compliance. We propose a new way to measure the compliance of any given genome with the gC2r. The measure is the proportion of the genome that complies with the gC2r. Compliance is measured per k-mer/k-merRC pair, using the natural logarithm of the number of times the k-mer is found in the genome divided by the number of times its reverse complement is found, i.e. ln(#k-mer/#k-merRC). This measure is independent of the size of the analyzed k-mer and of the size of the genome. It has been implemented in a software tool, ChargaffCracker, which can rapidly analyze sequences and deliver a statistical report. We generated random genomes based on the proportions and lengths of biological prokaryote genome sequences and compared them with the biological sequences. We conclude by hypothesizing that: 1. compliance with the gC2r is a consequence, not a cause, of the 2nd rule; and 2. although Chargaff’s 2nd rule might be a consequence of transpositions and inversions, the limits of compliance with the gC2r are a property of the sequence model of genomes, not of the biology of organisms. However, this property might have been selected to fulfill biological needs in genome evolution.
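
The per-pair quantity described in the abstract, the natural logarithm of the k-mer count divided by the count of its reverse complement, can be illustrated with a minimal Python sketch. This is not the ChargaffCracker implementation, which the abstract does not detail. The function names, the decision to skip palindromic k-mers and pairs with a zero count, and the tolerance used to call a pair compliant are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(_COMPLEMENT)[::-1]


def count_kmers(seq, k):
    """Count overlapping k-mers on a single strand, ignoring ambiguous bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in "ACGT" for base in kmer):
            counts[kmer] += 1
    return counts


def log_ratios(seq, k):
    """Return ln(#k-mer / #k-merRC) for each k-mer/RC pair.

    Each pair is reported once, keyed by its lexicographically smaller member.
    Assumption: palindromic k-mers (equal to their own RC) and pairs in which
    one member is absent are skipped, because the ratio is trivial or undefined.
    """
    counts = count_kmers(seq, k)
    canonical = {min(kmer, reverse_complement(kmer)) for kmer in counts}
    ratios = {}
    for kmer in sorted(canonical):
        rc = reverse_complement(kmer)
        if kmer == rc:
            continue
        n_kmer, n_rc = counts[kmer], counts[rc]
        if n_kmer == 0 or n_rc == 0:
            continue
        ratios[kmer] = math.log(n_kmer / n_rc)
    return ratios


def compliant_fraction(ratios, tolerance=0.1):
    """Fraction of pairs whose |ln ratio| is within a tolerance.

    Assumption: the tolerance value is hypothetical; the abstract does not
    state the threshold used to decide compliance.
    """
    if not ratios:
        return math.nan
    within = sum(1 for r in ratios.values() if abs(r) <= tolerance)
    return within / len(ratios)
```

A value of 0 means the k-mer and its reverse complement occur equally often, and swapping the two members of a pair only flips the sign of the value. For example, `log_ratios("AACGTT", 2)` pairs AA with TT and AC with GT, each found once, so both values are 0.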

Keywords: Chargaff, k-mers, bioinformatics, sequence