Trends in genome composition across the transition to complex multicellularity
1  Foundation for the Promotion of Sanitary and Biomedical Research of the Valencian Community (FISABIO), 46020 Valencia, Spain
2  Institute of Integrative Systems Biology (I2Sysbio), University of Valencia and Spanish National Research Council (CSIC), 46980 Valencia, Spain
3  Center for Biomedical Research in Epidemiology and Public Health Network (CIBEResp), 28029 Madrid, Spain
Academic Editor: Angelo Fortunato

Published: 05 February 2026 by MDPI in The 1st International Online Conference on Biology session Evolutionary Biology
Abstract:

Genome architecture was reorganized over evolutionary time to support complex multicellularity without a proportional increase in coding DNA. We conducted a cross-kingdom comparative analysis using high-quality RefSeq assemblies annotated by the NCBI Genome Annotation Pipeline, restricting the dataset to chromosome-level or complete genomes. We first computed key genomic variables (genome size, gene content, and coding DNA content) for bacteria, archaea, and 694 eukaryotes (including 133 mammals, 77 birds, 169 fish, 187 arthropods, 128 plants, 130 fungi, and 53 unicellular eukaryotes). Next, we fitted scaling relationships among these variables and used phylogenetic generalized least squares (PGLS) to account for shared ancestry. Finally, we modeled the global trend using the mathematical form that best described these relationships. We identified clear regime shifts in genome composition. A transition near 40 Mb of gene content marked the shift from prokaryotes to multicellular lineages, beyond which coding DNA content scaled sublinearly with gene content and approached saturation. Prokaryotes exhibited near-proportional increases in coding sequence with genome size, whereas eukaryotes showed progressive decoupling as noncoding sequence expanded. Plants followed a distinct path, with gene content increasing sublinearly relative to genome size. Vertebrates, particularly mammals and birds, occupied compressed ranges of genome size and gene content, consistent with strong compositional constraints; in these clades, coding DNA constituted as little as 3% of total genic sequence. Together, these results reveal robust scaling laws and thresholds governing genome composition across the tree of life, and quantify how the expansion of noncoding sequence dominated genomic evolution in the transition to complex multicellularity.
On the one hand, we computed summary statistics across taxonomic groups for the percentage of gene content relative to genome size, the percentage of coding DNA relative to gene content, the percentage of coding DNA relative to genome size, and the alternative splicing ratio. On the other hand, we fitted a mathematical model relating these variables to one another.
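
As an illustration only, the percentages and a scaling exponent of the kind described above could be computed along the following lines. This is a minimal sketch under stated assumptions: the function names `composition_percentages` and `fit_power_law` are hypothetical, the input values are synthetic rather than taken from the study, and the fit is ordinary least squares on log-transformed values, which does not correct for shared ancestry as PGLS does.

```python
import math


def composition_percentages(genome_bp, gene_bp, coding_bp):
    """Return three composition percentages for a single genome.

    All arguments are lengths in base pairs (hypothetical inputs).
    """
    return {
        "gene_pct_of_genome": 100.0 * gene_bp / genome_bp,
        "coding_pct_of_gene": 100.0 * coding_bp / gene_bp,
        "coding_pct_of_genome": 100.0 * coding_bp / genome_bp,
    }


def fit_power_law(x, y):
    """Fit y ~ a * x**b by ordinary least squares in log10-log10 space.

    Returns (a, b). An exponent b < 1 indicates sublinear scaling.
    Note: this is plain OLS, not a phylogenetically corrected fit.
    """
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = 10 ** (mean_y - b * mean_x)
    return a, b


# Synthetic example: coding content generated from an exact power law
# with exponent 0.5, so the fit should recover a sublinear exponent.
gene_content = [1e6, 1e7, 1e8, 1e9]
coding_content = [2.0 * g ** 0.5 for g in gene_content]
a, b = fit_power_law(gene_content, coding_content)
pct = composition_percentages(genome_bp=1000, gene_bp=500, coding_bp=50)
```

In this sketch a fitted exponent `b` below 1 corresponds to the sublinear scaling mentioned in the abstract; a phylogenetically informed analysis would replace the OLS step with a PGLS fit on a species tree.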

Keywords: Genome architecture; Scaling laws; Complex multicellularity

 
 