The computational resources required by atomistic simulations of biomolecular systems still restrict them to relatively short time and length scales, at odds with those that typically characterise biological processes. By integrating out most of the microscopic degrees of freedom in favour of a description in terms of a few sites interacting through effective potentials, coarse-grained (CG) models are a powerful instrument for broadening the class of accessible phenomena while still providing accurate results [1]. Even an exact CG procedure, however, inherently comes at a price: a loss of information, quantified by an increase in entropy, that arises when a system is observed through "CG glasses" [2]. Interestingly, this loss depends only on the mapping, i.e., the set of sites employed to represent the system at the CG level, which are often selected a priori on the basis of physical intuition alone [3].
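To make the notion of a mapping-dependent information loss concrete, the following Python sketch computes it for a toy system with a finite number of microstates. It assumes the common definition of the mapping entropy as a Kullback-Leibler divergence, S_map = k_B Σ_r p(r) ln[p(r)/p̄(r)], where p̄(r) spreads the probability of the CG state M(r) uniformly over the Ω(M(r)) microstates that map onto it. The function name `mapping_entropy`, the energies and the two mappings are purely illustrative, and the estimator employed for proteins in [4] is not reproduced here.

```python
import math
from collections import defaultdict

def mapping_entropy(probs, mapping):
    """Toy discrete mapping entropy S_map / k_B = sum_r p(r) ln[p(r) / pbar(r)].

    probs:   microstate probabilities p(r), summing to 1
    mapping: CG state M(r) assigned to each microstate r
    Assumption: pbar(r) = P_R(M(r)) / Omega(M(r)), i.e. the CG probability
    spread uniformly over the microstates sharing the same CG state.
    """
    cg_prob = defaultdict(float)  # P_R(R): total probability of each CG state
    omega = defaultdict(int)      # Omega(R): number of microstates mapped onto R
    for p, R in zip(probs, mapping):
        cg_prob[R] += p
        omega[R] += 1
    s = 0.0
    for p, R in zip(probs, mapping):
        if p > 0.0:
            pbar = cg_prob[R] / omega[R]
            s += p * math.log(p / pbar)
    return s

# Hypothetical 4-microstate system with Boltzmann weights exp(-E), k_B T = 1
energies = [0.0, 1.0, 0.0, 2.0]
weights = [math.exp(-e) for e in energies]
Z = sum(weights)
probs = [w / Z for w in weights]

# Two hypothetical mappings onto two CG states
mapping_a = ["A", "A", "B", "B"]  # lumps microstates with different energies
mapping_b = ["A", "B", "A", "B"]  # lumps the two degenerate (E = 0) microstates

print("S_map(a) =", mapping_entropy(probs, mapping_a))
print("S_map(b) =", mapping_entropy(probs, mapping_b))
```

In this toy case, lumping together microstates of equal energy, as `mapping_b` partly does, costs less information than lumping states of different statistical weight, while a mapping that keeps every microstate distinct gives exactly zero.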
Several questions follow: how large and diverse is the space of possible CG mappings of a biomolecule? Within this space, are there representations that minimise the information loss, and do these "privileged" mappings give hints about the underlying biological processes? In this work, we address these questions by first characterising the space of CG representations of a system through the definition of a distance between mappings. Subsequently, we develop a workflow that estimates the increase in entropy of a protein arising from coarse-graining. Finally, we show that minimising this quantity over the space of possible CG representations suggests a connection between the biological relevance of a chemical fragment of the biomolecule and the amount of information it contains [4].
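The three ingredients outlined above (a distance between mappings, an estimate of the entropy increase, and its minimisation over mapping space) can be illustrated on a toy model small enough for exact enumeration. The sketch below assumes a hypothetical open chain of n two-state sites with Ising-like couplings, represents each CG mapping as the subset of N retained sites (a decimation mapping), measures the distance between two mappings as the Hamming distance between their binary selection vectors, and scans all subsets exhaustively. These choices, and all names and parameters, are illustrative assumptions: neither the distance nor the protein workflow of [4] is reproduced here, and for a real protein the number of possible mappings is far too large for exhaustive search.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical toy "molecule": n two-state sites (s_i = +/-1) on an open chain,
# nearest-neighbour couplings J[i] and local fields h[i], in units of k_B T.
n = 6
J = [1.0, 0.2, 1.5, 0.1, 0.8]
h = [0.3, 0.0, -0.5, 0.2, 0.0, 0.4]

def energy(s):
    pair = sum(J[i] * s[i] * s[i + 1] for i in range(n - 1))
    field = sum(h[i] * s[i] for i in range(n))
    return -pair - field

# Exact Boltzmann distribution over all 2**n microstates
states = list(itertools.product((-1, 1), repeat=n))
weights = [math.exp(-energy(s)) for s in states]
Z = sum(weights)
probs = [w / Z for w in weights]

def mapping_entropy(sites):
    """S_map / k_B for a decimation mapping that retains only `sites`."""
    keys = [tuple(s[i] for i in sites) for s in states]  # CG state of each microstate
    cg_prob = defaultdict(float)  # P_R(R)
    omega = defaultdict(int)      # Omega(R)
    for p, k in zip(probs, keys):
        cg_prob[k] += p
        omega[k] += 1
    # p(r) / pbar(r) = p(r) * Omega(R) / P_R(R)
    return sum(p * math.log(p * omega[k] / cg_prob[k]) for p, k in zip(probs, keys))

def mapping_distance(a, b):
    """Hamming distance between the binary selection vectors of two mappings."""
    return len(set(a) ^ set(b))

# Exhaustive scan of all mappings retaining N of the n sites
N = 3
mappings = list(itertools.combinations(range(n), N))
scores = {m: mapping_entropy(m) for m in mappings}
best = min(scores, key=scores.get)
worst = max(scores, key=scores.get)

print(f"{len(mappings)} mappings")
print(f"lowest  S_map: {best} -> {scores[best]:.4f}")
print(f"highest S_map: {worst} -> {scores[worst]:.4f}")
print("distance between them:", mapping_distance(best, worst))
```

In this example, retaining all n sites gives zero information loss and retaining none gives the largest value, since refining a mapping can never increase S_map; the interesting question, as in the text, is which subsets of fixed size N lose the least.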
[1] R. Menichetti, A. Pelissetto and F. Randisi, J. Chem. Phys. 146, 244908 (2017).
[2] J. F. Rudzinski and W. G. Noid, J. Chem. Phys. 135, 214101 (2011).
[3] P. Diggins IV et al., J. Chem. Theory Comput. 15, 648 (2019).
[4] M. Giulini et al., J. Chem. Theory Comput. 16, 6795 (2020).