CIF descriptors are based on the concept of Continuous Indicator Fields (CIF) [1], a particular case of Continuous Molecular Fields [2,3]. Each CIF descriptor is defined by an isotropic Gaussian function centered at a specific point in physical space. These points can be chosen by applying hierarchical cluster analysis to the Cartesian coordinates of all atoms of all molecules in the aligned training set. The value of a CIF descriptor for a molecule equals the overlap integral between this function and the sum of analogous Gaussian functions centered on all atoms of the molecule. The resulting matrix of CIF descriptors can be used to build 3D QSAR models.
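The descriptor calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the R implementation from the Continuous Molecular Fields project: the function names `gaussian_overlap` and `cif_descriptors`, the unnormalized Gaussians, the exponents `alpha` and `beta`, and the neglect of atom types are all hypothetical choices made here. The sketch relies on the standard closed-form overlap integral of two isotropic Gaussians, (pi / (alpha + beta))^(3/2) * exp(-alpha * beta / (alpha + beta) * d^2), where d is the distance between their centers.

```python
import math


def gaussian_overlap(a, b, alpha, beta):
    """Analytic overlap integral over 3D space of the two unnormalized
    isotropic Gaussians exp(-alpha*|r - a|^2) and exp(-beta*|r - b|^2)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))  # squared distance between centers
    s = alpha + beta
    return (math.pi / s) ** 1.5 * math.exp(-alpha * beta / s * d2)


def cif_descriptors(atoms, centers, alpha=1.0, beta=1.0):
    """Return one descriptor value per center for a single molecule.

    atoms   -- list of (x, y, z) atom coordinates of the aligned molecule
    centers -- list of (x, y, z) descriptor centers (assumed to be given,
               e.g. from a clustering of training-set atom coordinates)
    alpha   -- exponent of the descriptor Gaussian (assumed value)
    beta    -- exponent of the atom-centered Gaussians (assumed value)
    """
    # The overlap with a sum of atomic Gaussians is the sum of pairwise overlaps.
    return [
        sum(gaussian_overlap(center, atom, alpha, beta) for atom in atoms)
        for center in centers
    ]


# Hypothetical two-atom molecule and two descriptor centers (coordinates in angstroms).
molecule = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
centers = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
row = cif_descriptors(molecule, centers)
```

Computing such a row for every molecule in the aligned set would give the descriptor matrix mentioned above. Because the integral is evaluated analytically, no numerical grid is needed in this sketch.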
Using CIF descriptors has several advantages over the original methodology for building CIF 3D QSAR models [1]. Firstly, CIF descriptors can be computed efficiently for large data sets. Secondly, any machine learning method, whether regression or classification, linear or non-linear, can be applied to build 3D QSAR models. Thirdly, CIF descriptors can be aggregated to form 3D analogs of fragment descriptors, which can be used to interpret 3D QSAR models from a structural viewpoint.
CIF descriptors are implemented in R scripts and are available as part of the Continuous Molecular Fields project [4]. They were used in conjunction with Support Vector Machines and several other machine learning methods to build 3D QSAR models for several benchmarking data sets.
References
1. Sitnikov G.V.; Zhokhova N.I.; Ustynyuk Yu.A.; Varnek A.; Baskin I.I. J. Comput. Aided Mol. Des. 2015, 29, 233.
2. Baskin I.I.; Zhokhova N.I. J. Comput. Aided Mol. Des. 2013, 27, 427.
3. Baskin I.I.; Zhokhova N.I. Challenges and Advances in Computational Chemistry and Physics, Springer, 2014, 17, 433.
4. http://sites.google.com/sites/conmolfields/