Numerous models have recently been developed to predict drug-molecule interactions. However, integrating diverse data sources and improving the accuracy of biological activity predictions remain challenging in cheminformatics and drug discovery.
In this work, we present a machine learning-based approach for predicting the assay efficacy of molecules targeting calmodulin (CaM) pathway proteins. To build and evaluate the model, we assembled a diverse dataset of approved drugs and experimental compounds reported to interact with CaM complexes. Our approach is based on the IFPTML (Information Fusion–Perturbation Theory–Machine Learning) methodology, implemented with the XGBoost algorithm. This strategy integrates structural, physicochemical, and biological information into a unified predictive system capable of handling complex biochemical interactions.
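As a rough illustration of the perturbation-theory step in an IFPTML-style workflow, the sketch below computes deviation features: each descriptor value minus the mean of that descriptor over all cases that share the same assay condition. The record layout, the descriptor name `logp`, and the assay labels are hypothetical, and this is a minimal sketch of the general idea under those assumptions, not the pipeline used in this work. In such a workflow the deviation terms would then be passed, together with a reference function, to a classifier such as XGBoost.

```python
from collections import defaultdict


def condition_means(records, descriptor, condition_key):
    """Mean of `descriptor` over all records sharing the same condition value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        cond = rec[condition_key]
        sums[cond] += rec[descriptor]
        counts[cond] += 1
    return {cond: sums[cond] / counts[cond] for cond in sums}


def deviation_features(records, descriptor, condition_key):
    """Deviation of each record's descriptor from its condition-specific mean."""
    means = condition_means(records, descriptor, condition_key)
    return [rec[descriptor] - means[rec[condition_key]] for rec in records]


# Hypothetical toy data: one descriptor, one assay-condition label per case.
records = [
    {"logp": 2.0, "assay": "A"},
    {"logp": 4.0, "assay": "A"},
    {"logp": 1.0, "assay": "B"},
]

deltas = deviation_features(records, "logp", "assay")
print(deltas)
```

Centering each descriptor on its condition-specific mean is one way to let a single model compare compounds measured under different assay conditions.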
The resulting model demonstrated strong predictive performance, achieving 89.1% accuracy and 89.0% sensitivity on an external test set. Feature importance analysis revealed critical molecular descriptors and assay conditions that contribute significantly to biological activity, providing insights into the key factors that influence compound efficacy.
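For reference, accuracy and sensitivity can be computed from binary labels as sketched below. The label vectors are made-up placeholders that only illustrate the arithmetic; they are not data from this study, and the function name is hypothetical.

```python
def accuracy_and_sensitivity(y_true, y_pred):
    """Return (accuracy, sensitivity) for binary labels where 1 means active."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    # Sensitivity (recall) is the fraction of truly active cases recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Placeholder labels, not results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

acc, sens = accuracy_and_sensitivity(y_true, y_pred)
print(acc, sens)
```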
To further assess the model's utility, we applied it in a virtual screening scenario using a set of riluzole derivatives. The model identified the derivatives predicted to be more bioactive than riluzole under specific assay conditions, highlighting its value in rational drug design.
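A screening step of this kind can be sketched as a comparison of predicted scores against the reference compound under the same assay condition. The scores, compound names, and the function `rank_above_reference` below are hypothetical placeholders, assuming the model outputs one probability-like score per compound; they are not values from this study.

```python
def rank_above_reference(scores, reference):
    """Return (name, score) pairs scoring above the reference, best first."""
    ref_score = scores[reference]
    better = [
        (name, score)
        for name, score in scores.items()
        if name != reference and score > ref_score
    ]
    return sorted(better, key=lambda item: item[1], reverse=True)


# Placeholder predicted scores for one assay condition.
scores = {
    "riluzole": 0.62,
    "derivative_1": 0.81,
    "derivative_2": 0.40,
    "derivative_3": 0.70,
}

hits = rank_above_reference(scores, "riluzole")
print(hits)
```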
Overall, this study introduces a scalable and data-driven computational framework for supporting early-stage discovery of bioactive compounds in CaM-related pathways.
