Introduction: Efficient cancer risk assessment is vital for sustainable practices in the pharmaceutical industry, agriculture, and environmental protection. Traditional animal tests for chemical carcinogenicity are time-consuming and costly. Ongoing efforts focus on developing alternative approaches that improve the accessibility and reliability of cancer risk assessment.
Objectives: This study aimed to develop an in silico scoring function that ranks chemical compounds by their potential human carcinogenicity.
Materials & methods: An ensemble of diverse AI/ML models, including Boosting Machines, Graph Neural Networks, and Large Language Models, was used to predict endpoints associated with carcinogenicity: in vitro mutagenicity, in vitro and in vivo clastogenicity/aneugenicity, and rodent carcinogenicity. The risk-score function was developed by applying a weighting strategy to each endpoint. Datasets of human carcinogenic and non-carcinogenic chemicals were used to evaluate the performance of the risk score, and a p-value was estimated to indicate the significance of the difference in risk score between the two groups.
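As a minimal sketch of how a weighted risk score and a significance test of this kind might be set up, the following Python example combines per-endpoint predicted probabilities into a single score and compares two groups of scores. The weights, the endpoint keys, the use of a weighted average, and the choice of a permutation test are all illustrative assumptions; the abstract does not state the actual weighting strategy or the statistical test used.

```python
import random
from statistics import mean

# Hypothetical endpoint weights (assumption): the actual values are not reported.
WEIGHTS = {
    "in_vitro_mutagenicity": 0.2,
    "in_vitro_clastogenicity": 0.2,
    "in_vivo_clastogenicity": 0.3,
    "rodent_carcinogenicity": 0.3,
}


def risk_score(predictions, weights=WEIGHTS):
    """Weighted average of per-endpoint predicted probabilities (each in [0, 1])."""
    total = sum(weights.values())
    return sum(weights[name] * predictions[name] for name in weights) / total


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    This test is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)
```

In such a setup, `risk_score` would be applied to the ensemble's predictions for each chemical, and `permutation_p_value` would then compare the scores of known human carcinogens with those of non-carcinogens.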
Results: The mean risk-score values differed significantly (p < 0.0001) between human carcinogens and non-carcinogens. The accuracy in predicting human carcinogens was 73%, slightly lower than the 76% accuracy achieved in experimental carcinogenicity studies in mice and well above the 65% accuracy obtained in studies in rats.
Conclusion: The devised risk score evaluates in silico the potential of chemicals to induce cancer in humans by integrating information from diverse cancer-related test results, providing an approach nearly as accurate as in vivo experiments. Owing to its speed and efficiency, the approach can be used effectively to screen large numbers of chemicals. The risk score currently focuses on genotoxic carcinogens. Including additional endpoints associated with non-genotoxic carcinogenesis and implementing more sophisticated AI/ML technologies, such as multi-task learning, is expected to enhance the versatility and applicability of the approach.