A Systematic Implementation of Machine Learning Algorithms for Multifaceted Antimicrobial Screening of Lead Compounds
* 1 , 2
1  School of Engineering, Stanford University
2  School of Engineering and Applied Science, University of Pennsylvania
Academic Editor: Manuel Simões

Abstract:

Computational screening tools that accurately identify novel antimicrobial compounds can accelerate the search for effective antibiotics. This study employed machine learning algorithms to identify lead compounds that inhibit two antibiotic targets in Escherichia coli, DNA gyrase and dihydrofolate reductase, and identified new, multifaceted antimicrobial compounds. Three separate datasets were used: 1) 326 Escherichia coli DNA gyrase inhibitors and 132 non-inhibitors, 2) 346 Escherichia coli dihydrofolate reductase inhibitors and 176 non-inhibitors, and 3) 18,387 non-specific drug-like chemicals. All datasets were encoded as ECFP-4 fingerprints and divided according to a 70-15-15 train-test-validation split. We explored six classification algorithms, each tuned with Bayesian optimization. Performance was evaluated using accuracy, precision, recall, F1-score, and AUC. Our results indicate that the Gradient Boosting Classifier (GBC) performed best at predicting a compound's activity against DNA gyrase, with an accuracy, precision, recall, F1-score, and AUC of 91%, 92%, 86%, 88%, and 0.933, respectively. The Random Forest Classifier (RFC) performed best at predicting a compound's activity against dihydrofolate reductase, with an accuracy, precision, recall, F1-score, and AUC of 86%, 83%, 85%, 84%, and 0.944, respectively. Accordingly, the GBC and RFC were used to search for compounds that inhibit both DNA gyrase and dihydrofolate reductase. Out of 18,387 compounds, we identified 5 novel compounds with a predicted probability greater than 95% of inhibiting both DNA gyrase and dihydrofolate reductase, suggesting high antimicrobial potential. Using the GBC and RFC models, we also generated similarity maps, uncovering potential pharmacophores for DNA gyrase and dihydrofolate reductase.
The models evaluated in this study, particularly the GBC and RFC models, show strong promise for computationally screening large libraries of compounds for antimicrobial potential.

Keywords: Virtual Screening; Machine Learning; Drug Discovery; Antimicrobial; Antibiotic; Lead Compounds
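
The workflow described in the abstract (a 70-15-15 split, evaluation by accuracy, precision, recall, and F1-score, and a dual-target screen at a 95% probability cutoff) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_70_15_15`, `classification_metrics`, and `screen_dual_inhibitors` are hypothetical, the split is a plain shuffled split (the abstract does not say whether stratification was used), and the probability lists stand in for whatever the fitted GBC and RFC models would output for each compound.

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle items and split them 70/15/15 into train, test, validation.

    Assumption: a simple random split; the abstract does not state
    whether the original split was stratified by class.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 70 // 100  # integer arithmetic avoids float rounding
    n_test = n * 15 // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # remainder (about 15%)
    return train, test, validation


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = inhibitor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def screen_dual_inhibitors(compound_ids, p_gyrase, p_dhfr, threshold=0.95):
    """Keep compounds whose predicted probability exceeds the threshold
    for BOTH targets (DNA gyrase and dihydrofolate reductase)."""
    return [cid
            for cid, pg, pd in zip(compound_ids, p_gyrase, p_dhfr)
            if pg > threshold and pd > threshold]


if __name__ == "__main__":
    # 458 = 326 gyrase inhibitors + 132 non-inhibitors (from the abstract)
    train, test, validation = split_70_15_15(range(458))
    print(len(train), len(test), len(validation))

    # Made-up probabilities for three placeholder compounds
    hits = screen_dual_inhibitors(["c1", "c2", "c3"],
                                  [0.99, 0.96, 0.50],
                                  [0.97, 0.90, 0.99])
    print(hits)
```

Requiring both probabilities to exceed the cutoff, rather than averaging them, means a compound predicted to be strongly active against only one target is never reported as a dual inhibitor.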