A Robust SMILES-Based Prediction of Aqueous Solubility of Diverse Antiplasmodial Compounds using Machine Learning Algorithms
* 1 , * 2
1  Department of Science Laboratory Technology, Faculty of Physical Sciences, University of Nigeria
2  Department of Pharmaceutical and Medicinal Chemistry, Faculty of Pharmaceutical Sciences, University of Nigeria, Nsukka, Nigeria
Academic Editor: Maria Emília Sousa (registering DOI)

Apart from the pharmacodynamics of drugs and the resistance of the Plasmodium falciparum parasite to existing antimalarial drugs, the pharmacokinetic properties of drugs also hamper their translation. The need to develop novel drugs with optimal solubility profiles motivated the training of an efficient machine learning regression model to predict the aqueous solubility of a series of compounds. Four descriptors were computed from the simplified molecular-input line-entry system (SMILES) representations of 11,478 antiplasmodial molecules: octanol-water partition coefficient, molecular weight, number of rotatable bonds and aromatic proportion. These descriptors were used to train five regression models (multiple linear regression, k-nearest neighbors, LASSO regression, support vector regressor and random forest regressor (RFR)) to predict the solubility of the molecules. Model performance was assessed with four evaluation metrics: the coefficient of determination (R²), mean squared error (MSE), mean absolute error (MAE) and root mean squared error (RMSE). Of the algorithms evaluated, the RFR produced a robust model, with MSE 0.54, R² 0.85, MAE 0.41 and RMSE 0.73. The F-statistic for the model was 7214, indicating a strong relationship between the descriptors and the solubility of the molecules. The model could be used to predict the solubility of untested antiplasmodial molecules and so help select promising ligands as leads for further optimization.
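
The four evaluation metrics named above are related in a simple way, and a short sketch may help make the reported figures easier to read. The function below is an illustration only, not the authors' code: the name `regression_metrics` and the sample solubility values are hypothetical, and only the Python standard library is used. It also shows that the reported RMSE is consistent with the reported MSE, since RMSE is the square root of MSE.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R^2 for two equal-length sequences.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean squared error and mean absolute error over all residuals.
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n

    # RMSE is the square root of MSE, in the same units as the target.
    rmse = math.sqrt(mse)

    # R^2 = 1 - SS_res / SS_tot, where SS_tot measures spread around the mean.
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "mae": mae, "rmse": rmse, "r2": r2}


# Made-up solubility values (assumed to be log-scale), for demonstration only.
y_true = [-2.0, -3.5, -1.0, -4.5]
y_pred = [-2.5, -3.0, -1.5, -4.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)

# Consistency check on the figures quoted in the abstract:
# sqrt(0.54) rounds to 0.73, matching the reported RMSE.
print(round(math.sqrt(0.54), 2))
```

Because RMSE is expressed in the same units as the predicted quantity, it is usually the easiest of the four values to interpret as a typical prediction error.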

Keywords: Antimalarial; machine learning; molecular descriptors; regression models; solubility