“Prediction Reliability Indicator”: A new tool to judge the quality of predictions from QSAR models for new query compounds

Kunal Roy; Pravin Ambure; Supratik Kar

doi:10.3390/mol2net-04-05265

Kunal Roy *¹, Pravin Ambure ¹, Supratik Kar

¹ Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India
² Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, Jackson, MS-39217, USA

Published: 24 May 2018 by MDPI in MOL2NET'18, Conference on Molecular, Biomed., Comput. & Network Science and Engineering, 4th ed. congress CHEMBIOINFO-04: Chem-Bioinformatics Congress Cambridge, UK-Chapel Hill and Duluth, USA, 2018

Abstract:

Predicting an endpoint for a new query chemical without any experimental response data is one of the important applications of quantitative structure-activity relationship (QSAR) models. Usually, a QSAR model is developed from the chemical information of a properly designed training set and the corresponding experimental response data, while the model is validated using one or more test sets for which experimental response data are available. However, it is also of interest to estimate the reliability of predictions when the model is applied to a completely new data set (a true external set), even when the new data points lie within the applicability domain (AD) of the developed model. In the present study, we have developed a tool, “Prediction Reliability Indicator”, to categorize the quality of predictions for a test set or true external set into three groups: good (composite score 3), moderate (composite score 2) and bad (composite score 1). We have used three criteria [1) mean absolute error of leave-one-out predictions for the 10 closest training compounds for each query molecule (J Chemom 2018, http://dx.doi.org/10.1002/cem.2992); 2) applicability domain in terms of similarity based on the standardization approach (Chemom Intell Lab Sys, 145, 2015, 22-29, http://dx.doi.org/10.1016/j.chemolab.2015.04.013); 3) proximity of the predicted value of the query compound to the experimental mean training response (Chemom Intell Lab Sys, 162, 2017, 44-54, https://authors.elsevier.com/a/1UOpFcc6LvBdv)] under different weightage schemes to compute a composite score for each prediction. The tool can automatically find the optimum weightage based on the percentage of correct predictions computed using a test set with known observed responses and thus known prediction quality. The user also has the option to select the weightage manually.
Using the most frequently appearing weightage scheme, 0.5:0:0.5, the composite-score-based categorization agreed with the absolute-prediction-error-based categorization for more than 80% of test data points across 5 different data sets, with 15 models for each set derived using three different splitting techniques. These observations were also confirmed with two external sets, suggesting that the scheme can be used to judge the reliability of predictions for new data sets. The tool is available free of charge at http://dtclab.webs.com/software-tools.
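
The weighting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the tool's actual implementation. It assumes that each of the three criteria has already assigned every query compound a score of 1, 2 or 3, that the composite score is the weighted sum of those scores rounded to the nearest integer (halves rounded up), and that candidate weightage schemes are enumerated on a fixed grid with weights summing to 1. The function names, the rounding rule, the grid step and the example data are all hypothetical.

```python
from itertools import product


def composite_category(scores, weights):
    """Weighted sum of three per-criterion scores (each 1, 2 or 3).

    Assumption: the sum is rounded half up and clipped to 1..3
    (3 = good, 2 = moderate, 1 = bad).
    """
    total = sum(w * s for w, s in zip(weights, scores))
    return min(3, max(1, int(total + 0.5)))


def percent_correct(criterion_scores, reference, weights):
    """Percentage of compounds whose composite category matches the
    reference category derived from the absolute prediction error."""
    hits = sum(
        composite_category(s, weights) == r
        for s, r in zip(criterion_scores, reference)
    )
    return 100.0 * hits / len(reference)


def candidate_weights(step=0.5):
    """Yield all (w1, w2, w3) on a grid of the given step with w1 + w2 + w3 = 1."""
    n = round(1 / step)
    for i, j in product(range(n + 1), repeat=2):
        if i + j <= n:
            yield (i * step, j * step, (n - i - j) * step)


def best_weights(criterion_scores, reference, step=0.5):
    """Return the grid scheme with the highest % correct (first one on ties)."""
    return max(
        candidate_weights(step),
        key=lambda w: percent_correct(criterion_scores, reference, w),
    )


# Hypothetical test-set data: one (criterion 1, criterion 2, criterion 3)
# score triple per compound, plus the error-based reference category.
criterion_scores = [(3, 1, 3), (2, 3, 2), (1, 3, 1), (3, 2, 2), (2, 1, 3)]
reference = [3, 2, 1, 3, 3]

best = best_weights(criterion_scores, reference)
best_pct = percent_correct(criterion_scores, reference, best)
print(best, best_pct)
```

In this toy example the search picks the scheme that best reproduces the reference categories on the test set. A manually chosen scheme can be scored in the same way by passing it directly to `percent_correct`.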

Keywords: QSAR; Validation; Reliability; Precision; External set
