Virtual Screening Tailored Ensembles of QSAR Models for the Discovery of Dual A 2 A Adenosine Receptor Antagonists / Monoamine Oxidase B Inhibitors

Virtual Screening methodologies have emerged as efficient alternatives for the discovery of new drug candidates. At the same time, ensemble methods are nowadays frequently used to overcome the limitations of employing a single model in ligand-based drug design. However, many applications of ensemble methods to this area do not consider important aspects related to both virtual screening and the modeling process. During the application of ensemble methods to virtual screening the proper validation of the models in virtual screening conditions is often neglected. Frequently no analysis is performed of the diversity of the ensemble members or no considerations regarding the applicability domain of the base model are made. In this research we propose a method employing genetic algorithms optimization for the generation of virtual SciForum Mol2Net, 2015, 1(Section B), pages 1-10, Proceedings 2 http://sciforum.net/conference/mol2net-1 screening tailored ensembles that address problems in the current applications of ensemble methods to virtual screening. The proposed methodology is successfully applied to the generation of ensemble models for the ligand-based virtual screening of dual target A2A adenosine receptor antagonists and MAO-B inhibitors as potential Parkinson’s disease therapeutics.


Introduction
During the last decades, Virtual Screening (VS) methodologies have emerged as efficient alternatives to the expensive, in terms of time and money, High Throughput Screening (HTS) approaches for the discovery of new drug candidates [1].In terms of efficiency, the hit rates obtained when VS tools are employed to filter large databases of chemical compounds are considerably higher than those obtained with HTS techniques [2].Literature reports where VS experiments conducted to the identification of hit molecules in a wide range of application can be found elsewhere [3,4].
VS techniques can be divided into two main categories: Structure-Based VS (SBVS) and Ligand-Based VS (LBVS) [5].The first one includes all the modeling approaches such as Molecular Docking and Molecular Dynamics that depend on the structure of a molecular receptor.In some cases the structure of the molecular target is not available or the research process focuses in a mechanism where a molecular target cannot be defined.It can also be the case that the amount of data to process is too large to complete a SBVS campaign in a reasonable amount of time.In these cases LBVS tools can aid in the drug discovery process.These types of techniques include Similarity methods, Shape-based methods, Pharmacophore modeling and Quantitative Structure-Activity Relationships (QSAR) studies [5,6].Specifically, QSAR approaches have been used for VS in the early stages of drug discovery [7].
To exploit the full potential of QSAR modeling some general guidelines have to be followed.These guidelines have been summarized as best practices for QSAR [8].In addition to these guidelines, in our previous research we pointed out the importance of the proper validation of QSAR models in VS conditions [9].
Ensemble methods (EM) have gained popularity among QSAR practitioners, providing a set of tools for combining the predictions of single models in a more robust and generalizable model [9][10][11].Although the success of these methods in Ligand-Based Drug Design (LBDD) and LBVS, the currently published applications of ensemble methods to LBVS suffer from a limitation common to most LBDD workflows: their VS performance is not retrospectively evaluated prior to their use in VS campaigns.In addition, in most reports of applications of ensemble methods to LBVS no considerations are made regarding the diversity of the base classifiers.The optimal size of the developed ensembles is neither considered and all available base classifiers are combined instead.These two last factors can drastically reduce the performance of ensemble methods [12].http://sciforum.net/conference/mol2net-1On the other side, Parkinson Disease (PD) causes chronic disability and it is the second commonest degenerative condition of the nervous system.The standard treatment for PD is levodopa, which helps to increase the dopamine levels in the brain [13].However there is a need of finding alternative therapies since levodopa has many side effects and can become ineffective over time.To this end, multicomponent therapies (combination of different drugs) have been used.However the discovery of new multi-target drugs (a single molecule that acts on multiple targets) is attracting more and more attention [14].Multitarget drugs, compared with the use of combinations of different drugs, have more predictable pharmacokinetic and pharmacodynamic relationships as a consequence of the administration of a single drug [15].
Here, are summarized the results obtained in the development of a GA-based ensemble selection method and its application to the search of ensembles suitable for the VS identification of dual target ligands for PD.It should be mentioned that the application of LBVS tools for the planned design of ligands with a predefined dual-target profile is a recent development and represents an important challenge to medicinal chemists [16]. .

Results and Discussion
The collection, curation, class assignment, representation and dataset splitting processes were performed following the procedures described in the Materials and Methods section.Both modeling sets were split into training and test subsets using three sphere exclusion algorithms 1M, 2M and 3M.This partition scheme lead to the selection of 20-23 % of the modeling data for the test set and guaranties the representativeness of each class in the test set.
Fragment descriptors were calculated with ISIDA Fragmentor software [17,18] and they were filtered to discard those with zero or close to zero variance.Afterward, the mRMR algorithm [19] was employed to only keep the 250 more relevant descriptors.ISIDA's header files listing the 250 most relevant fragments for each dataset and their partitions are provided as Supporting Information.
The main goal of this paper is to find models that can be effective in the identification of dual target ligands for PD through VS.To this end, two sets of QSAR models were independently developed for A2AAR binding and MAO-B inhibition following the procedure summarized in the Materials and Methods section.For each endpoint, three feature selection methods and three different classification algorithms were combined to generate nine different classifier types, which were trained and validated using three modeling set partitions obtained with the sphere exclusion algorithms 1M, 2M and 3M.In consequence, 27 different classification experiments per endpoint were conducted.At this point, it is important to remark that the external set was only used to evaluate the real predictive power of the models.In Table 1 are presented the statistics of the optimal models for each partition of the dataset.These results show that the employed QSAR modeling framework provides accurate, robust and generalizable models.
All 16 known dual ligands, MAO-B inhibitors and A2AAR antagonists, were compiled from the literature [20,21].For each known dual-target compound 1608 decoys were selected based on desirability-based home-developed algorithm that has been previously employed in the http://sciforum.net/conference/mol2net-1selection of tailored decoy sets for the validation of virtual screening strategies [9].These ligands and decoys were prepared for virtual screening following the same protocol described for the datasets.
To evaluate the performance of the models in virtual screening the following metrics were employed: Area Under the Accumulation Curve (AUAC), Area under the Receiver Operating Characteristic Curve (ROC), Enrichment Factor (EF) and Boltzmann-Enhanced Discrimination of ROC (BEDROC).These metrics were computed as proposed by Truchon et.al. [22] In our previous research [9] the VS-tailored ensembles were obtained using only a very limited set of predictive models that were selected as the best ones derived from each modeling approach.However, during the modeling process a larger number of high quality models are obtained that we did not considered for ensemble modeling in that investigation.
To discard low performing models, the whole pool of classification models was first filtered and a model was considered to be accurate, robust and predictive if it had values of accuracy higher than 0.75 in predicting the train, selection and external sets as well as in the cross validation experiments.The models fulfilling the above criteria are considered as valid models from here on.
For the three modeling set partitions obtained after applying the sphere exclusion algorithms, there were a total of 1526 and 1347 valid models for the A2A adenosine receptor antagonists and MAO-B inhibitors datasets respectively.Since one of the factors negatively influencing the performance of ensemble models is the redundancy of the base classifiers, these valid models were clustered following the protocol above to obtain a representative set of them.This procedure yielded 33 and 48 representative models for the A2A adenosine receptor antagonists and MAO-B inhibitors datasets respectively.The representative model of each cluster was selected as the one having the highest value of BEDROC for α=160.9.The best performance among the representative models corresponded to a value of BEDROC=0.20 for α=160.9.
We then evaluated the proposed GA-guided algorithm for the search of the ensemble maximizing BEDROC at a given fraction of screened data.The algorithm was run five times with different initial populations of 100 individuals each one, yielding 500 solutions.That is, 500 ensembles are obtained.The GA fitness function maximizes BEDROC for α=160.9.
In Table 2 are presented the enrichment metrics for the best individual found after running the GA-guided ensemble generation algorithm presented in this communication, when we search for the ensemble maximizing the initial enrichment of dual ligands.The accumulative curves for this ensemble as well as for its members are shown in Figure 1.
The analysis of these results show that the obtained ensemble is composed by 14 models, five of them related to the prediction of the antagonist activity of the A2A adenosine receptor and nine related to the inhibition of the MAO-B enzyme.This ensemble outperforms the VS metrics of all its members.Specifically, the improvement in the value of BEDROC for α=160.9 for the obtained ensemble relative to the best single model it is composed of is 0.05.The model with the highest value of BEDROC (α=160.9)among all single models within the ensemble is also the one with the best value of this metric among the set of representative valid models.This improvement might seem meaningless, however it can be interpreted as 5% http://sciforum.net/conference/mol2net-1more probability of retrieving a dual target ligand in the first 1% of the ranked list using the obtained ensemble compared to the best performing individual model [22].From Table 2 it can also be seen that the values of BEDROC for α=32.2 and α=20 are also higher when the ensemble model is compared with its members.In addition, improvements for the EF metric at the three analyzed fractions of screened data were obtained.The advantages of using the obtained ensemble for VS experiments over the use of single models can be visualized in Figure 1 where the accumulative curves of the ensemble and the models it is composed of are plotted.From this figure it is clear that the ensemble model is able to retrieve more known dual ligands and at lower positions in the ranked list than any of the single models it is composed of.
Another advantage of employing the proposed ensemble for VS tasks is that its applicability domain covers 93% of the whole virtual screening validation set, representing an improvement of 17% of coverage relative to the single model with the highest applicability domain coverage.If the applicability domain coverage of the ensemble is compared to that of the model with the highest BEDROC value for α=160.9,then this improvement increases to 33%.In contrast, the value of ROC of the ensemble is lower than the mean value of ROC across the ensemble members.This last result is a consequence of designing the GA search to find ensembles maximizing the initial enrichment of known dual ligands.In other words, the calculations here performed focus in retrieving the more dual ligands at the very first part of the ranked list and neglect the position of the ligands in the remaining of the list. .For the whole virtual screening validation datasset and b) For the first 10% of screened data.The black line corresponds to the obtained ensemble, red continuous lines to A2A models and blue discontinuous lines to the MAO-B models

Data sets
Two data sets were used in this paper; A2AAR antagonists and human MAO-B (hMAO-B) inhibitors.The A2AAR antagonist data set was retrieved from 18 different literature sources.The compounds were divided into two classes according to their Ki values.The first class, designated as potent antagonists, included all chemicals with a Ki ≤ 1000 nM (pKi ≥ 6).The second class, named as weak antagonists, was formed by compounds with Ki >1000 nM (pKi < 6).As result of this categorization a balanced dataset was obtained that included 161 potent antagonists and 166 weak antagonists.
The hMAO-B inhibitors data set was compiled from [23] and it contains 474 compounds.the compounds were classified in two groups according with theirs IC50 values.The first group, named hMAO-B inhibitors, included those chemicals with IC50 ≤ 20 µM (pIC50 ≥ 4.70), while the second one, designated as hMAO-B non-inhibitors comprises those compounds with IC50 > 20 µM (pIC50 < 4.70).Thus, the 474 ligands were split in 313 inhibitors and 161 non-inhibitors of hMAO-B.
The chemical structures were represented in smiles format and then converted to a SD file (SDF) using the ChemAxon's JChem for Excel (6.3.1.1807)program [24].Each data set, as well as the decoy compound candidates were curated following the guidelines proposed in the literature [25] using ChemAxon's Standardizer [26] Datasets splitting The external sets were randomly selected as 20% of the entire initial datasets.The modeling sets were subsequently partitioned into training and test sets using the three sphere exclusion (SE) algorithms proposed by Golbraikh [27] and implemented in our laboratory that ensure the closeness in chemical spaces of the train and test sets.
The three variants 1M, 2M, and 3M of the sphere exclusion algorithm proposed by the http://sciforum.net/conference/mol2net-1developers and used here to divide the balanced modeling sets were implemented in MATLAB [28].Unlike the original algorithms, for the SE based partitioning of the data the structure of the compounds was encoded as 1024 bits Chemaxon's Topological Fingerprints from GenerateMD program [26]; and the Tanimoto distance was selected as the distance metric.The radius of the spheres was varied between 0.05 and 1.0 with a step of 0.05
Descriptors were calculated for the training dataset.Afterward, fragments with the same value for 99% of the samples were removed.The minimal Redundancy Maximal Relevance (mRMR) algorithm [19] was applied to the reduced data set to keep only the top 500 fragments according to the MIQ score.The same subset of 500 fragments was then computed for the selection and external sets.
Classification-based modeling methodology QSAR modeling for each dataset was performed using the previously proposed QSAR modeling framework.Here the main steps involved in the modeling process are summarized and the detailed description of the QSAR modeling framework is available in [9] Methodology of models ensemble for virtual screening For both targets QSAR models were first filtered, regardless the dataset partition or modeling approach they were obtained with, to ensure that only accurate, robust and predictive models remain eligible as ensemble members.For this step we set the same cutoff value of 0.75 for the accuracy in predicting the training, selection and external datasets as well as in the cross-validation experiments.The models fulfilling these conditions were then clustered using the Hamming distance between the predictions they made on the external dataset to select a representative set of models to ensure diversity in the pool of base models.Clustering was carried out using the K-means algorithm implemented in MATLAB [28].To obtain the optimal number of clusters we examined the silhouette plot [29] when the number of clusters varied between 3 and 50 and selected the number of clusters corresponding to the maximum of this plot.This procedure leads to the selection of NA2A and NMAOB representative models which form the final pool of diverse models candidates for ensembles.
To build the ensembles for VS we followed the same protocol described in our previous publication [9].In brief, for a given subset of QSAR models forming the ensemble and a dataset to be evaluated we first search for the samples included within each model applicability domain.Next, the scores produced by each model for the compounds inside its applicability domain are used to obtain a ranking for them and the relative ranking of each sample in each model is computed.Once the relative rank of every sample in each model considering the model's applicability domain is determined, these relative rank values are averaged over the models the compound is inside their applicability domains to obtain the final aggregated score.Finally, the compounds are sorted according to this aggregated score in ascending order to obtain the final ensemble ranking.
Given that to evaluate the virtual screening performance of all ensembles formed by all possible combination of size two to NA2A+NMAOB of the selected models is computationally unfeasible, we implemented a novel GA search strategy which could find combinations of http://sciforum.net/conference/mol2net-1models maximizing the EF at a given fraction of screened data.Each individual for the GA search represents an ensemble and they are encoded as binary vectors of length NA2A+NMAOB.In an individual the "on" bits encode the set of models considered for the ensemble while "off" bits represent models excluded from the ensemble.The initial population was set to 100 randomly generated individuals and the population evolved for 100 generations.The crossover and mutation rates were set to 0.7 and 0.3 respectively while the best two individuals survived to the next generation.The selection operator was set to a tournament of size 2. For the crossover operator, the offspring chromosomes were randomly selected position by position from the two selected parents.The mutation operator changed a randomly selected "on" bit to "off" and one randomly selected "off" bit to "on" in the individual.In each case study the GA was run five times using different initial populations.The objective function for the GA was selected as BEDROC for α=160. ..

Conclusions
We designed a methodology for the generation of virtual screening tailored ensembles capable of overcoming the previously identified problems.This methodology considered the diversity of the base models for ensemble generation and their applicability domains.The main advantage of the proposed algorithm is that it is able of finding the combination of models providing the best VS performance for a specific problem.
The proposed algorithm was applied to the VS simulation of dual target A2A adenosine receptor antagonists and MAO-B inhibitors.The obtained results showed that the obtained ensemble outperformed the best individual model according to the evaluated enrichment metrics.Thus, confirming the expected improved performance of ensemble models over single ones.In the specific problem being addressed, the results of the ensemble modeling process highlighted the importance of considering information from both targets for the discovery of dual target ligands.

Table 1 .
Statistics for the best performing models for each dataset partition.Modeling method the classifier is based on, GA stands for Genetic Algorithm, BT for Bagged Trees, AB for Adaboost, LSSVM for Least Squares Support Vector Machines and LDA for Linear Discriminant Analysis.Accuracy of the Leave One Out, Bootstraping and 5-fold cross-validation experiments respectively a b Number of features in the model.c,d,h Accuracy in predicting the training, test and external sets respectively e,f,g

Table 2 .
Enrichment metrics for the ensemble maximizing the initial enrichment of known dual ligands.
http://sciforum.net/conference/mol2net-1 Figure 1.Accumulation curves for the best ensemble as well as for its base classifiers: a)