In silico Identification of Potential Sesquiterpene Lactones from SistematX Database against Schistosoma mansoni

Abstract. Schistosomiasis is an acute and chronic parasitic disease caused by blood flukes (trematode worms) of the genus Schistosoma. For 2019, the World Health Organization estimated that close to 240 million people required preventive treatment against this disease, mainly in poor communities without access to safe drinking water and adequate sanitation. As with other Neglected Tropical Diseases (NTDs), the treatments against this disease are limited.


Introduction
Schistosomiasis is an acute and chronic parasitic disease caused by blood flukes (trematode worms) of the genus Schistosoma. For 2019, the World Health Organization (WHO) estimated that close to 240 million people required preventive treatment against this disease, mainly in poor communities without access to safe drinking water and adequate sanitation [1]. One of the most interesting Schistosoma targets for the development of new treatments is dihydroorotate dehydrogenase (DHODH), a flavoenzyme that catalyzes the stereospecific oxidation of (S)-dihydroorotate (DHO) to orotate during the fourth and only redox step of the de novo pyrimidine nucleotide biosynthetic pathway [2]. Atovaquone (an antimalarial drug) was identified by de Mori et al. as a selective inhibitor of Schistosoma mansoni DHODH (SmDHODH) [3]. Sesquiterpene lactones (SLs) are chemotaxonomic markers of Asteraceae that have been used successfully in humans against parasitic diseases; one example is artemisinin, an antimalarial sesquiterpene lactone whose discovery led to the 2015 Nobel Prize in Physiology or Medicine [4]. In the case of schistosomiasis, however, there have been no in silico studies with SLs [4]. In this study, a machine learning model was built and a ligand-based virtual screening was performed using 1,300 SLs registered in the SistematX database. The best-ranked molecules were then used to explore the mechanism of action of these SLs against Schistosoma mansoni through molecular docking calculations, which were performed on the crystal structure of SmDHODH in complex with an atovaquone analogue inhibitor.
https://mol2net-07.sciforum.net/

Materials and Methods
From the ChEMBL database (https://www.ebi.ac.uk/chembl/), we selected a diverse set of 309 structures that were classified according to their activity against Schistosoma mansoni, expressed as pIC50 values [−log IC50 (mol/L)]; the compounds were stratified into active (pIC50 ≥ 6.0) and inactive (pIC50 < 6.0) structures. The datasets were curated according to procedures suggested in the literature. The 3D structures of the identified molecules, in structure-data file (SDF) format, were used as input data in VolSurf+ v.1.0.7, resulting in a total of 128 molecular descriptors [5,6]. The sesquiterpene lactone dataset was obtained from the SistematX database (https://sistematx.ufpb.br), and a total of 1,300 molecules were used in this study. For all structures, SMILES codes were used as the input data in Marvin (ChemAxon, version 20.8) [7].
The descriptors calculated in VolSurf+ were imported in comma-separated value (CSV) format, and the "Partitioning" node with the stratified sampling option was used to assign 80% of the initial dataset to the training set and the remaining 20% to the test set. The model was generated from the modeling set with the random forest (RF) algorithm and a "fivefold external validation" procedure, using WEKA nodes. In this procedure, the dataset is divided five times into a modeling set and an external validation set (80-20%). Each modeling set (which is used to build and validate models) is then further divided into multiple training (80%) and test (20%) sets [8].
In addition, the structure of Schistosoma mansoni dihydroorotate dehydrogenase (DHODH; PDB ID: 6UY4) in complex with its inhibitor 2-[(4-fluorophenyl)amino]-3-hydroxynaphthalene-1,4-dione (PDB ligand ID: QLA) was downloaded from the Protein Data Bank (PDB) [2]. Molecular docking calculations were performed in Molegro 6.0.1 using a grid with a 15-Å radius and a 0.30-Å resolution to cover the ligand-binding site of the DHODH structure.
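The activity labeling and the stratified 80/20 partition described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the workflow used in the study (which relied on the "Partitioning" node and WEKA nodes); the function names, the random seed and the toy IC50 values are assumptions made only for the example.

```python
import math
import random

ACTIVE_THRESHOLD = 6.0  # pIC50 cut-off stated in the text


def pic50(ic50_molar):
    """Return pIC50 = -log10(IC50), with IC50 given in mol/L."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)


def label(ic50_molar):
    """Label a compound as active (pIC50 >= 6.0) or inactive (pIC50 < 6.0)."""
    return "active" if pic50(ic50_molar) >= ACTIVE_THRESHOLD else "inactive"


def stratified_split(records, train_fraction=0.8, seed=0):
    """Split (name, label) records into training and test lists,
    sampling each class separately so class proportions are preserved."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[1], []).append(rec)
    train, test = [], []
    for members in by_class.values():
        members = members[:]          # copy before shuffling
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Hypothetical IC50 values in mol/L (illustrative only, not data from the study):
# pIC50 runs from 4.1 to 7.9 in steps of 0.2, giving 10 inactive and 10 active.
toy_ic50 = {f"cpd{i}": 10 ** -(4.1 + 0.2 * i) for i in range(20)}
records = [(name, label(value)) for name, value in toy_ic50.items()]
train_set, test_set = stratified_split(records)
```

Sampling each class separately is what makes the split stratified: the active/inactive ratio of the full set is carried over to both the training and the test set.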

Results and Discussion
For the training set used in the Random Forest (RF) model, the match percentage values approached 100%. For the cross-validation (CV) and test sets, values above 78.1% and 77.6% were obtained, respectively. In the five-fold CV, the sensitivity (the true-positive rate, 78.3%) was slightly greater than the specificity (the true-negative rate, 77.9%). The area under the receiver operating characteristic (ROC) curve (AUC) was calculated for both the CV and test sets; the ROC curve plots sensitivity against (1 − specificity) [9]. AUC values of 0.868 and 0.877 were achieved for the five-fold CV and external test sets, respectively, indicating a high rate of sensitivity and a low false-positive rate. Additionally, the Matthews correlation coefficient (MCC), which is determined from all of the values in the confusion matrix, was calculated [10]. MCC values of 0.562 and 0.552 were obtained for the five-fold CV and external test sets, respectively, indicating that the model differentiates between the active and inactive compounds in the ChEMBL dataset for S. mansoni. Finally, the applicability domain (APD) was used to assess the reliability of the predictions for the samples in the external test and SL sets; the APD calculation is based on the molecular interactions determined by the VolSurf+ descriptors [5,6]. All structures in the external test set and 97.3% of those in the SistematX SL set were classified as reliable.
Subsequently, using this RF model, a ligand-based virtual screening was performed on the set of 1,300 molecules obtained from SistematX. For S. mansoni, 557 SLs (42.8%) showed active probability values above 0.5. Structurally, some common features were found among the best-ranked structures. Structures 1 and 3 contain a germacranolide skeleton bound to a stearic acid ester.
Additionally, the α-methylene-γ-lactone group was observed in four of the five best-ranked structures (structures 1 and 3-5). The presence of this moiety has been associated with interactions between this type of metabolite and the sulfhydryl group of cysteine through a Michael addition (Figure 1) [11].
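For reference, the classification measures reported for the RF model earlier in this section (sensitivity, specificity and MCC) all derive from the four cells of the confusion matrix. The sketch below shows that calculation in plain Python; the function name and the example counts are hypothetical and are not the confusion matrix of the model built in this study.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy and MCC from
    confusion-matrix counts (assumes both classes are present)."""
    sensitivity = tp / (tp + fn)                  # true-positive rate
    specificity = tn / (tn + fp)                  # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)    # overall match fraction
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = classification_metrics(tp=80, fn=20, tn=75, fp=25)
```

Because MCC uses all four counts, it stays informative when the active and inactive classes are unbalanced, which is why it is often reported alongside sensitivity and specificity.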
Structure 1 (9α-stearoyloxy-8β-(2-methylbutyryloxy)-15-hydroxy-14-oxo-acanthospermolide) is a secondary metabolite of Acanthospermum hispidum, a plant native to Central and South America (Table 2) [12]. Some metabolites structurally related to structure 1, namely 9α-linoloyloxy-8β-(2-methylbutyryloxy)-15-hydroxy-14-oxo-acanthospermolide and 9α-linolenoyloxy-15-hydroxy-8β-(2-methylbutyryloxy)-14-oxo-acanthospermolide, have been identified as potential SLs against leishmaniasis and Chagas disease, two of the main NTDs that affect the American continent [8,13].
Molecular docking calculations were performed to explore the mechanism of action of the five best-ranked SLs in the active site of Schistosoma mansoni dihydroorotate dehydrogenase (SmDHODH) [2] (Table 3). The docking results are not influenced by the molecular weight of the structures: structure 2 is one of the two heaviest of the tested SLs, yet its docking score is higher than that of QLA. Residue S53 appears to be a critical amino acid for binding to SmDHODH, as interactions with it are observed for structures 3, 4 and 5. Interestingly, for structure 4, a hydrogen bond interaction with R130 is observed. This interaction is also observed for the inhibitor QLA (Figure 2). Structure 4 showed a lower docking score (−141.38 kJ/mol) and exhibited a pattern of interaction in the active site of SmDHODH similar to that of the inhibitor reported in the PDB, making it one of the most promising SLs against S. mansoni.
Interaction types shown in Figure 2: H-bond (lime), van der Waals (green), π-π (purple), π-alkyl (pink), unfavorable (red) and carbon-H-bond (teal). Red dotted circles indicate the critical H-bond interactions.

Conclusions
Using computational techniques, a preliminary evaluation of a set of 1,300 SLs was performed in search of potential agents against S. mansoni. The ligand-based virtual screening allowed the selection of the most promising molecules based on the probability calculated by the RF machine learning model. The five selected molecules showed structural features related to the biological activity observed in SLs, i.e., the presence of the α-methylene-γ-lactone moiety. Subsequently, the molecular docking calculations allowed the identification of three structures with lower docking values than QLA, with structure 4 being the most promising inhibitor of SmDHODH, one of the most important targets in the Schistosoma life cycle. This study is only exploratory but might be considered a starting point for the development of new treatments against this NTD.