Obtaining QSPR models for the prediction of physicochemical properties of topical antimicrobials

Abstract
The traditional approach to developing and investigating antimicrobials has proven inefficient, as shown by the slow discovery of new candidates in recent years. Its limitations include long development times, expensive experimental trials and errors in manipulation by the researcher. To address this problem, computational methods in drug design have emerged as a promising alternative. Specifically, QSPR studies aim to determine functions capable of predicting a particular property of a compound from the information contained in its molecular descriptors. This strategy allows a large number of molecules to be analyzed in less time and with fewer resources. In the present work, five specific models were defined to predict the physicochemical properties of interest (aqueous solubility (S), partition coefficient (P), distribution constant (D), acid dissociation constant (Ka) and surface tension (σ)) for a series of 400 antimicrobial compounds for external use only, with simplified representations, physicochemical properties and


INTRODUCTION
Throughout history, antimicrobials have become an essential tool for humans in the battle against diseases caused by microorganisms because of numerous benefits such as the simplicity of their production and storage conditions, the convenience and speed of their use, and a broad spectrum of action with minimal toxicity. However, multiple factors, such as the excessive and incorrect use of antimicrobials or the lack of thorough education on the rational use of drugs, have had serious consequences, among them the phenomenon known as antibiotic resistance, widely recognized as one of the main problems related to the use of medicines [1,2]. Traditional research methods have proven ineffective, failing to give an adequate response to this situation; it is therefore pertinent to establish new methods that boost the rate of research and development in this sector. In this sense, computational methods emerge as a promising tool for the molecular design of new candidates of pharmaceutical interest, allowing a large number of previously selected compounds to be analyzed in less time and with fewer resources [5]. In the particular case of antimicrobials for topical use, the following physicochemical properties strongly influence pharmacological performance: solubility (S), partition coefficient (P), distribution constant (D), acid dissociation constant (Ka) and surface tension (σ). A thorough knowledge of the relationship between these properties and the chemical structure of interest is essential not only to develop a new pharmaceutical candidate successfully, but also to improve the behavior of existing molecules [6,7]. Based on these potential advantages, this work aims to obtain general QSPR models for predicting the five physicochemical properties mentioned above for antimicrobials for topical use.

Construction of training series.
The training series comprised 400 compounds with recognized antimicrobial activity, organized into ten families according to their specific action. Each family contributed 40 representative compounds [1,8]. The ACD-Labs software was used to obtain the simplified molecular structure of each compound in the training series, in the form of a SMILES code, together with the experimental values of the evaluated physicochemical properties.
From the derived SMILES codes, a set of molecular descriptors (MD) was calculated using the TOPS-MODE approach of the MODESLAB software. These descriptors weight the structural properties related to the modeled physicochemical properties: bond distance (Std), dipole moment (Dip), hydrophobicity (Hyd), polarizability (Pol), van der Waals radius (Van) and atomic weight (Ato). As a result, a matrix was formed with the spectral moments from μ0 to μ15 for each graph [9,10].

Construction of QSPR predictive model.
The Systematic Search method of the BuildQSAR software was used to select, from the 91 molecular descriptors calculated for each compound, those best suited to serve as independent variables of an efficient QSPR model. The Multiple Linear Regression (MLR) analysis offered by the BuildQSAR software was then used to optimize the initial QSPR models [13].

Validation of QSPR predictive model.
The LOO (leave-one-out) method was used for the internal and external validation of the obtained models. In the internal validation, the robustness and predictive power of a QSPR model were judged against the following statistical criteria: a cross-validated correlation coefficient Q² above 0.5 (Q² > 0.5) and a standard deviation of the predictive residual sum of squares below 0.3. The external evaluation was carried out with a test set of 40 new compounds with antimicrobial activity similar to that of the compounds in the training series [16].

RESULTS AND DISCUSSION
The predictive capacity of a QSPR model depends strongly on the characteristics of the compounds in the training series. The 400 antimicrobial compounds included in the training series represent ten pharmacological groups, which correspond to the polyfunctionality that distinguishes the molecules of interest. Table I shows the detailed classification.

Table I (fragment). Pharmacological groups: Antineoplastic, Antiseptic
The calculations carried out with the MODESLAB software generated 91 molecular descriptors as independent variables for each compound in the training series, with parameters related to the estimated properties: bond distance (Std), dipole moment (Dip), hydrophobicity (Hyd), polarizability (Pol), van der Waals radius (Van) and atomic weight (Ato).
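As an illustration of how a spectral-moment descriptor of this kind can be obtained, the following minimal sketch computes μ0 to μ3 as the traces of successive powers of a bond (edge-adjacency) matrix, which is the general definition behind the TOPS-MODE approach. The example molecule (the carbon skeleton of 2-methylbutane), the helper names and the diagonal bond weights are assumptions chosen for illustration; this is not the MODESLAB implementation.

```python
def bond_matrix(bonds, weights=None):
    """Edge-adjacency matrix: 1 where two bonds share an atom,
    optional bond weights (e.g. a bond property) on the diagonal."""
    n = len(bonds)
    w = weights if weights is not None else [0.0] * n
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        B[i][i] = float(w[i])
        for j in range(i + 1, n):
            if set(bonds[i]) & set(bonds[j]):  # the two bonds share an atom
                B[i][j] = B[j][i] = 1.0
    return B


def matmul(A, B):
    """Plain square-matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(B, k_max):
    """Return [mu_0, ..., mu_k_max], where mu_k = trace(B ** k)."""
    n = len(B)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    moments = []
    for _ in range(k_max + 1):
        moments.append(sum(power[i][i] for i in range(n)))
        power = matmul(power, B)
    return moments


# Hypothetical example: hydrogen-suppressed skeleton of 2-methylbutane,
# each bond given as a pair of atom indices.
bonds = [(0, 1), (1, 2), (2, 3), (1, 4)]
unweighted = spectral_moments(bond_matrix(bonds), 3)
# Hypothetical bond weights, standing in for a property such as Hyd or Van.
weighted = spectral_moments(bond_matrix(bonds, [0.5, 0.5, 0.5, 0.5]), 3)
```

In the unweighted case μ0 counts the bonds and μ2 counts twice the number of adjacent bond pairs; placing a bond property on the diagonal is what ties the higher moments to that property.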
The inclusion of a large number of independent variables in a QSPR function may hamper its explanatory power.
For this reason, it is advisable to use an adequate number of descriptors that are of high statistical quality and relatively easy to interpret. The BuildQSAR software performed the variable selection using the Systematic Search method to identify the three best molecular descriptors to include as independent variables in each model, according to the criteria considered essential for evaluating each candidate: (i) multiple correlation coefficient R (R > 0.6); (ii) standard error of the estimate s (s < 1); (iii) F statistic of the ANOVA test (F >> 1 with p < 0.05).
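The selection step can be pictured with a small sketch that fits every descriptor subset of a given size by ordinary least squares and keeps those passing the R and s thresholds quoted above. It is an assumption-laden illustration, not the BuildQSAR Systematic Search: the function names, the toy data and the ranking by R are hypothetical, and the p-value of F is omitted because it needs an F-distribution routine that the Python standard library does not provide.

```python
from itertools import combinations
from math import sqrt


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting.
    Assumes A is non-singular (no perfectly collinear descriptors)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_mlr(X, y):
    """Least-squares fit with intercept; returns (coefficients, R, s, F)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    k, dof = p - 1, len(y) - p
    r2 = 1.0 - sse / sst
    s = sqrt(sse / dof)                   # standard error of the estimate
    F = (r2 / k) / ((1.0 - r2) / dof)     # undefined for an exact fit
    return beta, sqrt(max(r2, 0.0)), s, F


def systematic_search(descriptors, y, size, r_min=0.6, s_max=1.0):
    """Try every subset of `size` descriptors; keep those meeting the criteria."""
    kept = []
    for subset in combinations(sorted(descriptors), size):
        X = list(zip(*(descriptors[name] for name in subset)))
        _, R, s, F = fit_mlr(X, y)
        if R > r_min and s < s_max and F > 1.0:
            kept.append((subset, R, s, F))
    return sorted(kept, key=lambda t: -t[1])  # assumed ranking: highest R first


# Hypothetical toy data: 8 compounds, 3 candidate descriptors.
descriptors = {
    "d1": [1, 2, 3, 4, 5, 6, 7, 8],
    "d2": [2, 1, 4, 3, 6, 5, 8, 7],
    "d3": [1, -1, -1, 1, 1, -1, -1, 1],
}
y = [0.1, 2.9, 2.05, 4.95, 4.1, 6.9, 6.05, 8.95]
ranked = systematic_search(descriptors, y, size=2)
```

An exhaustive search of this kind grows combinatorially with the number of descriptors, which is one practical reason for limiting each model to a few variables.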
To ensure the normality of the distribution of the variables, stabilize the regressors and reduce atypical observations, the solubility variable was log-transformed (Log Sol) and the acid dissociation constant variable was inverted (pKa⁻¹).
Table II shows the statistical parameters. To build the prediction models for the physicochemical properties of interest, the Multiple Linear Regression (MLR) analysis offered by the BuildQSAR software was used to derive the predictive mathematical functions from the selected independent variables. Five QSPR models were obtained, corresponding to the five properties of interest: solubility, partition coefficient, distribution constant, acid dissociation constant and surface tension.
As an example, the construction of the QSPR model for the partition coefficient (Log P) is illustrated by a brief qualitative analysis of its optimization. Eliminating the outliers leads to a considerable increase in the values of R and F (0.95 and 912.81, respectively), whereas the standard error s drops sharply (to 0.695), giving a very good fit of this model to the experimental data. In addition, the simplicity of model M2, which is represented by only three independent variables, facilitates its interpretation and applicability.

Analysis of predictive
The MLR analysis includes a t-test of the significance of the slopes, and the results demonstrate the significant contribution of the selected variables to the variation of the partition coefficient. The values of R, R² and s decrease slightly with respect to the previous model, but the value of F increases considerably (1335.01), which implies that this simplified model, with only two molecular descriptors as independent variables, reaches a fit to the experimental data comparable to that of model M2, which further facilitates its interpretation and applicability. Model M3 complies with the principle of orthogonality, or independence, between µ(Hyd)1 and µ(Van)14, as assumed in the MLR analysis.
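A simple way to picture the independence check between two descriptors is to compute their Pearson correlation coefficient and compare its magnitude with a cutoff. The sketch below does this under stated assumptions: the descriptor values are invented and the 0.7 cutoff is a hypothetical choice, not a value given in this work.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical descriptor values for six compounds.
hyd1 = [1, 2, 3, 4, 5, 6]     # stands in for µ(Hyd)1
van14 = [2, 1, 2, 1, 2, 1]    # stands in for µ(Van)14

r = pearson_r(hyd1, van14)
CUTOFF = 0.7                  # assumed threshold for acceptable independence
is_independent = abs(r) < CUTOFF
```

A coefficient close to zero indicates that the two descriptors carry largely separate information, so their regression coefficients can be interpreted individually.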

Table IV. Partial correlation coefficients between µ(Hyd)1 and µ(Van
Finally, Figure 1 indicates a good linear correlation between the experimental and calculated values of Log P, thus ensuring the reliability of the model M3.

Figure 1. Linear correlation between the observed and calculated values of Log P from the model M3. Source: BuildQSAR/MLR
For these reasons, model M3 was selected for validation as the function relating the proposed property (partition coefficient) to the structure of the antimicrobials.
Following the same methodology, the predictive QSPR models of the remaining physicochemical properties of interest were obtained; Table V summarizes them. The models of Log Sol, Log P and Log D show better statistical quality than the models of pKa⁻¹ and σ: their standard error of the estimate is below 1 and all their parameters meet the established criteria. For pKa⁻¹ and σ, the remaining statistical parameters nevertheless indicate an acceptable fit to the experimental data for this type of model. According to the information in Table V, all five QSPR models can be subjected to the validation procedure for their further application in the research and development of new topical antimicrobials.

Internal validation of the QSPR models
Table VI shows the statistical results obtained from the internal validation of the QSPR models. As can be seen, both Q² and (R² − Q²) meet the validation criteria for all five models, demonstrating an appropriate level of stability when compounds are excluded from the training series during model construction. However, for models M6 (pKa⁻¹) and M7 (σ), the values of the other two validation statistics, although similar to those of the fitted models, exceed one logarithmic unit and thus do not satisfy the criterion established for the standard error of the estimate. Therefore, this measure of internal consistency is not enough to recommend these functions as useful prediction tools.
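The leave-one-out statistics discussed here can be sketched as follows: each compound is removed in turn, the regression is refitted on the rest, and the squared prediction errors are accumulated into PRESS, from which Q² = 1 − PRESS/SST follows. This is a minimal illustration with invented data and hypothetical function names, not the BuildQSAR routine; the sum of squares SST is taken about the mean of the full series, which is one common convention.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def r2_and_q2(X, y):
    """Return (R2 of the full fit, leave-one-out Q2)."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    beta = fit(X, y)
    sse = sum((yi - predict(beta, r)) ** 2 for r, yi in zip(X, y))
    press = 0.0
    for i in range(n):
        # Refit without compound i, then predict the excluded compound.
        beta_i = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(beta_i, X[i])) ** 2
    return 1.0 - sse / sst, 1.0 - press / sst


# Hypothetical data: 8 compounds described by two descriptors.
X = [(1, 4), (2, 2), (3, 4), (4, 2), (5, 4), (6, 2), (7, 4), (8, 2)]
y = [3.8, 4.2, 7.2, 8.8, 11.8, 12.2, 15.2, 16.8]
r2, q2 = r2_and_q2(X, y)
passes_q2 = q2 > 0.5      # internal validation criterion stated in the text
gap = r2 - q2             # a small gap suggests a stable model
```

Because every excluded compound is predicted by a model that never saw it, Q² cannot exceed R² for a least-squares fit, and a large gap between the two signals dependence on individual compounds.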

External validation of the QSPR models
For the external validation of the predictive capacity, an external series was built from 40 compounds obtained from the library of the same ACD-Labs software. [...] but presents a higher value of s_pred than the experimental s_model. On the other hand, models M6 and M7 do not satisfy any of the statistical criteria of external validation. For this reason, the three models M4, M6 and M7 are not suitable for predicting the corresponding physicochemical properties. Nevertheless, these results could serve as a reference for other studies in this area of research that use other types of molecular descriptors or other criteria for the selection and optimization of variables.
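The comparison between the prediction error on an external series and the error of the fitted model can be illustrated with the sketch below. Everything in it is assumed for illustration: the model coefficients, the external data, the value of s_model, and the definition of s_pred as the root mean squared prediction error, since the exact formula used in this work is not stated.

```python
from math import sqrt


def predict(coeffs, intercept, row):
    """Apply a fixed linear model to one compound's descriptor values."""
    return intercept + sum(c * v for c, v in zip(coeffs, row))


def external_validation(coeffs, intercept, X_ext, y_ext, s_model):
    """Compare the prediction error on external compounds with s_model."""
    residuals = [y - predict(coeffs, intercept, r) for r, y in zip(X_ext, y_ext)]
    n = len(y_ext)
    sse = sum(e * e for e in residuals)
    mean_y = sum(y_ext) / n
    sst = sum((y - mean_y) ** 2 for y in y_ext)
    s_pred = sqrt(sse / n)            # assumed definition of s_pred
    return {
        "s_pred": s_pred,
        "r2_pred": 1.0 - sse / sst,   # squared-correlation style fit measure
        "worse_than_model": s_pred > s_model,
    }


# Hypothetical two-descriptor model and four external compounds.
result = external_validation(
    coeffs=[0.5, -0.2],
    intercept=1.0,
    X_ext=[[1, 2], [2, 1], [3, 3], [4, 0]],
    y_ext=[1.2, 1.7, 2.0, 2.9],
    s_model=0.08,                     # assumed standard error of the fitted model
)
```

A model whose external error clearly exceeds its fitting error is describing the training series better than it predicts new compounds.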
The immense structural diversity of the training series, in which the majority are ionizable compounds, makes it difficult to estimate the aqueous solubility with a general predictive QSPR model. Besides the close proportionality between aqueous solubility and the acid dissociation constant, the pH dependence of these properties is reflected in the different degree of ionization of each compound, which explains the poor predictive capacity of models M4 and M6. An abnormal distribution of polarity levels could also be a cause of the statistically unsatisfactory predictive power of model M7. All these explanations suggest building a specific predictive QSPR model for each antimicrobial family for such highly variable properties.
Notwithstanding the poor statistical results of M4, M6 and M7, models M3 (Log P) and M5 (Log D) meet all the criteria of statistical excellence for the partition coefficient and the distribution constant, respectively.

CONCLUSIONS
In summary, the optimal QSPR models of the partition coefficient (M3) and the distribution constant (M5) can be used as reliable predictive tools for these important properties in the development of new antimicrobial candidates for topical use.

Table VII. External validation statistical parameters obtained by using the LOO method. Source: BuildQSAR/LOO
Table VII shows the statistical results of the external validation.