Mol 2 Net Multi-Viral Targets Entropy QSAR for Antiviral Drugs

Current antiviral QSAR models have an important limitation: they predict the biological activity of drugs against only one viral species. This is because most of the molecular descriptors reported to date encode only information about molecular structure. As a result, predicting the probability with which a drug is active against different viral species with a single unifying model is a goal of major importance. In this work we use Markov Chain theory to calculate new multi-target entropy indices and fit, for the first time, an mt-QSAR model for 900 drugs tested in the literature against 40 viral species, together with another 207 drugs not tested in the literature. We used Linear Discriminant Analysis (LDA) to classify drugs as active or non-active against the different viral species tested. In training, the model correctly classifies 31 188 out of 31 213 non-active compounds (99.92%) and 432 out of 434 active compounds (99.54%). Overall training predictability was 98.56%. The model was validated with external prediction series, in which it correctly classified 15 588 out of 15 606 non-active compounds and 213 out of 217 active compounds. Overall validation predictability was 98.54%. The present work reports the first attempt to calculate, within a unified framework, the probabilities of activity of antiviral drugs against different viral species based on entropy analysis.


Introduction
Viruses have produced some of the major health problems of the last 30 years. Examples of diseases caused by viruses include the common cold (produced by any one of a variety of related viruses), AIDS (caused by HIV), and cold sores (caused by herpes simplex). Other relationships are being studied, such as the connection of Human Herpesvirus 6 (HHV6), one of the eight known members of the human herpesvirus family, with organic neurological diseases such as multiple sclerosis and chronic fatigue syndrome. Recently, it has been shown that cervical cancer is caused, at least partially, by papillomavirus, representing the first significant evidence in humans for a link between cancer and an infective agent. The relative ability of viruses to cause disease is described in terms of virulence.
SciForum Mol2Net, 2015, 1(Section A, B, C, etc.), 1-x, Proceeding, doi: xxx-xxxx
Consequently, there is an increasing interest in the development of rational approaches for the discovery of antiviral drugs. In this sense, computer-aided drug discovery techniques based on Quantitative Structure-Activity Relationship (QSAR) models may play a very important role (1). Unfortunately, almost all QSAR studies, including those for antiviral activity, use limited databases of structurally related compounds acting against a single microbial species (2). One important step in the evolution of this field was the introduction of QSAR models for heterogeneous series of antimicrobial compounds; see for instance the works of Cronin, de Julián-Ortiz, Gálvez, García-Domenech, Gosalbez, Marrero-Ponce, Torrens, et al. (3-15). As a result, researchers can predict the activity of very heterogeneous series of compounds, but often need to use or develop as many QSAR equations as there are microbial species to be predicted. In any case, predicting activity against different targets still requires a different QSAR model for each target.
An interesting alternative is the prediction of structurally diverse series of antimicrobial compounds (antiviral in this case) against different targets (mechanisms) using complex non-linear Artificial Neural Networks with multiclass prediction, e.g. the work of Vilar et al. (16). Strategies of this kind can be understood as Multi-Objective Optimization (MOOP) techniques; in this case we aim to optimize the activity of antiviral drugs against many different objectives or targets (viral species). A very useful strategy related to the MOOP problem uses Derringer's desirability function and many QSAR models for different objectives (17). In this sense, it is of major importance to develop unified but simple linear equations that explain the antimicrobial activity (antiviral activity in the present work) of structurally heterogeneous series of compounds active against as many targets (viral species) as possible. We call this class of QSAR problem multi-target QSAR (mt-QSAR) (18,19).
There are close to 2000 molecular descriptors that may, in principle, be generalized and used to solve the mt-QSAR problem. Many of these indices are known as Topological Indices (TIs), or simply invariants of a molecular graph G. We can think of G as a drawing composed of vertices (atoms) weighted with physicochemical properties (mass, polarity, electronegativity, or charge) and edges (chemical bonds) (20). However, many of these indices have not yet been extended to encode information beyond chemical structure. One route to mt-QSAR is to replace classic atomic weights with target-specific weights. For instance, we introduced and/or reviewed TIs that use atomic weights describing the propensity of an atom to interact with different microbial targets (21), or to undergo partition in a biphasic system or distribution to biological tissues (22-24). The method, called the MARCH-INSIDE approach (Markovian Chemicals In Silico Design), calculates TIs using Markov Chain theory. MARCH-INSIDE defines a Markov matrix from which it derives matrix invariants such as stochastic spectral moments, mean values, absolute probabilities, or entropy measures for the study of molecular properties. Applications have been extended to macromolecules, including RNA, proteins, and the blood proteome (25-30). In particular, one class of MARCH-INSIDE descriptors is defined in terms of entropy measures, which have shown flexibility in many bioorganic and medicinal chemistry problems such as estimating anticoccidial activity, modelling the interaction between drugs and HIV-packaging-region RNA, and predicting protein and virus activity (24,31-33). We give high importance to entropy measures because entropy has been widely demonstrated to be an excellent function for codifying information in molecular systems; see for instance the important works of Graham (34-39). However, the proficiency of entropy indices (of MARCH-INSIDE type or not) in solving mt-QSAR problems for antiviral compounds has not been studied.
The present study develops the first mt-QSAR model based on entropy indices to predict the antiviral activity of drugs against different viral species. The model fits one of the largest datasets used to date in QSAR studies, with more than 47 000 cases, which result from forming different (antiviral compound, viral target) pairs.
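To make the idea of (antiviral compound, viral target) cases concrete, the following minimal sketch builds such a case table. The compound names, species names, and activity labels are hypothetical placeholders, not data from this study, and the helper `build_cases` is introduced only for illustration.

```python
from collections import Counter

# Hypothetical assay records: (compound, viral species, active?).
# Names and labels are placeholders, not data from the study.
records = [
    ("compound_1", "species_A", True),
    ("compound_1", "species_B", False),
    ("compound_2", "species_A", False),
    ("compound_2", "species_B", False),
    ("compound_3", "species_B", True),
]


def build_cases(records):
    """Return one case per (compound, species) pair with a 0/1 activity label."""
    cases = {}
    for compound, species, active in records:
        cases[(compound, species)] = 1 if active else 0
    return cases


cases = build_cases(records)
# Number of non-active (0) and active (1) cases in the table.
class_counts = Counter(cases.values())
```

The same compound appears in several cases, once per viral species, which is why the number of cases can greatly exceed the number of drugs.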

Results and Discussion
One of the main advantages of the present stochastic approach is the possibility of deriving average thermodynamic parameters that depend on the probability of the states of the Markov model (MM). The generalized parameters have a clearer physicochemical meaning than our previous ones (24,41,42). Specifically, this work introduces for the first time a linear mt-QSAR equation useful for prediction and MOOP of the antiviral activity of drugs against different viral target species or objectives. The best model found was:
In the model, $\lambda$ is Wilks' statistic for the overall discrimination, $\chi^{2}$ is the Chi-square statistic, and p is the error level. In this equation, the $^{k}\theta_{s}$ were calculated for the totality (T) of the atoms in the molecule or for specific collections of atoms. These collections are atoms with a common characteristic, for instance: heteroatoms (Het), unsaturated carbon atoms (Cunst), saturated carbon atoms (Csat), and hydrogen bound to heteroatoms (H-Het). The most interesting fact is that the $^{k}\theta_{s}$ are able to discern the active/non-active classification of compounds among a large number of viral species. This property is related to the definition of the $^{k}\theta_{s}$ using species-specific atomic weights (see supplementary material file for method). It allows us to model for the first time a very heterogeneous and diverse dataset with more than 47 470 cases (one of the largest in QSAR).
Another interesting characteristic of the model is that the $^{k}\theta_{s}$ used as molecular descriptors depend both on the molecular structure of the drug and on the viral species against which the drug must act. The molecular structure is codified mainly through the adjacency factor $\alpha_{ij}$, which encodes atom-atom bonding (molecular connectivity). Structural changes are also encoded because the entropies $^{k}\theta_{s}$ are atom-class specific, which again follows from their definition. For example, a single change in the molecular structure, e.g. replacing S by O, necessarily implies a change in the moments of interaction. Moreover, the most interesting fact is that the $^{k}\mu_{s}$ are the molecular descriptors reported for antimicrobial mt-QSAR studies that are able to distinguish among a large number of viral species. The present work is the first reported mt-QSAR model using the entropies $^{k}\theta_{s}$ as molecular descriptors, allowing one to predict the antiviral activity of any organic compound against a very large diversity of viral pathogens.
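As an illustration of how a linear discriminant equation of this kind is applied, the sketch below scores one (drug, viral species) case and converts the score into a class and a probability. The descriptor names, coefficients, intercept, and descriptor values are hypothetical placeholders and are not the fitted values of the model. The logistic conversion assumes a two-class discriminant whose score behaves as the log-odds of activity.

```python
import math

# Hypothetical coefficients for entropy descriptors of different atom classes.
# These are NOT the fitted values of the model; they only illustrate the form.
coefficients = {"theta1_T": 0.8, "theta2_Het": -1.5, "theta3_Csat": 0.4}
intercept = -0.2


def discriminant_score(descriptors, coefficients, intercept):
    """Linear combination of descriptor values plus an independent term."""
    return intercept + sum(
        coefficients[name] * descriptors[name] for name in coefficients
    )


def probability_active(score):
    """ASSUMPTION: the score is the log-odds of the 'active' class."""
    return 1.0 / (1.0 + math.exp(-score))


def classify(descriptors, coefficients, intercept):
    """Return the predicted class label and the assumed probability of activity."""
    score = discriminant_score(descriptors, coefficients, intercept)
    label = "active" if score > 0 else "non-active"
    return label, probability_active(score)


# Hypothetical descriptor values for one (drug, viral species) case.
case = {"theta1_T": 1.2, "theta2_Het": 0.3, "theta3_Csat": 0.5}
label, prob = classify(case, coefficients, intercept)
```

Because the descriptors are species-specific, the same drug evaluated against another viral species would enter this function with different descriptor values and could receive a different class.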

Markov entropy (θk) for drug-target k-th step-by-step interaction
One can consider a hypothetical situation in which a drug molecule is free in space at an arbitrary initial time ($t_0$). It is then interesting to develop a simple stochastic model for the step-by-step interaction between the atoms of a drug molecule and a molecular receptor during the time in which the pharmacological effect is triggered. For the sake of simplicity, we consider from now on a general structure-less receptor, understood as a model of a receptor whose chemical structure and position are not taken into consideration. Specifically, the molecular descriptors used in the present work are called stochastic entropies $\theta_k$, which are entropies describing the connectivity and the distribution of electrons for each atom in the molecule (40). The initial entropy of interaction of the j-th atom of the drug with the target, $^{0}\theta_{j}(s)$, is considered a state function, so a reversible process of interaction may be decomposed into several elementary interactions between the j-th atom and the receptor. The superscript 0 indicates that we refer to the initial interaction, and the argument (s) indicates that this entropy depends on the specific viral species. Afterwards, the interaction continues and we have to define the entropy of interaction $^{k}\theta_{ij}(s)$ between the j-th atom and the receptor for a specific viral species (s), given that the i-th atom has interacted at the previous time $t_k$. In particular, immediately after the first interaction ($t_0 = 0$), an interaction with probability $^{1}p_{ij}(s)$ takes place at time $t_1 = 1$, and so on. So, one can suppose that the atoms begin their interaction with the structure-less molecular receptor by binding to it at discrete intervals of time $t_k$. However, there are several alternative ways in which such a step-by-step binding process may occur (24,41,42).
The entropy $^{0}\theta_{j}(s)$ is considered here as a function of the absolute temperature of the system and of the local equilibrium constant of interaction between the j-th atom and the receptor, $^{0}\gamma_{j}(s)$, for a given microbial species. Additionally, the entropy $^{1}\theta_{ij}(s)$ can be defined by analogy in terms of $^{1}\gamma_{ij}(s)$ (24,41,43):
The present approach to antimicrobial-species-specific drug-receptor interaction has two main drawbacks. The first is the difficulty of defining the constants. In this work, we address it by estimating $^{0}\gamma_{j}(s)$ as the rate of occurrence $n_{j}(s)$ of the j-th atom in molecules active against a given species, relative to the number of atoms of the j-th class in the molecules tested against the same species, $n_{t}(s)$. With respect to $^{1}\gamma_{ij}(s)$, we must take into consideration that once the j-th atom has interacted, the preferred candidates for the next interaction are those i-th atoms bound to j by a chemical bond. Both constants can then be written as (24,41,43):
where $\alpha_{ij}$ are the elements of the atom adjacency matrix; $n_{j}(s)$, $n_{t}(s)$, $^{0}\theta_{j}(s)$, and $^{1}\theta_{ij}(s)$ have been defined in the paragraph above; R is the universal gas constant; and T is the absolute temperature. The number 1 is added to avoid scale problems and problems with the definition of the logarithmic function.
The second problem relates to the description of the interaction process at higher times $t_k > t_1$. MM theory enables a simple calculation of the probabilities with which the drug-receptor interaction takes place in the time until the studied effect is achieved. In this work we focus on the interaction of drugs with a structure-less microbial target. As depicted in Figure 1, this model deals with the calculation of the probabilities ($^{k}p_{ij}$) with which any arbitrary atom j binds to the structure-less molecular receptor given that another atom i has been bound before, along discrete time periods $t_k$ (k = 1, 2, 3, …; k = 1 in grey, k = 2 in blue, and k = 3 in red) throughout the chemical bonding system.
The procedure described here considers the atoms of the molecule as the states of the MM. The method arranges all the $^{0}\theta_{j}(s)$ values in a vector $\theta(s)$ and all the $^{1}\theta_{ij}(s)$ entropies of interaction in a square table of dimension n × n. After normalization of both the vector and the matrix, we can build the corresponding absolute initial probability vector $\varphi(s)$ and the stochastic matrix $^{1}\Pi(s)$, which have the elements $^{0}p_{j}(s)$ and $^{1}p_{ij}(s)$, respectively. The elements $^{0}p_{j}(s)$ of the vector $\varphi(s)$ are the absolute probabilities with which the j-th atom interacts with the molecular target or receptor in the species s at the initial time, with respect to any atom in the molecule (24,41,43):
where m represents all the atoms in the molecule including the j-th, and $n_{a}$ is the rate of occurrence of any atom a, including the j-th with value $n_{j}$. On the other hand, the matrix $^{1}\Pi(s)$ is called the 1-step drug-target interaction stochastic matrix. $^{1}\Pi(s)$ is also built as a square table of order n, where n represents the number of atoms in the molecule. The elements $^{1}p_{ij}(s)$ of the 1-step drug-target interaction stochastic matrix are the binding probabilities with which the j-th atom binds to a structure-less molecular receptor given that another i-th atom has interacted before, at time $t_1 = 1$ (considering $t_0 = 0$) (18,24,41,43):
By using $\varphi(s)$, $^{1}\Pi(s)$, and the Chapman-Kolmogorov equations, one can describe the further evolution of the system (10-17). Summing up all the atomic entropies of interaction $^{0}\theta_{j}(s)$ pre-multiplied by the absolute probabilities of drug-target interaction $^{A}p_{k}(j,s)$, one can derive the average changes in entropy $^{k}\theta_{s}$ of the gradual interaction between the drug and the receptor at a specific time k in a given microbial species (s) (24):
$${}^{k}\theta_{s} = \sum_{j=1}^{n} {}^{A}p_{k}(j,s)\;{}^{0}\theta_{j}(s)$$
Such a model is stochastic per se (probabilistic step-by-step atom-receptor interaction in time) but also considers molecular connectivity (the step-by-step atom union in space throughout the chemical bonding system).
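The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MARCH-INSIDE implementation: it assumes $^{0}\gamma_{j}(s) = n_{j}(s)/n_{t}(s) + 1$ and $^{0}\theta_{j}(s) = R \ln {}^{0}\gamma_{j}(s)$ (the exact functional form is an assumption), builds the 1-step matrix by normalizing $\alpha_{ij}\,{}^{0}\theta_{j}(s)$ over each row, and treats each atom as adjacent to itself. The function names and all atom counts are hypothetical.

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)


def initial_entropies(n_active, n_tested):
    """ASSUMED form: gamma_j = n_j/n_t + 1 and theta_j = R * ln(gamma_j)."""
    return [R * math.log(nj / nt + 1.0) for nj, nt in zip(n_active, n_tested)]


def stochastic_matrix(theta0, adjacency):
    """Row-normalize alpha_ij * theta_j to obtain 1-step probabilities (assumed form)."""
    n = len(theta0)
    matrix = []
    for i in range(n):
        row = [adjacency[i][j] * theta0[j] for j in range(n)]
        total = sum(row)
        if total == 0.0:
            # No reachable atom with non-zero weight: stay on atom i.
            row = [1.0 if j == i else 0.0 for j in range(n)]
            total = 1.0
        matrix.append([x / total for x in row])
    return matrix


def markov_entropies(theta0, adjacency, k_max):
    """Average entropies for k = 0..k_max: sum_j p_k(j) * theta0_j."""
    total = sum(theta0)
    if total == 0.0:
        return [0.0] * (k_max + 1)
    n = len(theta0)
    p = [t / total for t in theta0]  # absolute initial probabilities
    pi = stochastic_matrix(theta0, adjacency)
    result = []
    for _ in range(k_max + 1):
        result.append(sum(pj * tj for pj, tj in zip(p, theta0)))
        # Chapman-Kolmogorov step: p_(k+1) = p_k * Pi
        p = [sum(p[i] * pi[i][j] for i in range(n)) for j in range(n)]
    return result


# Hypothetical three-atom chain (atom 0 - atom 1 - atom 2), self-loops included.
adjacency = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
# Hypothetical atom-class counts for two viral species (not data from the study).
theta_a = initial_entropies([30, 50, 10], [100, 100, 100])
theta_b = initial_entropies([5, 20, 40], [100, 100, 100])
entropies_a = markov_entropies(theta_a, adjacency, 3)
entropies_b = markov_entropies(theta_b, adjacency, 3)
```

The same molecular graph yields different descriptor values for the two species because only the species-specific atomic weights change, which is the property the mt-QSAR model relies on.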

Statistical analysis
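As a continuation of the previous sections, we can attempt to develop a simple linear QSAR using the MARCH-INSIDE methodology, as defined previously, with the general formula:
$$\mathrm{Actv} = a_{0}\,{}^{0}\theta_{s} + a_{1}\,{}^{1}\theta_{s} + \dots + a_{k}\,{}^{k}\theta_{s} + b$$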
Here, the $^{k}\theta_{s}$ act as the microbial-species-specific molecule-target interaction descriptors. The calculation of these indices is explained in the supplementary material for reasons of space. We selected Linear Discriminant Analysis (LDA) to fit the classification functions. The model deals with the classification of a set of compounds as active or not against different microbial species (43). A dummy variable (Actv) was used to codify the antimicrobial activity. This variable