SciForum MOL2NET
Molecular descriptor from atomic weighted vectors to predict aquatic toxicity

Molecular descriptors from atom weighted vectors (MD-AWV) are used to predict the aquatic toxicity of a large set of 392 benzene derivatives to the ciliated protozoan Tetrahymena pyriformis (log(IGC50)). These descriptors are calculated using the MD-LOVIs software, and various Aggregation Operators are examined with the aim of comparing their performance in predicting aquatic toxicity. Variability analysis is used to quantify the information content of these molecular descriptors by means of an information theory-based algorithm. Principal Component Analysis (PCA) is used to analyze the orthogonality of these descriptors, and it is observed that the MD-AWV provide information that is linearly independent from that of descriptors generated using the popular DRAGON package (0-2D). Multiple Linear Regression with Genetic Algorithms is used to obtain models of the structure-toxicity relationships; the best model shows values of Q² = 0.830 and R² = 0.837 using six variables. Our models compare favorably with previously published models that use the same data set. The results suggest that MD-AWV provide an effective alternative for determining the aquatic toxicity of benzene derivatives.


Introduction:
Benzene derivatives are widely used in industry as pesticides, insecticides, herbicides, lubricants, detergents, polymers, and solvents, and in the manufacturing of plastics, resins, and nylon [1][2]. Likewise, in the pharmaceutical industry, they are often used as drugs.
Many of these derivatives can cause damage to the environment, humans, animals, and plants due to their toxicity. Environmental contamination, together with the possible accumulation of substituted benzenes in water and soil, makes them potentially damaging chemicals [1,3]. There is an increasing interest in the toxicity of environmental chemicals. Quantitative structure-activity/toxicity relationship (QSAR/QSTR) studies provide an invaluable tool for predicting aquatic toxicity directly from the molecular structure of compounds [3]. The growth inhibition database [4] of the ciliated protozoan Tetrahymena pyriformis is considered to be a high-quality dataset [5][6]. It has been developed in a single laboratory over more than two decades. Several studies have attempted to predict the toxicity of benzene derivatives toward T. pyriformis using the known toxicity database of 392 benzene derivatives [3,7,8]. Several other applications of QSAR in toxicology are presented in comprehensive reviews [9][10][11][12].
The aim of this paper is to evaluate the performance of a new family of molecular descriptors (MD), derived from Atomic Weighted Vectors (AWV) [13], in predicting the aquatic toxicity of a large group of substituted benzenes. These MD are derived from AWV using Aggregation Operators (AOs), which convert the AWV into scalar quantities describing aspects of the molecule. These MD-AWVs are then evaluated for their applicability in ecotoxicological research.

Materials and Methods:
A chemical dataset of 392 benzene derivatives tested in T. pyriformis was used to create and assess models of aquatic toxicity. The data comprise structurally diverse substituted benzenes, including nitrobenzenes, phenols, aminobenzenes, and benzonitriles. For predictive model development, we use the same training and test sets as previously published [7].
An AWV is a representation, in $\mathbb{R}^n$ space, of the weights of all atoms within a structure. Here, $\mathbb{R}$ represents the set of real numbers and $n$ represents the number of atoms within the molecular structure represented by the AWV:

$$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^n \qquad (1)$$

An AO, on the other hand, is a function that assigns a real number $y$ to an $n$-dimensional vector of real numbers $[x_1, x_2, \ldots, x_n]$ [13]:

$$y = \mathrm{AO}([x_1, x_2, \ldots, x_n]) \qquad (2)$$

The information codified in a weighted molecular structure (MS) may thus be expressed in $\mathbb{R}^n$. If the molecular vector $X$ in $\mathbb{R}^n$ is considered to start from the origin, then its components represent given atom, atom-type, or group properties [13]. These MDs were implemented in a software package called MD-LOVIs (Molecular Descriptors from LOcal Vertex Invariants and Related Maps), which is available at http://www.tomocomd.com.
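To make the role of an aggregation operator concrete, the following is a minimal sketch of how one atom weighted vector can be collapsed into several scalar descriptors. It is not the MD-LOVIs implementation: the choice of Pauling electronegativity as the atomic weight, the phenol heavy-atom example, the operator set, and the names `PHENOL_AWV`, `AGGREGATION_OPERATORS`, and `compute_descriptors` are all assumptions made for illustration.

```python
import math
import statistics

# Hypothetical AWV for the heavy atoms of phenol (6 C + 1 O), weighted by
# Pauling electronegativity. The weighting scheme is an illustrative
# assumption, not necessarily one used by MD-LOVIs.
PHENOL_AWV = [2.55, 2.55, 2.55, 2.55, 2.55, 2.55, 3.44]

# Each aggregation operator maps a vector in R^n to a single real number.
AGGREGATION_OPERATORS = {
    "sum": sum,
    "euclidean_norm": lambda x: math.sqrt(sum(v * v for v in x)),
    "arithmetic_mean": statistics.fmean,
    "geometric_mean": statistics.geometric_mean,
    "std_dev": statistics.pstdev,
    "max": max,
    "min": min,
}


def compute_descriptors(awv):
    """Apply every operator to one AWV and return {operator name: scalar}."""
    if not awv:
        raise ValueError("AWV must contain at least one atom weight")
    return {name: float(op(awv)) for name, op in AGGREGATION_OPERATORS.items()}


descriptors = compute_descriptors(PHENOL_AWV)
for name, value in descriptors.items():
    print(f"{name}: {value:.4f}")
```

Each operator yields one descriptor per molecule regardless of the number of atoms, which is what allows molecules of different sizes to be compared in the same regression model.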

Results and Discussion:
The PCA (Principal Component Analysis) results suggest that the MD-AWVs codify information different from that provided by the DRAGON (0-2D) descriptors. Despite some overlap between the MD-AWV and the DRAGON (0-2D) descriptors, many of the new MD-AWVs are orthogonal to the descriptors implemented in the DRAGON software. This indicates that the new descriptors encode chemical information from the molecular structure that the DRAGON descriptors do not capture. These results suggest that the correlations obtained with MD-AWV-based models for the prediction of aquatic toxicity can be considered statistically significant despite the simplicity of the models. Our results were compared with other approaches that use the same dataset, and our model, with fewer attributes, achieved statistical parameters similar to or better than those of other models.

Conclusions:
The results obtained with the MD-AWV models were compared to several QSAR procedures reported in the literature in terms of the leave-one-out cross-validation coefficient ($Q^2_{loo}$) and the squared correlation coefficient ($R^2$). In general, performance was highly competitive with the state of the art.
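For readers unfamiliar with the two statistics, the sketch below computes $R^2$ and the leave-one-out $Q^2$ for an ordinary least-squares multiple linear regression, using the common definition $Q^2_{loo} = 1 - \mathrm{PRESS}/SS_{tot}$. It is an illustration only: the descriptor values and toxicities in `X` and `y` are made-up toy numbers, not data from this study, and the helper names are hypothetical. The genetic-algorithm variable selection used in the paper is not reproduced here.

```python
def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(X, y):
    """Least-squares fit with intercept; returns [b0, b1, ..., bp]."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, x):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


def r_squared(X, y):
    """Fraction of variance explained when fitting on all compounds."""
    coefs = fit_mlr(X, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coefs, xi)) ** 2 for xi, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(X, y):
    """Leave-one-out Q^2: each compound is predicted by a model fit without it."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coefs = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coefs, X[i])) ** 2
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Toy data (made up): two descriptors per compound and a toxicity value.
X = [[1.0, 0.5], [2.0, 1.5], [3.0, 1.0], [4.0, 3.0],
     [5.0, 2.5], [6.0, 4.0], [7.0, 3.5], [8.0, 5.0]]
y = [0.9, 1.8, 2.1, 3.6, 3.9, 5.2, 5.4, 6.9]

r2 = r_squared(X, y)
q2 = q_squared_loo(X, y)
print(f"R2 = {r2:.3f}, Q2(loo) = {q2:.3f}")
```

Because each left-out compound is predicted by a model that never saw it, $Q^2_{loo}$ cannot exceed $R^2$ for the same least-squares model; a small gap between the two, as reported for the six-variable model, indicates that the fit is not driven by individual compounds.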
The MD-AWV-based models obtained proved effective in terms of their $R^2$ and $Q^2_{loo}$ values. The model developed with MD-AWV for six variables is shown below