Mol 2 Net Prediction of the Total Antioxidant Capacity of Food Based on Artificial Intelligence Algorithms

The growing amount and variety of information on nutrients in food has created the need to use it more efficiently in dietetics and nutrition. Flavonoids are exogenous dietary antioxidants and contribute to the total antioxidant capacity of food. This paper explores the data with different artificial intelligence algorithms to find the one that best predicts the total antioxidant capacity of food as measured by the oxygen radical absorbance capacity (ORAC) method. A record of composition data was created from the Database for the Flavonoid Content of Selected Foods and the Database for the Isoflavone Content of Selected Foods. The KNN (K-Nearest Neighbors) and MLP (MultiLayer Perceptron, a supervised unidirectional network) techniques were used. The attributes were: a) amount of flavonoid (mean), b) class of flavonoid, c) Trolox equivalent antioxidant capacity (TEAC) value of each flavonoid, d) probability of clastogenicity and clastogenicity classification by the Quantitative Structure-Activity Relationship (QSAR) method and e) total polyphenol (TP) value. The variable to predict was the ORAC value. A cross-validation method was used for the prediction. For the KNN algorithm the optimal K value was 3, which highlights the importance of the similarity between objects for the success of the results. It was concluded that the MLP and KNN techniques can be used successfully to predict the antioxidant capacity in the studied food groups.


Introduction
In recent years, databases containing emerging food composition information have been created [1-3]. These databases focus on the composition of bioactive substances, including flavonoids. Flavonoids occur in many sources in the plant kingdom and display a wide range of biological properties. Their health benefits have already been demonstrated [4,5], which makes them a topic of interest. One of their most important activities is their antioxidant capacity [1,6-8].
An antioxidant is a substance that, even when present in small amounts compared to the substrate, is able to decrease the oxidation of that substrate [9]. Furthermore, antioxidant activity is correlated with the prevention of chronic diseases that are highly prevalent in different countries [10].
The food composition database of flavonoids holds a large amount of chemical information owing to the structural diversity of the compounds it includes. This database provides researchers with new values for the flavonoid content of many foods, allowing them to better ascertain the impact of flavonoid consumption on several chronic diseases [2,3]. Flavonoids, particularly flavan-3-ols, have been associated with a reduced risk of cardiovascular diseases through the modulation of different mechanisms of primary and secondary prevention [11].
This project was developed to explore the possibility of generating predictive information from the data found in the food composition database. In particular, we sought a tool to predict the antioxidant capacity of foods containing different compounds with a flavonoid scaffold (dietary exogenous antioxidants). The project was based on the idea that a dietary antioxidant is a substance that significantly decreases the adverse effects of reactive species, such as reactive oxygen and nitrogen species, on normal physiological function in humans [12].
Food composition data are complex and extensive [13]. Therefore, it is hard to process all the information from the different assays reported in the literature. Nevertheless, this information is still processed with classical statistical methodologies [14,15]. When a problem is complex and governed by non-linear behaviour, it can be studied either from a multivariate perspective or with artificial intelligence (AI) techniques [16]. In particular, artificial neural networks (ANN) can build a predictive model that automatically captures the relationships between the analysed variables, without the need to specify them explicitly in the model.
In the biomedical field, several supervised unidirectional networks have been used, especially those based on the MultiLayer Perceptron (MLP) [16]. However, as far as we know, these techniques have never been applied to food composition databases. Therefore, the current work focuses on the development of an AI algorithm that allows the prediction of the total antioxidant capacity of food based on quantitative, topological-structural and bioactivity information on flavonoids.
The monomeric dietary flavonoids present in the studied data (class of flavonoid attribute) belong to the chemical subclasses flavonols, flavones, flavanones and flavan-3-ols (Table 1). Flavonoids from the anthocyanidin subclass are found in several foods. However, they were not included in this study because their structure precluded the application of the Topological Substructural Molecular Design (TOPSMODE) approach [17].
Table 2 shows the chemical structures, the SMILES codes and some examples of sources of the studied flavonoids.
Oxygen radical absorbance capacity (ORAC) was the parameter studied in relation to the antioxidant potential of the compounds. The results obtained in the predictions show that the weights assigned to each attribute were appropriate.
Figure 2 shows the predictions obtained with the KNN algorithm for sets # 1-5. The X axis represents the rows of the database, and the Y axis the ORAC value of each row. The plots show the correspondence between the predicted and the experimental ORAC values. The prediction was better when the PSO+RST method was used.
The results obtained with the MLP algorithm (Figure 3) showed a less exact prediction. This plot allows the real and the predicted values to be compared. The errors obtained when the algorithm was applied were: MAE (1.7601); RMSE (4.1569).

Compilation of the food composition data
The information was obtained from two food composition databases: a) the database for the flavonoid content of selected foods, Release 3.1 (FDB 3.1) and b) the isoflavones database released by the USDA in 2008 (IDB 2) [2,3]. The estimation techniques for calculating unavailable values and the decision-making procedure described by Bhagwat et al. [15] were used. This information was used to prepare the register of data on the flavonoid composition of different foods. The Standard Reference (SR) [5] was used to identify each unique food entry when it matched a food in SR.

Prediction using AI algorithms
Training set and test set. The training and test sets were obtained by k-fold cross-validation with k = 10 [4].
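The splitting step can be illustrated with a minimal sketch. It assumes a plain random partition of the row indices into 10 folds; the function name `k_fold_indices`, the seed and the row count of 25 are hypothetical and are not taken from the PROCONS software.

```python
import random

def k_fold_indices(n_rows, k=10, seed=0):
    """Return a list of (train, test) index lists, one pair per fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # The training set is the union of all folds except fold i.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits

# Hypothetical example: 25 rows split into 10 folds.
splits = k_fold_indices(n_rows=25, k=10)
```

Each row appears in exactly one test fold, so every ORAC value is predicted once by a model that did not see it during training.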

Attribute selection and weight assignment.
Different weights were assigned to the attributes according to their influence on the class attribute. These experimental parameters were taken from the scientific literature. The variable selected for prediction (class attribute) was the ORACexp value, expressed in μmol TE/100 g. ORAC was selected because it is considered the preferable methodology for evaluating antioxidant capacity, owing to its biological relevance to in vivo antioxidant efficacy [25]. The assay has been used to measure the antioxidant activity of foods and measures the degree to which the compounds of interest inhibit peroxyl-radical-induced oxidation in a chemical milieu.
ORACexp and TPexp (mg GAE/100 g) values for each substrate were found in the literature. The analytical method developed by Prior et al. was used as the reference method for selecting published sources [26].
A different weight was assigned to each attribute using the measure of the quality of a similarity decision system. Weights were assigned both manually and with the Particle Swarm Optimization + Rough Set Theory method (PSO+RST) [21,22,27]. PSO+RST was implemented in the PROCONS software.

AI algorithms.
The KNN (K-Nearest Neighbors) and MLP (MultiLayer Perceptron, a supervised unidirectional network) algorithms were used. These algorithms were implemented in the PROCONS software version 4.0 [27].
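As an illustration of how attribute weights enter a KNN prediction, the following is a minimal sketch. It assumes numeric attributes, a weighted Euclidean distance and a prediction equal to the mean target of the K nearest rows; the similarity function actually used in PROCONS is not given in the text, and the toy rows and weights below are hypothetical.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two numeric attribute vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def knn_predict(train_x, train_y, query, weights, k=3):
    """Predict the target as the mean target of the k nearest training rows."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda row: weighted_distance(row[0], query, weights),
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical toy rows: (flavonoid amount, TEAC value, total polyphenols).
train_x = [(1.0, 2.0, 10.0), (1.2, 2.1, 11.0), (0.9, 1.9, 9.0), (8.0, 4.0, 90.0)]
# Hypothetical ORAC values for the rows above.
train_y = [100.0, 110.0, 90.0, 900.0]
# Hypothetical attribute weights (assumption, not the PSO+RST weights).
weights = (0.5, 0.3, 0.2)

prediction = knn_predict(train_x, train_y, (1.1, 2.0, 10.5), weights, k=3)
```

With K = 3, the distant fourth row is ignored and the prediction is the mean ORAC value of the three similar rows, which is why the similarity between objects matters for the result.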

ii) MLP, MultiLayer Perceptron
A neural network is composed of units called neurons. Each neuron receives a series of inputs through its interconnections and emits an output. In addition to the weights and connections, each neuron is associated with a mathematical transfer function. This function generates the output signal of the neuron from its input signals.
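The behaviour of a neuron described above can be sketched as follows. This is only an illustration: the logistic (sigmoid) transfer function, the single hidden layer, the linear output neuron and all weight values are assumptions, since the text does not specify the architecture used.

```python
import math

def sigmoid(z):
    """Logistic transfer function, mapping any real input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, transfer=sigmoid):
    """Apply the transfer function to the weighted sum of the inputs."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(z)

def mlp_forward(inputs, hidden_layer, output_neuron):
    """Forward pass: one hidden layer, then a single linear output neuron."""
    hidden = [neuron_output(inputs, w, b) for w, b in hidden_layer]
    w_out, b_out = output_neuron
    return neuron_output(hidden, w_out, b_out, transfer=lambda z: z)

# Hypothetical weights and biases, chosen only for illustration.
hidden_layer = [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)]
output_neuron = ([2.0, 2.0], 0.0)
y = mlp_forward([1.0, 1.0], hidden_layer, output_neuron)
```

In a trained network the weights and biases are not set by hand but adjusted during supervised training so that the output approaches the experimental value.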

Evaluation of the precision of the algorithms.
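To evaluate the precision of the results obtained with both methods, the mean absolute error (MAE) and the root mean square error (RMSE) were used [28]:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|a_i - y_i\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(a_i - y_i\right)^2}$$
where $a_i$ is the desired output value, $y_i$ is the value produced by the method and $N$ is the number of objects.
Figure 5 shows a general scheme of the methodology used in this study. The variable to predict is the ORAC value of the food.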
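Assuming the two measures are the mean absolute error (MAE) and the root mean square error (RMSE) reported with the MLP results, they can be computed as in the sketch below. The function names and the sample values are hypothetical.

```python
import math

def mae(desired, produced):
    """Mean absolute error between desired and produced output values."""
    n = len(desired)
    return sum(abs(a - y) for a, y in zip(desired, produced)) / n

def rmse(desired, produced):
    """Root mean square error between desired and produced output values."""
    n = len(desired)
    return math.sqrt(sum((a - y) ** 2 for a, y in zip(desired, produced)) / n)

# Hypothetical experimental and predicted values, for illustration only.
desired = [10.0, 20.0, 30.0, 40.0]
produced = [12.0, 18.0, 30.0, 44.0]

mae_value = mae(desired, produced)
rmse_value = rmse(desired, produced)
```

Because RMSE squares each error before averaging, it is never smaller than MAE and is more sensitive to a few large deviations.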

Figure 1. Percentage represented by each food group in the studied data.

Figure 5. Scheme of the applied methodology.

Text of the Figure 5 scheme: Consultation of the BDCA for the flavonoid content of selected foods and the database for the isoflavone content of selected foods. Preparation of the new register of the food composition data. Attributes: 1. Amount of flavonoid (mean); 2. Class of flavonoid; 3. Trolox equivalent antioxidant capacity (TEAC) value of flavonoid; 4. Probability of clastogenicity; 5. Clastogenicity classification by QSAR method; 6. Total polyphenol (TP) value.

Table 1. Examples of the compiled data and the respective attributes.

Table 2. Examples of the chemical information of flavonoids and their presence in food contained in the studied database.