QSPR-Perturbation Models for the Prediction of B-Epitopes from Immune Epitope Database: An Interesting Route for Predicting “in silico” New Optimal Peptide Sequences and/or Boundary Conditions for Vaccine Development

In the present study, three different physicochemical molecular properties for peptides were calculated using the program MARCH-INSIDE: atomic polarizability, partition coefficient, and polarity. These measures were used as input parameters of a Linear Discriminant Analysis (LDA) in order to develop three different quantitative structure-property relationship (QSPR) perturbation models for the prediction of B-epitopes reported in the immune epitope database (IEDB) given perturbations in peptide sequence, in vivo process, experimental techniques, and source or host organisms. The accuracy, sensitivity, and specificity of the models were >90% for both training and cross-validation series. The statistical parameters of the models were compared to the results achieved with the previously reported electronegativity QSPR-perturbation model. The results indicate that this type of approach may constitute an interesting route for predicting “in silico” new optimal peptide sequences and/or boundary conditions for vaccine development.


SciForum http://sciforum.net/conference/mol2net-1

Introduction
The immune epitope database (IEDB) contains data related to antibody and T cell epitopes for humans, non-human primates, rodents, and other animal species (1). This system registers a large amount of information about the molecular structure and the experimental conditions (cij) under which different i-th molecules were determined to be immune epitopes or not.
Quantitative structure-activity/property relationship (QSAR/QSPR) methods transform molecular structures into numeric molecular descriptors (λi) and find relationships between these structures and their biological activity. Perturbation theory, on the other hand, comprises methods that add "small" variation terms to the mathematical description of problems with known solutions in order to find an appropriate solution for related problems with no known solutions.
In a recent work, González-Díaz et al. (2) developed an electronegativity QSPR-perturbation model for B-epitopes reported in IEDB that is able to predict the probability of occurrence of an epitope after a perturbation in the peptide sequence (mi), source organism (so), host organism (ho), immunological process (ip), and experimental technique (tq) used. In principle, there are more than 1,600 different molecular descriptors (λi) that may be generalized and used to solve QSPR problems in chemical structures (3). In the present study, three different physicochemical molecular properties for peptide sequences reported in IEDB were calculated in order to develop three different QSPR models able to predict the efficiency of a new peptide as a B-epitope given perturbations in mi, so, ho, ip, and tq.

Results and Discussion
In the present work, three different QSPR-perturbation models were developed, one for each class of molecular descriptor calculated with the software MARCH-INSIDE (Table 1). In these equations, N is the number of cases used to train the models, Rc is the canonical correlation coefficient, and U is Wilks' lambda or U-statistic. In line with González-Díaz et al. (2), the output of the models, λ(εij)new, is a real-valued function that scores the propensity with which a new peptide obtained after perturbation of the initial conditions acts as a B-epitope. The first input term, λ(εij)ref, is the scoring function λ of the efficiency of the initial process εij: λ(εij)ref = 1 if the i-th peptide was experimentally demonstrated to be a B-epitope in the assay of reference (ref) carried out under the conditions cj, and λ(εij)ref = 0 otherwise. The perturbation terms Δλcj = λ(mq)ref − λ(mi)new are the differences in the mean value of the molecular property in question, taken over all amino acids in the sequence, between the peptide of reference and the new peptide. They quantify values of the conditions of the new assay cj-new that represent perturbations with respect to the initial conditions cij-ref of the assay of reference. The quantities *λ(cij) and *λ(cqr) are the average values of the mean values λ(mi) and λ(mq) of the molecular property in question for all new and reference peptides in IEDB that are epitopes under the j-th or r-th boundary condition.
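The perturbation terms described above can be illustrated with a minimal Python sketch. The per-residue property values are invented placeholders (not MARCH-INSIDE output), all function names are hypothetical, the sequence difference is taken as reference minus new (the order in which the two terms appear in the text), and the form used for the condition-level term is an assumption based on the description of the *λ averages.

```python
from statistics import mean

# Hypothetical per-residue property values; illustrative numbers only,
# NOT values computed by MARCH-INSIDE.
RESIDUE_PROPERTY = {"A": 1.0, "G": 0.5, "K": 2.0, "L": 1.5, "S": 0.8}


def mean_property(sequence):
    """lambda(m): mean property value over all amino acids of a peptide."""
    return mean(RESIDUE_PROPERTY[aa] for aa in sequence)


def condition_average(epitope_sequences):
    """*lambda(c): average of lambda(m) over the peptides that are
    epitopes under one boundary condition."""
    return mean(mean_property(seq) for seq in epitope_sequences)


def sequence_perturbation(ref_seq, new_seq):
    """Delta-lambda = lambda(m_q)ref - lambda(m_i)new."""
    return mean_property(ref_seq) - mean_property(new_seq)


def condition_perturbation(ref_seq, ref_cond_avg, new_seq, new_cond_avg):
    """ASSUMED form: deviation of the reference peptide from its condition
    average minus the deviation of the new peptide from its own."""
    ref_deviation = mean_property(ref_seq) - ref_cond_avg
    new_deviation = mean_property(new_seq) - new_cond_avg
    return ref_deviation - new_deviation


# Usage with toy peptides built from the placeholder residues.
reference_peptide, new_peptide = "AG", "KL"
avg_under_condition = condition_average([reference_peptide, new_peptide])
delta = sequence_perturbation(reference_peptide, new_peptide)
delta_delta = condition_perturbation(
    reference_peptide, avg_under_condition, new_peptide, avg_under_condition
)
```

When both peptides are assayed under the same boundary condition, the two condition averages cancel and the condition-level term reduces to the sequence difference, which is what the usage lines exercise.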
The models obtained here are stable and robust, yielding accuracy, sensitivity, and specificity values > 90% for both training and cross-validation series. These models do not improve on the model developed by González-Díaz et al. (2). However, the results obtained are very similar, and the values of the different statistical parameters demonstrate the high significance of the models, supporting the consistency of the method. Thus, the information obtained from the four different types of QSPR-perturbation models developed to date may be combined to increase the likelihood of a correct prediction of new epitopes or the optimization of known peptides towards computational vaccine design.
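For reference, accuracy, sensitivity, and specificity follow directly from the counts of correctly and incorrectly classified cases. The sketch below uses made-up labels (1 = epitope, 0 = non-epitope) and a hypothetical function name; it is not the procedure or the data used in this study.

```python
def classification_metrics(observed, predicted):
    """Return accuracy, sensitivity and specificity in percent
    for binary labels where 1 = epitope and 0 = non-epitope."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # true positives
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # true negatives
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false positives
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # false negatives
    return {
        "accuracy": 100.0 * (tp + tn) / len(pairs),
        "sensitivity": 100.0 * tp / (tp + fn),  # epitopes recovered
        "specificity": 100.0 * tn / (tn + fp),  # non-epitopes recovered
    }


# Toy labels, invented for illustration only.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(observed, predicted)
```

Applying the same function separately to the training and cross-validation series gives the two sets of values that are compared in the text.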

Materials and Methods
The same database recently utilized by González-Díaz et al. (2) was used in the present study. The calculation of the molecular descriptors was implemented in the in-house program MARCH-INSIDE (4), which makes use of a Markov Chain method to calculate the k-th mean values of different physicochemical molecular properties kλ(mi) for i-th molecules (mi) (5). In the present work, three new QSPR-perturbation models for the prediction of B-epitopes reported in IEDB were developed using different types of molecular descriptors λ(mi) to codify structural information: atomic polarizability (α), partition coefficient (P), and polarity (Pol). The construction of this type of model has been explained in detail before (2); therefore, only the general equation is presented:
Here, λ(εij)new is the efficiency function as epitope of a new peptide obtained after a change in the structure and/or the boundary conditions cj ≡ (c0, c1, c2, c3… cn) of a peptide of reference.
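The general equation announced above is not reproduced in this text. As a sketch only, a linear form consistent with the symbols defined in this section (output λ(εij)new, input λ(εqr)ref, coefficients c0 and dij, perturbation terms ΔΔλijqr, and independent term e0) would read as follows. The summation over the boundary conditions j is an assumption, and the equation published in (2) remains the authoritative form.

```latex
\lambda(\varepsilon_{ij})_{new} = c_0 \cdot \lambda(\varepsilon_{qr})_{ref}
  + \sum_{j} d_{ij} \cdot \Delta\Delta\lambda_{ijqr} + e_0
```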
The set of boundary conditions used here is the same as that reported in IEDB: c0 = the specific peptide; c1 = the organism that expresses the peptide (soj); c2 = the host organism exposed to the peptide (hoj); c3 = the immunological process (ipj); and c4 = the experimental technique (tqj). The variable λ(εqr)ref refers to a known efficiency function as epitope of a peptide of reference experimentally determined under a set of cj boundary conditions. The function λ(εij) was defined as a discrete-valued function for classification purposes: λ(εij) = 1 for epitopes reported in the conditions cj and λ(εij) = 0 otherwise. The values c0 and dij are the coefficients obtained for the Linear Discriminant Analysis (LDA) classification functions. The variational perturbation terms ΔΔλijqr account for the deviation of the molecular descriptors of all amino acids in the sequence of the new peptide both with respect to the peptide of reference and with respect to all boundary conditions. The constant e0 represents the independent term of the model.
The LDA was carried out using the STATISTICA 6.0 software (6). A forward stepwise strategy was used for variable selection, and the statistical significance of the models was determined by calculating the canonical correlation coefficient (Rc) and the U-statistic. The accuracy, specificity, and sensitivity for the training and cross-validation series were also examined (7).

Conclusions
This work has shown that atomic polarizability, partition coefficient, and polarity values calculated with MARCH-INSIDE also appear to be good molecular descriptors for finding QSPR-perturbation models that are able to predict the results of variations in peptide sequences and experimental assay boundary conditions reported in IEDB. Consequently, this type of approach may constitute an interesting route for predicting "in silico" new optimal peptide sequences and/or boundary conditions for vaccine development. In addition, this study may serve as a basis for building better and more reliable models in the future (e.g., consensus QSPR models). This computational technique is by no means aimed at replacing experimentation; rather, it helps to rationalize the process while reducing costs in terms of material resources and time.

Table 1. The best QSPR-perturbation models found in this work.

References
5. González-Díaz, H.; Arrasate, S.; Sotomayor, N.; Lete, E.; Munteanu, C.R.; Pazos, A.; Besada-Porto, L.; Ruso, J.M. 2013b. MIANN models in medicinal, physical and organic chemistry. Curr. Top. in Med. Chem. 13, 619-641.
6. StatSoft, Inc. 2002. STATISTICA (data analysis software system), version 6.0. www.statsoft.com.
7. Hill, T.; Lewicki, P. 2006. STATISTICS: Methods and Applications: A Comprehensive Reference for Science, Industry and Data Mining. StatSoft, Tulsa.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions defined by MDPI AG, the publisher of the Sciforum.net platform. Authors of Sciforum papers retain the copyright to their scholarly works. Hence, by submitting a paper to this conference, you retain the copyright, but you grant MDPI AG the non-exclusive and irrevocable license right to publish this paper online on the Sciforum.net platform. This means you can easily submit your paper to any scientific journal at a later stage and transfer the copyright to its publisher (if required by that publisher). (http://sciforum.net/about).