Combined Use of Information Entropy and Bepipred Scores for Screening Ebola Virus Glycoprotein (GP) Sequences

Unprecedented epidemics caused by strains of Ebola virus (EBOV) are associated with extremely high mortality rates. Because EBOV glycoprotein (GP) is the precursor of the proteins that mediate binding and internalization of the virus, it is an attractive target for vaccine development. It is reported here that combined use of information entropy (H) and Bepipred predictive epitope screening of GP amino acid sequences identified four invariant oligopeptide sequences in the GP of EBOVZ (EBOV Zaire) and EBOVS (EBOV Sudan), the two most epidemiologically prominent strains of EBOV. Three of these four oligopeptide sequences were subsequently identified in the GP of the Reston strain of EBOV (EBOVR), whose GP sequence length differs from that of EBOVZ and EBOVS GP. It is suggested that the invariant oligopeptides identified by the combined information entropy and B-cell linear epitope screening procedure may be useful as components of oligopeptide anti-EBOV vaccines.


Introduction
A bioinformatic analysis of Ebola virus (EBOV) glycoprotein (GP) [1] is presented here, based upon the combined distributions of information entropy [2] and predicted B-cell epitope score [3] in full-length GP protein sequences. Most recognized cases of human infection with EBOV have been caused by the EBOVZ (Zaire) and EBOVS (Sudan) strains [4,5]. Accordingly, this study focuses on the GP amino acid sequences of those two strains. For comparison, an oligopeptide sequence search was conducted on the GP protein of EBOVR (Reston), a strain of EBOV that has not been reported to cause disease in humans [6].

Figure 1. Immunobioinformatic Analysis of Ebola Virus (EBOV) Glycoprotein (GP).
(top) H distributions in the EBOVS (black) and EBOVZ (red) GP sequence datasets; (middle) H distribution in the set of combined {EBOVS, EBOVZ} GP sequences; (bottom) conditional (screened) Bepipred score of the combined {EBOVS, EBOVZ} GP sequences. The receptor binding domain (RBD) common to both sets of GP sequences extends between the vertical red lines at amino acids 54 and 185.
The H distributions in the GP(EBOVS) sequence set and in the GP(EBOVZ) sequence set are shown in Figure 1, top. The summed total entropy values (H) for these GP sequences were H(EBOVS) = 36.3776 and H(EBOVZ) = 26.6990, respectively. These two H values are statistically indistinguishable (Z = 1.2300, p = 0.2187). However, although the patterns of distribution of H in the two GP datasets are similar, they clearly are not identical. The H distribution in the combined GP(EBOVS, EBOVZ) dataset is shown in Figure 1, middle. The summed H value obtained for the combined GP datasets was H(EBOVS, EBOVZ) = 132.1436. This summed entropy value was significantly greater than that obtained for either the EBOVS GP dataset (Z = 7.6884, p = 1.4895e-14) or the EBOVZ GP dataset (Z = 8.3938, p = 4.7084e-17).
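The reported p-values are consistent with two-tailed probabilities under a standard normal distribution. The following minimal sketch shows that conversion, assuming a two-tailed test; the function name `two_tailed_p` is illustrative and is not taken from the original analysis code.

```python
from math import erfc, sqrt


def two_tailed_p(z):
    """Two-tailed p-value for a Z statistic under a standard normal null.

    Assumption: the reported p-values are two-tailed normal probabilities.
    Uses the identity P(|Z| > z) = erfc(|z| / sqrt(2)).
    """
    return erfc(abs(z) / sqrt(2.0))


# Z statistics quoted in the text above.
for z in (1.2300, 7.6884, 8.3938):
    print(f"Z = {z:.4f} -> p = {two_tailed_p(z):.4e}")
```

Using `erfc` rather than `1 - cdf` avoids loss of precision for the very small p-values associated with large Z.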
The distribution of screened Bepipred scores in the EBOV GP sequences shown in Figure 1 (bottom) revealed four clusters of amino acids that fulfilled the epitope screening criteria. Within those clusters, four oligopeptides, each consisting of at least six contiguous amino acids, were identified in both the EBOVZ and EBOVS GP proteins (Table 1). As shown in the table, three of those four GP oligopeptides were also identified in the EBOVR GP protein.
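The oligopeptide search in another GP sequence, as performed for EBOVR, can be sketched as a plain substring search. The four peptide strings are those named in the Discussion; the function `locate` and the toy sequence are hypothetical illustrations, not real GP data.

```python
# The four predicted epitopic oligopeptides named in the Discussion.
PEPTIDES = ["FRSGVPP", "YEAGEWAE", "KKPDGSECLP", "HDWTKN"]


def locate(peptide, sequence):
    """Return the 1-based 'start-end' range of the first exact match, or '-' if absent."""
    idx = sequence.find(peptide)
    if idx < 0:
        return "-"
    return f"{idx + 1}-{idx + len(peptide)}"


# Toy sequence for illustration only (NOT a real GP sequence).
toy_sequence = "MG" + "FRSGVPP" + "AA" + "HDWTKN"

for pep in PEPTIDES:
    print(pep, locate(pep, toy_sequence))
```

An exact-match search is appropriate here because the peptides are reported as invariant; a dash marks a peptide that is absent, following the convention of Table 1.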

Table 1. Distribution of Predicted Epitopes in GP Protein of Strains of Ebola Virus (EBOV).
Amino acid positions of epitopes for each EBOV strain are given by column. Absence of the predicted epitope is indicated by a dash (-). Predicted

Discussion
The glycosylated GP protein of EBOV is cleaved into a GP1 protein and a GP2 protein [7,8]. The GP1 protein is responsible for binding of the virus to the membrane of the target cell. The GP2 protein is responsible for internalization of the virus into the target cell. Thus, the GP protein is a reasonable subject for the design of a protective and ameliorative anti-EBOV vaccine.
By means of the combined use of a bioinformatic metric (H) and a screened Bepipred score, four GP peptides were identified as potential antigens in all of the complete EBOVZ and EBOVS GP sequences in the NCBI Genbank dataset. Three of these peptide sequences, a heptapeptide, an octapeptide and a decapeptide, reside within the receptor-binding domain (RBD) of the GP1 protein. The fourth identified peptide sequence, a hexapeptide, lies near the tail region of the GP2 protein.
The immunologic activity predicted in the current study differs from that of a set of antigens previously reported for EBOV GP [9]. Two of the predicted epitopic oligopeptides (FRSGVPP and HDWTKN) lie completely outside known antigenic regions in humans, except for the single N-terminal proline [10]. In contrast, the predicted octapeptide (YEAGEWAE) and decapeptide (KKPDGSECLP) epitopes lie within larger domains of GP proteins reported to be recognized by human antibodies in recovered EBOVZ patients and/or survivors [10]. Thus, these four predicted oligopeptide epitopes help detect and further define regions of the GP protein with immunologic potential.
Proline has been reported to play a significant role in Ebola antigen structure and function [11]. Proline also appears to play a significant role in the EBOV GP oligopeptide epitopes identified here. Two RBD oligopeptides with predicted epitopic potential, GP(88-94) and GP(114-123), each contain two proline residues. Proline has been reported to be a common component of epitope-containing segments of proteins, in association with specific effects of proline on protein secondary structure [12,13].
Significant progress has been made towards a nucleotide-based anti-EBOV vaccine [14]. An oligopeptide-based anti-EBOV vaccine would not require synthesis of the viral antigen in the vaccine recipient. This may shorten the response time to the vaccine, which would be a significant advantage given the brief incubation period and high mortality of EBOV infection in humans.

Experimental Section
The complete sets of EBOVZ (N = 143) and EBOVS (N = 10) GP protein sequences were downloaded from the NCBI Genbank [15] in FASTA format [16] on August 15, 2014. All of these GP sequences were full length, i.e., 676 amino acids long. A set of EBOVR (Reston) sequences (N = 9) was also downloaded, in which the full-length protein sequence was 677 amino acids.
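As an illustration of the data preparation step, the following minimal sketch reads FASTA-formatted text and flags records that do not have the expected full length. The helper names `parse_fasta` and `check_full_length` and the toy records are assumptions made for this example; they are not part of the original workflow.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line -> sequence string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; drop the leading '>' from the header.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped; collect the pieces.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def check_full_length(records, expected_len):
    """Return headers of records whose length differs from expected_len."""
    return [h for h, seq in records.items() if len(seq) != expected_len]


# Toy records for illustration only (not real GP sequences).
toy = ">seq1 toy\nMGVT\nGILQ\n>seq2 toy\nMGVTGIL\n"
records = parse_fasta(toy)
print(check_full_length(records, expected_len=8))
```

For the datasets described above, `expected_len` would be 676 for EBOVZ and EBOVS and 677 for EBOVR.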
Computation of information entropy (H) [2] and Z-tests on those computations were performed with the Enthought Canopy 1.4.1 distribution of 64-bit Python 2.7.6. Those computations were performed for each of the individual sets of GP amino acid sequences and for the combined set of EBOVS and EBOVZ GP sequences. H values were statistically evaluated by Z-test, using 1000 pseudo-random trials.
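The per-position entropy computation can be sketched as follows. This is a minimal illustration that assumes Shannon entropy with base-2 logarithms over equal-length sequences; the logarithm base, the function names and the toy sequences are assumptions, and the original code may differ.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits, an assumed base) of the residues at one position."""
    counts = Counter(column)
    total = len(column)
    return sum((n / total) * log2(total / n) for n in counts.values())


def positional_entropy(sequences):
    """Entropy at each position of a set of equal-length sequences."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*sequences) yields one tuple of residues per sequence position.
    return [column_entropy(col) for col in zip(*sequences)]


# Toy sequences for illustration only (not real GP sequences).
toy = ["ACDE", "ACDE", "ACDF", "GCDF"]
h = positional_entropy(toy)
summed_h = sum(h)  # analogous to the summed H reported for each dataset
print(h, summed_h)
```

An invariant position yields H = 0 exactly, which is the property used by screening criterion #1.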
A consensus sequence for the combined GP sequence sets was determined with Jalview [17], and a predicted B-cell epitope score (B) was determined for each amino acid position of that consensus with Bepipred [3]. The B epitope score at each amino acid position was screened, conditional upon the following two criteria: criterion #1, H = 0 at the selected amino acid position; criterion #2, B ≥ 0.350, which is the Bepipred recommended cutoff.
In this paper, the screened epitope B Bepipred score is denoted as .
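The two screening criteria, together with the requirement of at least six contiguous amino acids stated in the results, can be sketched as a single pass over the consensus. The function name `screened_runs`, its parameters and the toy scores below are hypothetical; they illustrate the procedure and are not the original code or real Bepipred output.

```python
def screened_runs(entropy, bepipred, consensus, cutoff=0.350, min_len=6):
    """Find contiguous runs where H == 0 and B >= cutoff.

    Returns a list of (start, end, peptide) with 1-based inclusive positions.
    """
    n = len(consensus)
    if not (len(entropy) == len(bepipred) == n):
        raise ValueError("entropy, bepipred and consensus must align")
    runs = []
    start = None
    for i in range(n):
        passes = entropy[i] == 0 and bepipred[i] >= cutoff
        if passes and start is None:
            start = i  # a new run begins
        elif not passes and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i, consensus[start:i]))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and n - start >= min_len:
        runs.append((start + 1, n, consensus[start:n]))
    return runs


# Toy inputs for illustration only (not real entropy or Bepipred values).
toy_consensus = "MKKPDGSECLPA"
toy_entropy = [0.3] + [0.0] * 11
toy_bepipred = [0.5] * 11 + [0.1]
print(screened_runs(toy_entropy, toy_bepipred, toy_consensus))
```

Reporting 1-based inclusive positions matches the GP(start-end) notation used for the peptides in the text.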

Conclusions
On the basis of the combined use of information entropy (H) and B-cell epitope score, four invariant oligopeptide epitopes common to the EBOVZ and EBOVS GP proteins were identified. Two of the identified oligopeptides each contain two proline residues. It is proposed that structural studies of these peptides can increase our understanding of EBOV structural biology and that vaccines based upon these oligopeptides may be useful for preventing and ameliorating disease caused by Ebola virus.