A novel approach to Peptide Vaccine Design for Ebola virus

Ebola virus disease, which has a high case fatality ratio, is making a comeback in the Democratic Republic of Congo (DRC) after its rampage in West Africa in 2014-16, and this has spawned fears of a pandemic. Vaccines such as the experimental rVSV-ZEBOV have provided protection in 70-80% of cases, but such vaccines are in short supply, and doubts exist about their availability and sustainability in a pandemic. Peptide vaccines promise to fill this lacuna: as chemical constructs they can be scaled up to requirement in a manufacturing set-up, are easy to produce in pure form, and can be stored and transported much more easily and economically than traditional vaccines. Although no peptide vaccine has yet been licensed for human use, the rapid growth of in silico approaches to peptide vaccine design, their application to a myriad of viral infections, and subsequent follow-up experimental work have led to expectations of licensure in the near future. We have proposed a protocol to automate the search for suitable peptide vaccines using mathematical and computational modelling approaches that ensure a long life for such vaccines even in the face of rapid mutational changes in viral sequences. In this paper we outline the mathematical model we have used and the recent improvements in the techniques to ensure the best recommendations for peptide vaccine libraries, especially against the Ebola virus, which threatens to spill over the Congo border and cause epidemics and pandemics in a globalized world.


Introduction
Computational approaches to developing drugs and vaccines have taken on urgency because viral epidemics have been occurring with increasing frequency in recent years. In the 21st century, the epidemic of SARS (Severe Acute Respiratory Syndrome) in 2002-03 [1], the H1N1 swine flu of 2009 [2], the Ebola virus crisis of 2014-16 in West Africa and its latest manifestation in the Democratic Republic of Congo (DRC) [3], the Zika rampage of 2015-16 [4] and the near-epidemic of the Nipah virus in South Asia [5,6] have caused high fatality, strained public health resources and led to significant economic damage. In all these cases the virus was either new or a mutated progeny of an earlier form, for which no drugs or vaccines existed. Traditional methods of developing new drugs and vaccines cost billions of dollars and take several years of lead time [7][8][9], which is clearly inadequate to contain viral epidemics that generally die out in 6 to 18 months.
Advances in computational technology, immunoinformatics, genetic databases and related sciences in recent years have led to the development of the science of vaccinomics, which relates to understanding how vaccines work and how a focused, technological approach can be taken to accelerate the process of vaccine design [10,11,12]. Peptide vaccines, where surface proteins of a virus particle (the virion) are checked for their potential to elicit the best immune response, are one product of the vaccinomics approach [13]. The availability of adequate viral sequence data and novel programming techniques that bypass alignment procedures have enabled the development of new and fast protocols for in silico approaches to peptide vaccine design. We have used these novel alignment-free techniques and protocols (Fig.1) to design in silico peptide vaccine libraries for Influenza [14,15], Rotavirus [16], Human Papilloma virus [17] and Zika virus [18], but the methods require more robust software approaches. In this paper we very briefly report one such modification to our standard approach. Another approach, which separates viral sequences originating from different geographical areas using principal component analysis (PCA) and self-organizing maps (SOM) and holds promise of good separation between different segments also, has been recently communicated [19]. We use the Ebola virus (EBOV) as an example for application of the modified technique. Many potential vaccine candidates have been developed to date, but none has been approved for clinical use in humans. However, rVSV-ZEBOV was used extensively in 2018-2019 under a compassionate use protocol [21], with reported effectiveness of 70-80% against EBOV. Recently a new Ebola virus vaccine has been prequalified by the WHO for use in the DRC, but the results are awaited.
Classical vaccines, though widely used (but not very effective against a highly pathogenic virus like EBOV), have certain disadvantages: 1) supply falls far short of demand, and 2) a classical vaccine is effective against only a particular strain of the virus and cannot be used against the same virus if it mutates to a new strain. Newly developed peptide vaccines, which are peptides serving to immunize an organism against a pathogen [22], are advantageous in that they can meet the demand, are cost effective and can be easily modified to suit mutated viral strains. In this paper we very briefly report on our efforts to design suitable peptide vaccines for the Zaire Ebola virus, and also take this opportunity to record our new modification for determining suitable prospects for peptide vaccines by a quantitative method.

Results and Discussions
The EBOV genome is a single-stranded, negative-sense RNA about 19000 nucleotides long [23]. The EBOV virion has a virally encoded glycoprotein (GP) projecting as spikes from its lipid surface, hence known as the Spike Glycoprotein. EBOV can enter host cells through two pathways, Niemann-Pick C1 (NPC1) and Hepatitis A Virus cellular receptor 1 (HAVcr-1). HAVcr-1 binds to the Spike glycoprotein present in the envelope of the Ebola virus.
The available Zaire Ebola glycoprotein sequences that met the criteria defined in the Materials and Methods section were subjected to the protein segment variability profile examination described below. The sequences were further tested for their average solvent accessibility (ASA) index, and the two indices were plotted against amino acid position number. This gave a preview of the conserved, surface-exposed segments most likely to be suitable for eliciting an immune response. Applying the new Polygon Representation algorithm, we obtained a more comprehensive quantitative assessment of the regions where protein variability is the least and coincides with high solvent accessibility.
These regions were then to be compared with the 3D structure of the EBOV to ensure that the surface protein segments are not covered by neighbouring proteins and segments. Unfortunately, the only crystal structure of the Zaire EBOV available in the Protein Data Bank (PDB), 5HJ3.pdb, shows only a limited part of the protein structure, so our comparison was rather constrained. What was available showed that the segments identified by our investigations and falling within the part of the protein covered by the crystal structure were clearly surface exposed. Fig.2 shows one such segment predicted by the IEDB analysis described hereafter; the segment is marked in yellow in the cleft at the centre of the right-hand side of the figure.
Having a list of peptide segments that matched our criteria, we next turned to the IEDB (Immune Epitope Database and Analysis Resource) to determine whether any of these had good epitope potential. Table 1 lists the first six high-ranking epitopes determined by an MHC II analysis for B-cells with HLA profiles from Africa that covered 90% of the listed communities. Column 1 lists the start and end of the contiguous segment that ranked high in the IEDB results, column 2 shows the peptide, and column 3 lists the segments determined by our own analyses using the Polygon Representation. The matches are quite close. The available crystal structure did not allow representation of our peptides, but one segment predicted by the IEDB analysis is shown in Fig.2 for representational purposes. Several more steps need to be completed before the final list of possible vaccine candidates can be presented for wet-lab analysis. Our work so far has been to predict linear epitopes; we still have to analyse non-linear epitopes as well as T-cell interactions. We also need to use other web servers to support or negate the conclusions drawn from these analyses. Last but not least, we need to take the semi-final list of probable peptides and eliminate any that could lead to auto-immune diseases. Only then can we draw up a list of the best probable epitopes from amongst the peptides thrown up by our analyses so far. This work is in progress.

Materials and Methods
http://sciforum.net/conference/mol2net-05
The protein sequence data of all Zaire Ebola viruses available in the NCBI GenBank database were downloaded, and only those that were complete and reasonably annotated were filtered for the glycoprotein (GP) sequences for further analysis. As per our protocol, the methodology required in-house software to determine the conserved segments in the GP and a number of external servers to analyse various attributes. For the conserved segment search, we use a numerical characterization technique for protein sequences [24]. Since there are 20 different amino acids, a hypothetical 20D coordinate system can be considered in abstract space, where each amino acid is assigned one axis. As explained in detail in Ref [24], the sequence is read one amino acid at a time and the corresponding points are plotted by moving a unit step forward in the respective direction. The weighted averages along the axes are given as $\mu_1 = \sum_i x_{1i}/N$, $\mu_2 = \sum_i x_{2i}/N$, ..., $\mu_{20} = \sum_i x_{20,i}/N$, with each sum running over the N plotted points. One can then define a protein graph radius (pR) as the magnitude of the resulting vector $(\mu_1, \mu_2, \ldots, \mu_{20})$, which is given as $pR = \sqrt{\mu_1^2 + \mu_2^2 + \cdots + \mu_{20}^2}$. This value of pR can be used to numerically characterize amino acid sequences, with minor adjustments (see ref [16]), to yield a value that is characteristic of a given sequence, which implies that two sequences having the same pR must be identical. This is the core of the conserved segment search method.
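The protein graph radius described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the in-house software: the function name `protein_graph_radius`, the axis ordering in `AMINO_ACIDS` and the choice to skip non-standard symbols are all hypothetical, and the "minor adjustments" of ref [16] are not included.

```python
import math

# Assumption: one axis per standard amino acid, in this arbitrary order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def protein_graph_radius(sequence):
    """Return the unadjusted pR of a sequence walked through 20D space."""
    position = [0] * len(AMINO_ACIDS)    # current point of the walk
    coord_sums = [0] * len(AMINO_ACIDS)  # running sums of x_ki over all points
    n_points = 0
    for residue in sequence.upper():
        axis = AMINO_ACIDS.find(residue)
        if axis < 0:
            continue  # assumption: ignore gaps and ambiguous symbols
        position[axis] += 1              # unit step along the residue's axis
        n_points += 1
        for k, x in enumerate(position):
            coord_sums[k] += x           # accumulate coordinates of this point
    if n_points == 0:
        raise ValueError("sequence contains no standard amino acids")
    means = [s / n_points for s in coord_sums]   # mu_1 ... mu_20
    return math.sqrt(sum(m * m for m in means))  # magnitude of the mean vector


print(protein_graph_radius("AC"))
```

In this unadjusted form, swapping residues between axes can leave the magnitude unchanged (for example, `"AC"` and `"CA"` give the same value), so the adjustments mentioned in the text would be needed before pR could stand in for sequence identity.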
To determine conserved segments in the protein sequence, we select a window 12 amino acids long, a length directed mainly by the MHC I and II pocket size, and determine the pR value of the window stretch. We then shift the window by one amino acid at a time, determining the new pR value at each step, until the end of the sequence is reached. Doing this for all the sequences in our database, we can analyze each window's pR values. In any window, some sequences will have the same residues and so will have the same pR value. We scan all the sequences in each window to determine how many distinct values there are: the lower the number of distinct values, the more conserved the segment. The next step is to find the average solvent accessibility (ASA) of each residue in the sequences through an appropriate server; we use SABLE and I-TASSER. The ASA profile is then compared with the protein segment variability profile to determine those segments where the protein variability is the least and the solvent accessibility is the highest, that is, the regions that are most conserved and most exposed to solvent. Both profiles are smoothed by a moving average of 12 aa to make the graphs easier to follow.
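A minimal sketch of the sliding-window variability count and the smoothing step is given below. It assumes the sequences share a common position numbering (for example, equal-length GP sequences), and the names `window_variability`, `moving_average` and the `key` parameter are hypothetical. By default the window string itself is used as the descriptor, on the stated premise that identical windows share one pR value; a pR function could be passed as `key` instead.

```python
def window_variability(sequences, window=12, key=None):
    """Count the distinct window descriptors at each start position."""
    if key is None:
        # Assumption: identical windows have identical pR, so the raw
        # window string serves as a stand-in descriptor.
        key = lambda segment: segment
    length = min(len(s) for s in sequences)  # compare only shared positions
    profile = []
    for start in range(length - window + 1):
        values = {key(s[start:start + window]) for s in sequences}
        profile.append(len(values))  # fewer distinct values = more conserved
    return profile


def moving_average(values, width=12):
    """Smooth a profile with a plain moving average of the given width."""
    return [
        sum(values[i:i + width]) / width
        for i in range(len(values) - width + 1)
    ]


toy = ["ACDEF", "ACDEG", "ACDQG"]  # toy data, not Ebola sequences
profile = window_variability(toy, window=3)
print(profile, moving_average(profile, width=2))
```

A count of 1 at a position means every sequence carries the same window there, which corresponds to the most conserved case in the text.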
Until now we had been carrying out this important step of identifying the conserved, surface-exposed regions by eye estimation. In this paper we report a 3D Polygon Representation method that automates it. We represent on the three axes the parameters ASA, 1/PV and ASA-PV, where PV is the protein variability index. At any amino acid position number on the sequence we obtain three parameter values, from which we can draw a triangle and compute its area as a single number that can be compared with the values at other amino acid positions in the sequence (Fig.3). The higher the area, the closer we are to the ideal of the most conserved, most surface-exposed segments of the virion protein sequence. The method was tested against our earlier work with rotaviruses. The new method identified regions that included or overlapped those we had identified by eye estimation, and it also showed a few more segments that could have been used. This gave us confidence to use the program for this important Ebola virus peptide design work.
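One way to turn the three parameters into an area is sketched below. It rests on assumptions that the text does not confirm: that ASA-PV denotes the difference between ASA and PV, that each parameter is placed as a vertex on its own coordinate axis, and that ASA and PV are already on comparable scales. The function name `polygon_area` is hypothetical. For vertices (a, 0, 0), (0, b, 0) and (0, 0, c), the triangle area is half the magnitude of the cross product of two edge vectors, which reduces to 0.5 * sqrt(a²b² + b²c² + c²a²).

```python
import math


def polygon_area(asa, pv):
    """Area of the triangle with one vertex on each of the three axes."""
    if pv <= 0:
        raise ValueError("pv must be positive so that 1/PV is defined")
    a = asa        # vertex on the ASA axis
    b = 1.0 / pv   # vertex on the 1/PV axis
    c = asa - pv   # vertex on the ASA-PV axis (assumed to be a difference)
    # Half the magnitude of the cross product of two triangle edges.
    return 0.5 * math.sqrt((a * b) ** 2 + (b * c) ** 2 + (c * a) ** 2)


print(polygon_area(3.0, 1.0))
```

Because the terms are squared, a negative ASA-PV contributes the same area as a positive one of equal size; how that case is treated in the actual program is not stated in the text.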

Conclusion
To summarise, we have outlined in this paper a novel methodology to determine quantitatively a set of peptide segments that are highly conserved and also highly exposed to the solvent, so that they can readily act as Ebola virus vaccine candidates. The software algorithm we have outlined is quite novel and has proved useful in analyzing the Zaire Ebola viruses. This approach has been compared with our earlier work on rotaviruses: it supports what we predicted several years ago, and it also predicts a few more peptide segments that we had missed in the eye estimation work. This kind of analysis will provide a better foundation for future peptide vaccine design.
In addition to the novel kind of analysis reported in this paper for the computer-assisted design of Ebola virus peptide vaccine libraries, another line of our recent computational biology research, based on alignment-free sequence descriptors, principal component analysis (PCA) and self-organizing maps (SOM) [19], clustered a diverse collection of viral sequences into homogeneous subsets predominantly present in distinct geographical regions of the world. Results of such research could be used in future analyses of heterogeneous collections of viral strains.

Conflicts of Interest
The authors declare no conflict of interest.