Design and in-vitro testing of new antimicrobial peptides based on QSAR modelling

Antimicrobial peptides (AMPs) are anti-infectives that may represent a novel and untapped class of biotherapeutics. The Database of Antimicrobial Activity and Structure of Peptides (DBAASP, http://dbaasp.org) has been developed in the bioinformatics laboratory of IBCEB. In contrast to available approaches, we think that the strategy of AMP prediction should reflect the fact that there are at least four kinds of AMPs, for which four independent prediction algorithms have to be developed in order to achieve high efficacy. We distinguish linear cationic antimicrobial peptides (LCAP), cationic peptides whose structure is stabilized by intra-chain covalent bonds, proline- and arginine-rich peptides, and anionic antimicrobial peptides. A simple predictive model that discriminates AMPs from non-AMPs has been developed for LCAP. The descriptors are sequence-based physicochemical characteristics responsible for the capability of the peptide to interact with an anionic membrane. The algorithm is based on clustering AMPs by their physicochemical properties. The results show that descriptors relying mainly on hydrophobic and hydrophilic features allow AMPs to be predicted with high accuracy. The developed predictive model was used to design new peptides. The antimicrobial potency of these peptides was evaluated by in vitro testing of their activity against different bacteria (including drug-resistant strains). The in vitro results confirm the high accuracy of the developed predictive model.


• Interest in AMPs has increased, and the rate of discovery of new peptides (natural and synthetic) is very high
• The majority of the new peptides are artificial and have been created in studies of structure-activity relationships
• Task-oriented rational ab initio design, which means sequence-based computational prediction of antimicrobial properties followed by experimental synthesis and testing of the peptides for antimicrobial activity, is the most cost-effective way of developing novel antibiotics against drug-resistant bacteria

Introduction (continued)
• Most algorithms for AMP prediction do not take into account variation in mechanism of action, structure, mode of interaction with the membrane and other differences
• There are at least four kinds of AMPs: linear cationic antimicrobial peptides (LCAP), cationic peptides stabilizing their structure by intra-chain covalent bonds (CICP), proline- and arginine-rich peptides (PAP), and anionic antimicrobial peptides (AP)
• Consequently, our approach holds that four independent prediction algorithms must be developed to achieve high efficacy

• Most predictive methods do not distinguish target species during model development
• Such an approach ignores several issues that can influence the accuracy of prediction:
  - The antimicrobial potency of AMPs strongly depends on the type of bacterial membrane and thus on the particular target
  - Data for the negative set are limited, because there are practically no peptides known to lack antimicrobial potency against all target species, so these methods use a randomly selected set
• Our approach is to use antimicrobial potency against specific targets for which a large amount of data is available
• Several different mechanisms of action of AMPs have been described. Consequently, it is reasonable to assume the existence of different types of peptides with different physicochemical features
• Taking all these issues into account, we decided to develop a predictive model relying on a machine-learning approach that uses a clustering algorithm to optimize the model
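The target-specific construction of positive and negative sets can be illustrated with a short sketch. It uses the selection conditions given on the poster (MIC below 25 for the positive set, MIC above 100 for the negative set, length 10-16, no intra-chain bonds, no unusual amino acids). The function name `assign_set`, its arguments and the 20-letter alphabet check are assumptions made for illustration and are not taken from DBAASP; the MIC is assumed to be in the same units as the poster's thresholds.

```python
# Hypothetical helper: assigns a peptide to the positive or negative set
# for one target organism, following the conditions listed on the poster.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: "usual" = 20 standard residues

def assign_set(sequence, mic, has_intrachain_bonds=False):
    """Return 'positive', 'negative', or None if the peptide is excluded."""
    if not 10 <= len(sequence) <= 16:
        return None                      # outside the 10-16 length window
    if has_intrachain_bonds:
        return None                      # only linear peptides are kept
    if not set(sequence) <= STANDARD_AA:
        return None                      # unusual amino acids are excluded
    if mic < 25:
        return "positive"
    if mic > 100:
        return "negative"
    return None                          # intermediate MIC: used in neither set

print(assign_set("KLLKLLKKLLK", 8))      # positive
print(assign_set("KLLKLLKKLLK", 200))    # negative
```

Peptides with an MIC between the two thresholds are left out of both sets, which keeps the two classes clearly separated.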

• Stabilization of peptide structure in the membrane environment is mainly driven by intramolecular hydrogen bonding
• So it is reasonable to assume that the membrane will drive a linear peptide into a regular α-helical conformation. This assumption is supported by the fact that the transmembrane domains of membrane proteins are mainly helical
• Therefore, properties of peptides related to the three-dimensional structure, such as the helical hydrophobic moment, the orientation of the peptide relative to the membrane surface, the penetration depth, etc., can also be evaluated using sequence information only
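As an illustration of how a structure-related property can be computed from sequence alone once an ideal α-helix is assumed, the sketch below evaluates a helical hydrophobic moment together with the mean hydrophobicity and a crude net charge. The Kyte-Doolittle scale, the 100° angle per residue and the charge rule (K, R = +1; D, E = -1) are assumptions chosen for the example; the definitions actually used in this work are those of [2] and may differ.

```python
import math

# Kyte-Doolittle hydropathy scale (assumed here; the poster does not name its scale).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def helical_hydrophobic_moment(seq, angle_deg=100.0, scale=KD):
    """Per-residue hydrophobic moment, assuming an ideal helix (100 degrees per residue)."""
    delta = math.radians(angle_deg)
    sin_sum = sum(scale[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(scale[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)

def mean_hydrophobicity(seq, scale=KD):
    """Average hydropathy over all residues."""
    return sum(scale[aa] for aa in seq) / len(seq)

def net_charge(seq):
    """Crude net charge: +1 for K/R, -1 for D/E; His and termini are ignored."""
    return sum(aa in "KR" for aa in seq) - sum(aa in "DE" for aa in seq)

peptide = "KLLKLLLKLLK"  # hypothetical amphipathic test sequence
print(helical_hydrophobic_moment(peptide), mean_hydrophobicity(peptide), net_charge(peptide))
```

A uniformly hydrophobic sequence gives a moment close to zero, whereas a sequence whose charged residues fall on one face of the helix gives a large one, which is why this quantity is a convenient measure of amphipathicity.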

Physico-chemical properties selection for predictive model
• Both AMPs and the lipid bilayer are amphipathic
• So the main parameters from which the physicochemical characteristics of AMPs responsible for the interaction with the membrane are constructed have to be the ionic charges and hydrophobic features of the amino acid side chains, together with a depth-dependent potential of insertion of amino acids into the membrane
• Different combinations (mainly linear) of these elements allow us to evaluate physicochemical properties of the peptides such as hydrophobicity, charge, isoelectric point, propensity to disordering, linear hydrophobic moment, etc.
• Eight properties were used in the QSAR study; definitions of the characteristics can be found in [2]
• Escherichia coli ATCC 25922 has the most data in DBAASP
• Our approach is based on the data for this species

Assessment of likeness of behavior of a particular set of peptides against different microbial cells on the basis of DBAASP data
• For the assessment, pairs of microbes were screened and peptides were selected for which the susceptibilities of both microbes are known. For each selected set of peptides, the susceptibilities (MIC) were ranked for each microbe and Spearman's rank correlation coefficient was calculated. Spearman's rank correlation coefficients for particular pairs of target organisms are presented in
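The pairwise comparison of microbes described above can be sketched as follows: keep only the peptides with a known MIC for both organisms, rank the MICs within each organism, and correlate the ranks. This is a minimal pure-Python version with averaged ranks for ties; the function names and the toy MIC values are hypothetical and are not DBAASP data.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def shared_mic_correlation(mic_a, mic_b):
    """mic_a, mic_b: dicts peptide_id -> MIC for two organisms."""
    shared = sorted(set(mic_a) & set(mic_b))  # peptides tested on both organisms
    rho = spearman([mic_a[p] for p in shared], [mic_b[p] for p in shared])
    return rho, len(shared)

# Toy, made-up MIC values for two organisms.
organism_a = {"p1": 4, "p2": 16, "p3": 64, "p4": 8}
organism_b = {"p1": 8, "p2": 32, "p3": 128, "p5": 1}
print(shared_mic_correlation(organism_a, organism_b))
```

A coefficient close to 1 for a pair of organisms indicates that peptides are ordered similarly by potency against both, which is the sense in which AMPs "behave similarly" against them.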


The main objectives of the work were:
• Development of a new predictive model, relying on a QSAR (Quantitative Structure-Activity Relationship) study, for linear AMPs active against certain species
• Design of new antimicrobial peptides based on the proposed predictive model
• In vitro testing of the de novo designed peptides against bacterial species

Physico-chemical properties used in the model (with their one-letter codes):
• Hydrophobic moment (M)
• Hydrophobicity (H)
• Charge (C)
• Isoelectric point (I)
• Penetration depth (D)
• Orientation of the peptide relative to the surface of the membrane (O)
• Propensity to disordering (R)
• Linear moment (L)
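The poster states that clustering was performed in 255 different spaces of characteristics. This count is the number of non-empty subsets of the eight characteristics, 2^8 - 1 = 255. The sketch below enumerates them using the one-letter codes listed above; the function name `characteristic_spaces` is an assumption made for illustration.

```python
from itertools import combinations

# One-letter codes of the eight characteristics, as listed on the poster.
CHARACTERISTICS = ["M", "H", "C", "I", "D", "O", "R", "L"]

def characteristic_spaces(names=CHARACTERISTICS):
    """All non-empty subsets of the characteristics, from 1-D up to 8-D spaces."""
    return [combo
            for k in range(1, len(names) + 1)
            for combo in combinations(names, k)]

spaces = characteristic_spaces()
print(len(spaces))                  # number of candidate spaces
print(("M", "C", "I") in spaces)    # the 3-D example space mentioned on the poster
```

Each tuple identifies one candidate space in which the positive set can be clustered.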

Length 1 -(• Set 1 - Set 3 -
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Number Distribution of peptides length of ribosomal AMP active against Escherichia coli ATCC 25922 Majority of peptides have length within intervals 10-16 Positive set is formed on the basis of condition MIC <25 mg/ml  Negative set is formed on the basis of condition >100 mg/ml  Sets were performed with the following restrictions:  Sequence length 10-16 amino acids  Without intra-chain bonds  Without unusual amino acids Machine-learning approach relied on density-based clustering algorithm DBSCAN [3] was used Training and test sets were performed from DBAASP.Algorithm is based on the following full sets of peptides  Full Positive Set (FPS) -160 peptides  Full Negative Set (FNS) -146 peptides  For validation purposes 5 training sets (positive and negative) were created from full sets.Each positive and negative set consists of 125 randomly selected peptides from FPS and FNS respectively.Clustering of positive sets has been performed in the 255 different space of characteristics.(Number of combinations of 8 characteristics = 255).Definition on the next slide) Optimal clusters are defined by the following parameters: • Number of characteristics (dimensionality of the space where cluster has been determined.For the example in the figure = 3 ) Types of characteristics (Type of space.For the example in the figure M, C and I characteristics determine the space) • DBSCAN clusterization parameters e, minpts (m) were N + i -number of peptides from positive set in the ith cluster, N +full number peptides in the positive set, N f i -number of peptides from positive and negative sets in the ith cluster M, C, I, e, m, p i Parameters of optimization : 255 sets of characteristics; Parameter of clasteriztion e, m; P i Validation of the results of optimization.Optimization by clustering took place for each of 5 pairs of randomly selected training sets.Clusters which were continuously appeared during 
optimization of different training pairs were chosen for further processing

Figure: Schematic representation of validation on 5 random training sets

• In each training set, non-overlapping cluster combinations (without common peptides) were generated
• The best non-overlapping cluster combination, with maximum p_i, was selected for AMP prediction

Figure: Schematic representation of the results of optimization on the training sets

• Optimization reveals 4 non-overlapping clusters in 4 different N-dimensional spaces
• Cluster I was formed in the space of characteristics M, H, C, I, D
• Cluster III was formed in the space of characteristics M, H, C, I, R
• A further cluster was formed in the space of characteristics M, H, I, D

Testing of the predictive model
For the in silico test the following 3 sets were used:
• Set 1 - Escherichia coli ATCC 25922 - 21 peptides in the positive set and 21 peptides in the negative set
• Set 2 - Pseudomonas aeruginosa ATCC 27853 - 109 peptides in the positive set and 109 peptides in the negative set
• Set 3 - Klebsiella pneumoniae - 29 peptides in the positive set and 29 peptides in the negative set
Pseudomonas aeruginosa ATCC 27853 and Klebsiella pneumoniae were used to create test sets because, according to DBAASP data, AMPs behave similarly against Escherichia coli, Pseudomonas aeruginosa and Klebsiella pneumoniae

Peptide synthesis and in-vitro testing
• 11 designed peptides were synthesized
• In vitro tests of these peptides against Escherichia coli ATCC 25922 were carried out
• 10 of the 11 peptides have MIC < 25 µg/ml
• Preliminary data show that these peptides are active against some pathogenic gram-negative strains of P. aeruginosa, A. baumannii, E.
cloacae (including FQR, NDM and meropenem-resistant strains)

• A predictive model for AMPs based on clustering of their physicochemical properties was developed
• Four non-overlapping clusters with different sets of physicochemical properties were obtained for the prediction of AMPs active against Escherichia coli ATCC 25922
• The in silico test of the predictive model shows an accuracy of about 0.8 for Escherichia coli ATCC 25922 and Pseudomonas aeruginosa ATCC 27853. The accuracy for Klebsiella pneumoniae is lower (~0.6)
• 11 peptides were designed on the basis of the physicochemical properties of cluster I
• In vitro tests of the 11 designed peptides were performed against Escherichia coli ATCC 25922
• The results show that 10 of the 11 peptides have high antimicrobial potency
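The density-based clustering step named on the poster (DBSCAN [3], with parameters e and minpts) can be sketched with a minimal textbook implementation. This is not the authors' code: the toy two-dimensional points stand in for peptides described by two characteristics, the values of `eps` and `min_pts` are arbitrary, and the poster's cluster-selection criterion p_i is not reproduced because its formula is not given in the text.

```python
import math
from collections import Counter

NOISE = -1

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one cluster label per point (-1 = noise)."""
    n = len(points)
    labels = [None] * n

    def neighbours(i):
        # All points within eps of point i (the point itself included).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1                     # point i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # noise reached from a core point = border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: keep expanding
    return labels

def cluster_sizes(labels):
    """Number of points per cluster, noise excluded."""
    return Counter(label for label in labels if label != NOISE)

# Toy data: two compact groups and one isolated point.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (50, 50)]
toy_labels = dbscan(toy_points, eps=1.5, min_pts=3)
print(toy_labels, dict(cluster_sizes(toy_labels)))
```

In the setting of the poster, such a clustering would be repeated for each candidate space of characteristics and each pair (e, m), and the per-cluster counts would then feed the optimization criterion.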

Figure: Distribution of four types of AMPs in DBAASP
Linear peptides are the largest class of AMPs and the class with the most data in DBAASP, and we have developed a predictive algorithm for this class

Table Target
Training set - Escherichia coli ATCC 25922 - 125 peptides in positive set and 125 peptides in negative set
Test set 1 - Escherichia coli ATCC 25922 - 21 peptides in positive set and 21 peptides in negative set
Test set 2 - Pseudomonas aeruginosa ATCC 27853 - 109 peptides in positive set and 109 peptides in negative set
Test set 3 - Klebsiella pneumoniae - 29 peptides in positive set and 29 peptides in negative set
* ND - not determined because there is not enough data