Sequence-based Discovery of Antibacterial Peptides Using Ensemble Gradient Boosting
1  Washington State University


Antimicrobial resistance is driving pharmaceutical companies to investigate alternative therapeutic approaches. One approach that has received growing attention in drug development is the use of antimicrobial peptides (AMPs). Antibacterial peptides (ABPs), which occur naturally as part of the immune response, can serve as powerful, broad-spectrum antibiotics. However, conventional laboratory procedures for screening and discovering ABPs are expensive and time-consuming, and computational methods can make their identification significantly more efficient. In this paper, we introduce a machine learning method for the fast and accurate prediction of ABPs. We gathered more than 6000 peptides from publicly available datasets and extracted 1209 features (peptide characteristics) from these sequences. We selected the optimal features by applying correlation-based and random forest feature selection techniques. Finally, we designed an ensemble gradient boosting model (GBM) to predict putative ABPs. We evaluated our model using ROC curves, calculating the area under the curve (AUC) and comparing it against several other models, including a recurrent neural network, a support vector machine, and iAMPpred. The AUC for the GBM was 0.98, more than 2.5% better than any of the other models. We also present an algorithm that artificially generates potential ABPs based on the frequency of amino acid occurrence in more than 3000 ABPs and the frequency of their lengths. The algorithm uses a random function to produce sets of amino acid sequences in which the probability of including each amino acid is based on the calculated frequencies. After generating the artificial sequences, we use the GBM to predict whether they are ABPs.

Keywords: Antibacterial peptides; ensemble gradient boosting; drug discovery
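
The frequency-based generation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `fit_frequencies` and `generate_peptides`, the independent sampling of each position, and the toy input sequences are all assumptions made for illustration. The sketch estimates amino acid and length frequencies from a set of known peptides, then draws a length and a residue sequence at random in proportion to those frequencies.

```python
import random
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fit_frequencies(peptides):
    """Count amino acid occurrences and sequence lengths in a peptide set."""
    residue_counts = Counter()
    length_counts = Counter()
    for seq in peptides:
        # Keep only standard residues; skip sequences with none left.
        cleaned = [ch for ch in seq.upper() if ch in AMINO_ACIDS]
        if not cleaned:
            continue
        residue_counts.update(cleaned)
        length_counts[len(cleaned)] += 1
    return residue_counts, length_counts


def generate_peptides(residue_counts, length_counts, n, seed=None):
    """Sample n sequences; lengths and residues follow the observed counts.

    Assumption: each position is drawn independently of its neighbours.
    """
    rng = random.Random(seed)
    residues = list(residue_counts)
    residue_weights = [residue_counts[r] for r in residues]
    lengths = list(length_counts)
    length_weights = [length_counts[k] for k in lengths]
    generated = []
    for _ in range(n):
        length = rng.choices(lengths, weights=length_weights, k=1)[0]
        chars = rng.choices(residues, weights=residue_weights, k=length)
        generated.append("".join(chars))
    return generated


if __name__ == "__main__":
    # Invented placeholder sequences, not data from the paper.
    known = ["KKLLKKLLKK", "GIGKFLKKAKKF", "RRWWRRWW"]
    residue_counts, length_counts = fit_frequencies(known)
    for candidate in generate_peptides(residue_counts, length_counts, 5, seed=0):
        print(candidate)
```

In the workflow the abstract describes, candidates produced this way would then be featurized and scored by the trained GBM; that scoring step is outside this sketch.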