Cell-penetrating peptides (CPPs) are short peptides that can cross cell membranes, which makes them valuable for drug delivery and intracellular targeting. Accurate computational prediction of CPPs can streamline experimental validation by narrowing the set of candidates to be tested in the lab. This study assesses how effectively pretrained protein language models (pLMs) represent CPPs and develops a reliable model for CPP classification. We evaluated several pLMs, including BEPLER, CPCProt, SeqVec, several ESM variants (ESM, ESM-2 with an expanded feature set, ESM-1b, and ESM-1v), ProtT5-Port-BERT, ProtT5-XL-UniRef50, and ProtT5-XL-BFD. We developed pLM4CPPs, a novel deep learning architecture that uses pLM representations as input and a convolutional neural network (CNN) as the classifier for binary CPP classification. pLM4CPPs outperformed the existing state-of-the-art CPP predictor, improving accuracy (ACC) by 4.9%-5.5%, the Matthews correlation coefficient (MCC) by 9.3%-10.2%, and sensitivity (Sn) by 14.1%-19.6%. Among the evaluated pLMs, ESM-1280 and ProtT5-XL-BFD gave the best overall performance on the KELM dataset. ESM-1280 achieved an ACC of 0.896, an MCC of 0.796, an Sn of 0.844, and a specificity (Sp) of 0.978, while ProtT5-XL-BFD achieved an ACC of 0.901, an MCC of 0.802, an Sn of 0.885, and an Sp of 0.917, making both models strong choices for CPP prediction. pLM4CPPs combines the predictions of multiple models into a consensus on whether a given peptide sequence is a CPP or a non-CPP. This ensemble approach enhances prediction reliability by drawing on the strengths of each individual model. A user-friendly web server for bioactivity prediction, together with the source code, datasets, and templates for adapting pLM4CPPs to other tasks, will be made available on GitHub. The platform aims to advance CPP prediction and the broader modeling of peptide function.
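The abstract states that a CNN classifies peptides from pLM representations, but it does not give the layer configuration. The following is only a minimal sketch, under stated assumptions, of how a 1D CNN can turn per-residue embeddings into a CPP probability. The pipeline is one convolutional layer with ReLU, then global max pooling over residues, then a sigmoid output unit.

Several details are illustrative and not taken from the study:
- The function names, filter count, kernel width and 8-dimensional embeddings are hypothetical.
- The weights are random and untrained.
- Real pLM embeddings are much wider, and their width depends on the model.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = residue positions, columns = features


def conv1d_relu(x: Matrix, kernels: List[Matrix], biases: List[float]) -> Matrix:
    """'Valid' 1D convolution along the residue axis, followed by ReLU.

    x is an L x D matrix of per-residue embeddings; each kernel is W x D.
    Returns an (L - W + 1) x K feature map for K kernels.
    """
    width = len(kernels[0])
    fmap = []
    for start in range(len(x) - width + 1):
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for offset in range(width):
                total += sum(xi * ki for xi, ki in zip(x[start + offset], kernel[offset]))
            row.append(max(0.0, total))  # ReLU
        fmap.append(row)
    return fmap


def global_max_pool(fmap: Matrix) -> List[float]:
    """Keep each filter's strongest response over all positions."""
    return [max(column) for column in zip(*fmap)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_cpp_probability(x: Matrix, kernels: List[Matrix], biases: List[float],
                            w_out: List[float], b_out: float) -> float:
    """Conv -> ReLU -> global max pool -> linear -> sigmoid (hypothetical toy model)."""
    pooled = global_max_pool(conv1d_relu(x, kernels, biases))
    logit = b_out + sum(p * w for p, w in zip(pooled, w_out))
    return sigmoid(logit)


# Toy example: a 12-residue peptide with hypothetical 8-dim embeddings,
# 4 filters of width 3, random untrained weights.
rng = random.Random(0)
L, D, K, W = 12, 8, 4, 3
x = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(L)]
kernels = [[[rng.gauss(0, 0.1) for _ in range(D)] for _ in range(W)] for _ in range(K)]
biases = [0.0] * K
w_out = [rng.gauss(0, 0.1) for _ in range(K)]
print(f"P(CPP) = {predict_cpp_probability(x, kernels, biases, w_out, 0.0):.3f}")
```

Global max pooling lets peptides of different lengths map to a fixed-size vector. In practice such a model would be built and trained with a framework layer such as PyTorch's `nn.Conv1d`, which expects channels-first input.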
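The reported scores follow the standard binary-classification definitions, with CPP as the positive class:

- accuracy ACC = (TP + TN) / (TP + TN + FP + FN)
- sensitivity Sn = TP / (TP + FN)
- specificity Sp = TN / (TN + FP)
- MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))

The following is a minimal sketch computing them from confusion-matrix counts. The function name is hypothetical, and the counts are invented for illustration, not taken from the study.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """ACC, MCC, Sn and Sp from confusion-matrix counts (CPP = positive class)."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # share of true CPPs recovered
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # share of non-CPPs correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"ACC": acc, "MCC": mcc, "Sn": sn, "Sp": sp}


# Hypothetical counts for a 200-peptide test set
print({k: round(v, 3) for k, v in binary_metrics(tp=85, tn=90, fp=10, fn=15).items()})
# prints {'ACC': 0.875, 'MCC': 0.751, 'Sn': 0.85, 'Sp': 0.9}
```

MCC uses all four cells of the confusion matrix. This is why it is often reported alongside ACC: it stays informative when the classes are imbalanced.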
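The abstract describes a consensus over several models' predictions but does not state the combination rule. The sketch below assumes simple majority (hard) voting: each model's CPP probability is thresholded into a vote, and ties are resolved toward non-CPP.

The following are all assumptions for illustration:
- the 0.5 threshold
- the tie rule
- the function name `consensus_label`
- the example scores

The model names are taken from the abstract.

```python
from typing import Mapping


def consensus_label(model_probs: Mapping[str, float], threshold: float = 0.5) -> str:
    """Majority vote over per-model CPP probabilities (hypothetical rule).

    Each probability is thresholded into a binary vote; the peptide is labelled
    'CPP' only if strictly more than half of the models vote CPP, so ties
    fall to 'non-CPP' (an assumed tie-breaking rule).
    """
    votes = [p >= threshold for p in model_probs.values()]
    n_cpp = sum(votes)
    return "CPP" if n_cpp * 2 > len(votes) else "non-CPP"


# Hypothetical scores for one peptide from three models
scores = {"ESM-1280": 0.91, "ProtT5-XL-BFD": 0.74, "ProtT5-XL-UniRef50": 0.38}
print(consensus_label(scores))  # 2 of 3 models vote CPP, so this prints CPP
```

Soft voting, which averages the probabilities before thresholding, is a common alternative. Which rule, if either, the authors use is not stated in the abstract.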
pLM4CPPs: Protein Language Model-Based Predictor for Cell-Penetrating Peptides
Published: 25 October 2024
by MDPI
in The 5th International Electronic Conference on Foods
session Application of Artificial Intelligence (AI) and Machine Learning in The Food Industry
Keywords: Cell-Penetrating Peptides; Pretrained Protein Language Models; Convolutional Neural Networks; Bioactivity Prediction; Transfer Learning