pLM4CPPs: Protein Language Model-Based Predictor for Cell-Penetrating Peptides
1  Department of Grain Science and Industry, Kansas State University, Manhattan, KS 66506, USA
Academic Editor: Moktar Hamdi

Abstract:

Cell-penetrating peptides (CPPs) are short peptides that can cross cell membranes, which makes them valuable for drug delivery and for reaching intracellular targets. Accurate CPP prediction can streamline experimental validation in the lab. This study assesses how effectively pretrained protein language models (pLMs) represent CPPs and develops a reliable model for CPP classification. We evaluated several pLMs, including BEPLER, CPCProt, SeqVec, different variants of ESM (ESM, ESM-2 with expanded feature set, ESM-1b, and ESM-1v), ProtT5-Port-BERT, ProtT5-XL-UniRef50, and ProtT5-XL-BFD. We developed pLM4CPPs, a novel deep learning architecture that uses convolutional neural networks (CNNs) as the classifier for binary classification of CPPs. pLM4CPPs outperformed the existing state-of-the-art model for CPP prediction, improving accuracy (ACC) by 4.9%-5.5%, Matthews correlation coefficient (MCC) by 9.3%-10.2%, and sensitivity (Sn) by 14.1%-19.6%. Among the evaluated models, ESM-1280 and ProtT5-XL-BFD showed the highest overall performance on the KELM dataset. ESM-1280 achieved an ACC of 0.896, an MCC of 0.796, an Sn of 0.844, and a specificity (Sp) of 0.978. ProtT5-XL-BFD performed similarly well, with an ACC of 0.901, an MCC of 0.802, an Sn of 0.885, and an Sp of 0.917, making both models noteworthy for CPP prediction. pLM4CPPs combines predictions from multiple models to provide a consensus on whether a given peptide sequence is a CPP or a non-CPP. This ensemble approach improves prediction reliability by leveraging the strengths of each individual model. A user-friendly web server for bioactivity prediction, along with the source code, datasets, and templates for adapting pLM4CPPs to other tasks, will be accessible on GitHub. This platform aims to advance CPP prediction and peptide functionality modeling and to help researchers explore peptide functionality.

Keywords: Cell Penetrating Peptides; Pretrained Protein Language Models; Convolutional Neural Networks; Bioactivity Prediction; Transfer Learning
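
The consensus step and the reported metrics (ACC, MCC, Sn, Sp) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the consensus is formed, so a simple majority vote with a 0.5 probability threshold is assumed here, and the function names `consensus_vote` and `binary_metrics`, as well as the example model names and probabilities, are hypothetical.

```python
from math import sqrt


def consensus_vote(model_probs, threshold=0.5):
    """Majority vote over per-model CPP probabilities for one peptide.

    model_probs maps a model name to its predicted CPP probability.
    Assumption: a strict majority is required, so ties count as non-CPP.
    Returns the consensus label and the fraction of models voting CPP.
    """
    votes = [p >= threshold for p in model_probs.values()]
    n_cpp = sum(votes)
    label = "CPP" if 2 * n_cpp > len(votes) else "non-CPP"
    return label, n_cpp / len(votes)


def binary_metrics(y_true, y_pred):
    """ACC, MCC, Sn, and Sp from binary labels (1 = CPP, 0 = non-CPP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    acc = (tp + tn) / len(pairs)
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity (recall on CPPs)
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity (recall on non-CPPs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "MCC": mcc, "Sn": sn, "Sp": sp}


if __name__ == "__main__":
    # Illustrative probabilities only; these are not results from the study.
    probs = {"ESM-1280": 0.91, "ProtT5-XL-BFD": 0.78, "SeqVec": 0.32}
    print(consensus_vote(probs))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

A majority vote is only one way to form a consensus; averaging the per-model probabilities before thresholding would be an equally plausible reading of the abstract.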