pLM4CPPs: Protein Language Model-Based Predictor for Cell-Penetrating Peptides

NANDAN KUMAR; Zhenjiao Du; Yonghui Li

¹ Department of Grain Science and Industry, Kansas State University, Manhattan, KS 66506, USA

Academic Editor: Moktar Hamdi

Published: 25 October 2024 by MDPI in The 5th International Electronic Conference on Foods, session Application of Artificial Intelligence (AI) and Machine Learning in The Food Industry

Abstract:

Cell-penetrating peptides (CPPs) are short peptides that can cross cell membranes, which makes them valuable for drug delivery and intracellular targeting. Accurate CPP prediction can streamline experimental validation in the lab. This study assesses how effectively pretrained protein language models (pLMs) represent CPPs and develops a reliable model for CPP classification. We evaluated several pLMs, including BEPLER, CPCProt, SeqVec, different variants of ESM (ESM, ESM-2 with expanded feature set, ESM-1b, and ESM-1v), ProtT5-Port-BERT, ProtT5-XL-UniRef50, and ProtT5-XL-BFD. We developed pLM4CPPs, a novel deep learning architecture that uses convolutional neural networks (CNNs) as the classifier for binary classification of CPPs. pLM4CPPs outperformed the existing state-of-the-art model for CPP prediction, improving accuracy (ACC) by 4.9%-5.5%, Matthews correlation coefficient (MCC) by 9.3%-10.2%, and sensitivity (Sn) by 14.1%-19.6%. Among the evaluated pLMs, ESM-1280 and ProtT5-XL-BFD showed the highest overall performance on the KELM dataset. ESM-1280 achieved an ACC of 0.896, an MCC of 0.796, an Sn of 0.844, and a specificity (Sp) of 0.978. ProtT5-XL-BFD achieved an ACC of 0.901, an MCC of 0.802, an Sn of 0.885, and an Sp of 0.917. Both models are therefore noteworthy for CPP prediction. pLM4CPPs combines predictions from multiple models to provide a consensus on whether a given peptide sequence is a CPP or a non-CPP. This ensemble approach enhances prediction reliability by leveraging the strengths of each individual model. A user-friendly web server for bioactivity predictions, along with source code, datasets, and templates for adapting pLM4CPPs to other tasks, will be accessible on GitHub. This platform aims to advance CPP prediction and to help researchers model and explore peptide functionality.

Keywords: Cell Penetrating Peptides; Pretrained Protein Language Models; Convolutional Neural Networks; Bioactivity Prediction; Transfer Learning
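The abstract reports ACC, MCC, Sn, and Sp and describes a consensus over several models without stating the voting rule. The Python sketch below illustrates how such a consensus and these four metrics could be computed. It assumes a simple majority vote with ties resolved as non-CPP; the function names `consensus` and `binary_metrics` and the toy data are hypothetical and are not taken from the pLM4CPPs code.

```python
from math import sqrt


def consensus(votes):
    """Majority vote over per-model binary calls (1 = CPP, 0 = non-CPP).

    Assumption: ties count as non-CPP; the abstract does not specify a rule.
    """
    return 1 if 2 * sum(votes) > len(votes) else 0


def binary_metrics(y_true, y_pred):
    """Return ACC, MCC, Sn, and Sp computed from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / total if total else 0.0,
        # MCC is defined as 0 here when any marginal count is zero.
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
        "Sn": tp / (tp + fn) if (tp + fn) else 0.0,
        "Sp": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical calls from three models for four peptides (not real data).
model_calls = [
    [1, 0, 1, 0],  # model A
    [1, 1, 0, 0],  # model B
    [1, 0, 0, 0],  # model C
]
labels = [1, 1, 0, 0]  # hypothetical ground truth

# Transpose so each column holds all model calls for one peptide.
ensemble = [consensus(col) for col in zip(*model_calls)]
print(ensemble, binary_metrics(labels, ensemble))
```

A majority vote is only one possible consensus rule; averaging predicted probabilities or weighting models by validation performance would be equally plausible readings of the description.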
