PerseuCPP: a machine learning strategy to predict cell-penetrating peptides and their uptake efficiency
Rayane Monique Bernardes-Loch, Gustavo de Oliveira Almeida, Igor Teixeira Brasiliano, Wagner Meira, Douglas E V Pires, Maria Cristina Baracat-Pereira, Sabrina de Azevedo Silveira
Bioinformatics Advances, 5(1): vbaf213. Published 8 September 2025. DOI: 10.1093/bioadv/vbaf213
Abstract
Motivation: Cell-penetrating peptides (CPPs) are promising tools for transporting therapeutic molecules into cells without damaging the cellular membrane. These peptides serve as efficient drug delivery systems, capable of carrying diverse biologically active substances while exhibiting low cytotoxicity compared to non-native molecules. However, identifying CPPs through experimental methods is expensive and time-consuming, making computational strategies an attractive alternative due to their cost-effectiveness and scalability.
Results: This study introduces PerseuCPP, a two-stage machine learning strategy for identifying CPPs and predicting their uptake efficiency. Using descriptors that capture physicochemical and structural properties as well as atomic composition, the strategy employs the Extremely Randomized Trees (Extra-Trees) algorithm for both tasks. The first stage, which predicts whether a peptide is a CPP, was developed on a balanced dataset of 967 CPPs and non-CPPs using a 10-fold cross-validation scheme, and two independent datasets were used for validation. This CPP predictor outperformed state-of-the-art methods, achieving an MCC of 0.854, a recall of 0.860, and an AUC of 0.984. The second stage, which predicts uptake efficiency, was trained on a balanced dataset of 140 CPPs and non-CPPs, also with 10-fold cross-validation, and validated on an independent dataset. The efficiency predictor achieved competitive results, with a recall of 0.761 and an AUC of 0.690. PerseuCPP is interpretable, offering insight into the key descriptors that enable peptides to penetrate cells effectively. We anticipate that PerseuCPP will be a valuable tool for advancing the design and application of CPPs in drug delivery and biomedical research.
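The results are reported as MCC, recall, and AUC, which summarise classifier performance in different ways. As an illustration only, the minimal Python sketch below computes these three metrics from scratch on a made-up set of eight labels and scores. It is not taken from the PerseuCPP repository. The toy data, the 0.5 decision threshold, and the helper names (`confusion_counts`, `recall`, `mcc`, `roc_auc`) are hypothetical. In practice, scikit-learn's `matthews_corrcoef`, `recall_score`, and `roc_auc_score` compute the same quantities.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, where 1 = CPP and 0 = non-CPP."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true CPPs that the classifier labels as CPPs."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; returns 0.0 when any row/column total is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 4 CPPs (label 1) and 4 non-CPPs (label 0) with made-up model scores.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
    # Assumed decision threshold of 0.5 to turn scores into hard predictions.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print("recall", recall(y_true, y_pred))
    print("MCC", mcc(y_true, y_pred))
    print("AUC", roc_auc(y_true, scores))
```

On this toy data, the sketch yields a recall of 0.75, an MCC of 0.5, and an AUC of 0.875. AUC depends only on how the scores rank positives against negatives. MCC and recall depend on the cut-off used to turn scores into CPP/non-CPP labels.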
Availability and implementation: https://github.com/goalmeida05/PERSEU.