PerseuCPP: a machine learning strategy to predict cell-penetrating peptides and their uptake efficiency.

Impact factor 2.8, JCR Q2 (Mathematical & Computational Biology)
Bioinformatics Advances. Publication date: 2025-09-08. eCollection date: 2025-01-01. DOI: 10.1093/bioadv/vbaf213
Rayane Monique Bernardes-Loch, Gustavo de Oliveira Almeida, Igor Teixeira Brasiliano, Wagner Meira, Douglas E V Pires, Maria Cristina Baracat-Pereira, Sabrina de Azevedo Silveira
Bioinformatics Advances 5(1), article vbaf213. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462384/pdf/

Abstract

Motivation: Cell-penetrating peptides (CPPs) are promising tools for transporting therapeutic molecules into cells without damaging the cellular membrane. These peptides serve as efficient drug delivery systems, capable of carrying diverse biologically active substances while exhibiting low cytotoxicity compared to non-native molecules. However, identifying CPPs through experimental methods is expensive and time-consuming, making computational strategies an attractive alternative due to their cost-effectiveness and scalability.

Results: This study introduces PerseuCPP, a machine learning strategy designed to identify CPPs. Using descriptors that include physicochemical and structural properties as well as atomic composition, our strategy employs the Extremely Randomized Trees algorithm to predict CPPs and their uptake efficiency. The first stage, CPP prediction, was developed on a balanced dataset of 967 CPPs and non-CPPs under a 10-fold cross-validation scheme, and two independent datasets were used for validation. The CPP predictor achieved superior results compared to state-of-the-art methods, with an MCC of 0.854, a recall of 0.860, and an AUC of 0.984. The second stage, efficiency prediction, was trained on a balanced dataset of 140 CPPs and non-CPPs, also under a 10-fold cross-validation scheme, and validated with an independent dataset. The efficiency predictor achieved competitive results, with a recall of 0.761 and an AUC of 0.690. PerseuCPP is interpretable, offering insight into the key descriptors that enable peptides to penetrate cells effectively. We anticipate that PerseuCPP will be a valuable tool for advancing the design and application of CPPs in drug delivery and biomedical research.
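
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how MCC, recall, and AUC are computed for a binary classifier. The labels and scores are hypothetical toy values, not data from the study, and the function names are illustrative assumptions rather than part of PerseuCPP.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 = CPP and 0 = non-CPP."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def recall(tp, fn):
    """Fraction of true positives that were recovered."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: four CPPs (1) and four non-CPPs (0) with predicted scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
toy_recall = recall(tp, fn)
toy_mcc = mcc(tp, tn, fp, fn)
toy_auc = auc(y_true, scores)
print(f"recall={toy_recall:.3f} MCC={toy_mcc:.3f} AUC={toy_auc:.4f}")
```

Recall and MCC depend on the chosen decision threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason the three values are reported together.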

Availability and implementation: https://github.com/goalmeida05/PERSEU.
