BertADP: a fine-tuned protein language model for anti-diabetic peptide prediction.

Impact factor 4.4; Zone 1 (Biology); JCR Q1 (Biology)
Xueqin Xie, Changchun Wu, Yixuan Qi, Shanghua Liu, Jian Huang, Hao Lyu, Fuying Dao, Hao Lin
BMC Biology 23(1), 210. Journal article, published 2025-07-15.
DOI: https://doi.org/10.1186/s12915-025-02312-w
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12261731/pdf/

Abstract

Background: Diabetes is a global metabolic disease that urgently calls for new and effective therapeutic agents. Anti-diabetic peptides (ADPs) have become a research focus because of their therapeutic potential and natural safety, and they represent a promising class of functional peptides for diabetes management. However, conventional computational approaches to ADP prediction rely mainly on manually extracted sequence features. These methods often generalize poorly and perform badly on short peptides, which hinders effective ADP discovery.

Results: In this study, we introduce a strategy for fine-tuning large-scale pre-trained protein language models (PLMs) for ADP prediction, enabling automated extraction of discriminative semantic representations. We established the most comprehensive ADP dataset to date, comprising 899 rigorously curated non-redundant ADPs and 67 newly collected potential candidates. Using three model construction strategies, we developed 11 candidate models. Among them, BertADP (a fine-tuned ProtBert model) demonstrated superior performance on the independent test set, outperforming existing ADP prediction tools with an overall accuracy of 0.955, sensitivity of 1.000, and specificity of 0.910. Notably, BertADP adapted well to sequence length, maintaining stable performance on both standard and short peptide sequences.
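For readers less familiar with the three reported metrics, the following minimal sketch shows how accuracy, sensitivity, and specificity are computed from a binary confusion matrix. The abstract does not state the size or class balance of the independent test set, so the counts below (100 positives and 100 negatives) are purely hypothetical and were chosen only because they reproduce the reported values. The class and function names are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (ADP = positive class)."""
    tp: int  # ADPs predicted as ADPs
    fn: int  # ADPs predicted as non-ADPs
    tn: int  # non-ADPs predicted as non-ADPs
    fp: int  # non-ADPs predicted as ADPs


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true ADPs that are recovered (recall)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-ADPs that are correctly rejected."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all peptides that are classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# HYPOTHETICAL counts: a balanced set of 100 positives and 100 negatives.
# These are NOT the paper's test-set sizes, which the abstract does not give.
counts = ConfusionCounts(tp=100, fn=0, tn=91, fp=9)

print(f"sensitivity = {sensitivity(counts):.3f}")
print(f"specificity = {specificity(counts):.3f}")
print(f"accuracy    = {accuracy(counts):.3f}")
```

When the positive and negative classes are the same size, accuracy equals the mean of sensitivity and specificity; the reported values satisfy (1.000 + 0.910) / 2 = 0.955, which is consistent with a balanced test set, although the abstract does not confirm this.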

Conclusions: BertADP represents the first PLM-based prediction tool for ADPs. Its identification capability is expected to accelerate anti-diabetic drug development and facilitate personalized therapeutic strategies, thereby enhancing precision diabetes management. Furthermore, the proposed approach provides a generalizable framework that can be extended to other bioactive peptide discovery studies, offering a new solution for bioactive peptide mining.

Source journal: BMC Biology (Biology)
CiteScore: 7.80
Self-citation rate: 1.90%
Articles published: 260
Review time: 3 months
Journal description: BMC Biology is a broad scope journal covering all areas of biology. Our content includes research articles, new methods and tools. BMC Biology also publishes reviews, Q&A, and commentaries.