PBertKla: a protein large language model for predicting human lysine lactylation sites.

IF 4.4, Zone 1 (Biology), JCR Q1 (Biology)
Hongyan Lai, Diyu Luo, Mi Yang, Tao Zhu, Huan Yang, Xinwei Luo, Yijie Wei, Sijia Xie, Feitong Hong, Kunxian Shu, Fuying Dao, Hui Ding
BMC Biology, vol. 23, no. 1, p. 95. Published 2025-04-07. Publication type: Journal Article.
DOI: 10.1186/s12915-025-02202-1 (https://doi.org/10.1186/s12915-025-02202-1)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11974188/pdf/
Citations: 0

Abstract

Background: Lactylation is a newly discovered type of post-translational modification, primarily occurring on lysine (K) residues of both histones and non-histones to exert diverse effects on target proteins. Research has shown that lysine lactylation (Kla) modification is ubiquitous in different cells and participates in the determination of cell function and fate, as well as in the initiation and progression of various diseases. Precise identification of Kla sites is fundamental for elucidating their biological functions and uncovering their application potential.

Results: Here, we proposed a novel human Kla site predictor, PBertKla, by curating a reliable benchmark dataset with an appropriate sample length and sequence identity threshold and using it to train a protein large language model with optimal hyperparameters. Extensive experiments consistently demonstrated that the model predicts human Kla sites robustly, achieving an AUC (area under the receiver operating characteristic curve) of over 0.880 on the independent validation data. Feature visualization analysis further validated the effectiveness of the model in feature learning and representation from Kla sequences. Moreover, we benchmarked PBertKla against other cutting-edge models on an independent testing dataset from different sources, highlighting its superiority and transferability.
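The "sample length" mentioned in the Results refers to the size of the sequence fragment taken around each candidate lysine. The abstract does not state the length used, so the following is only a minimal sketch of how such fixed-length samples are commonly cut from a protein sequence. The function name `lysine_windows`, the `flank` value, and the `X` padding character are illustrative assumptions, not details from the paper.

```python
from typing import List, Tuple


def lysine_windows(sequence: str, flank: int = 5, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in `sequence`.

    Each window has length 2 * flank + 1 with the lysine in the centre.
    Positions that run past either end of the protein are filled with `pad`.
    `flank` and `pad` are hypothetical defaults chosen for illustration.
    """
    # Pad both ends so that windows near the termini keep a fixed length.
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i of the original sits at index i + flank in `padded`,
            # so the centred window spans padded[i : i + 2 * flank + 1].
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    for position, window in lysine_windows("MKTAYK", flank=2):
        print(position, window)
```

Each window would then be labelled as a Kla or non-Kla sample and passed to the language model. Redundancy filtering by sequence identity is a separate step that is not shown here.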

Conclusions: All results indicated that PBertKla excelled as an automatic predictor of human Kla sites, and it would advance the investigation of lactylation modifications and their significance in health and disease.
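For readers less familiar with the AUC metric quoted in the Results, it can be read as the probability that a randomly chosen true Kla site receives a higher predicted score than a randomly chosen non-Kla site. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not data from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; tied pairs count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: 1 = lactylated lysine, 0 = non-lactylated lysine.
    toy_labels = [1, 1, 0, 0]
    toy_scores = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for illustration. For large benchmark sets a rank-based implementation would be the usual choice.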

Source journal: BMC Biology (category: Biology)
CiteScore: 7.80
Self-citation rate: 1.90%
Articles published: 260
Review time: 3 months
Journal description: BMC Biology is a broad-scope journal covering all areas of biology. Its content includes research articles, new methods, and tools. BMC Biology also publishes reviews, Q&A, and commentaries.