ProAttUnet: Advancing protein secondary structure prediction with deep learning via U-Net dual-pathway feature fusion and ESM2 pretrained protein language model

IF 2.6 · Zone 4 (Biology) · JCR Q2 (Biology)
Long Cheng, Weizhong Lu, Yiyi Xia, Yiming Lu, Jiyun Shen, Zhiqiang Hui, Yixin Xu, Hongjie Wu, Jing Chen, Qiming Fu, You Lu
Journal: Computational Biology and Chemistry, vol. 118, Article 108429
Published: 2025-04-21
DOI: 10.1016/j.compbiolchem.2025.108429
URL: https://www.sciencedirect.com/science/article/pii/S1476927125000891
Citations: 0

Abstract

Protein secondary structure prediction remains a central problem in bioinformatics. We introduce a method that improves secondary structure prediction from single sequences. Our key contribution is integrating ESM2, a state-of-the-art (SOTA) general-purpose protein language model, from which we obtain residue-level embeddings and contact maps for the input sequences. For the architecture, we use a dual-pathway U-Net for feature fusion, complemented by a cross-attention mechanism that lets the model capture broader context. Furthermore, in keeping with the characteristics of protein sequences, we incorporate a GCU_SE module into both the encoder and the decoder. With these enhancements, ProAttUnet outperforms the benchmark model SPOT-1D-Single by 1.6%, 3.5%, 1.0%, 4.6%, and 7.2% for ss3 (three-state prediction), and by 5.5%, 7.8%, 4.1%, 8.1%, and 10.1% for ss8 (eight-state prediction) across five test sets (SPOT-2016, SPOT-2016-HQ, SPOT-2018, SPOT-2018-HQ, and TEST2018, respectively). These consistent gains demonstrate the effectiveness of the proposed model.
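
To make the ss3 and ss8 figures concrete, the following is a minimal sketch of how per-residue accuracy is typically computed for the two label sets. It is an illustration, not the authors' evaluation code: the function names and the example strings are hypothetical, and the 8-to-3 mapping is the commonly used DSSP reduction (H, G, I to helix; E, B to strand; everything else to coil), which is assumed here rather than taken from the paper.

```python
# Sketch only: hypothetical helper names and toy data.
# ASSUMPTION: the common DSSP 8-state to 3-state reduction; the paper's
# exact label mapping is not stated in the abstract.
SS8_TO_SS3 = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10 and pi helices -> helix
    "E": "E", "B": "E",            # extended strand, isolated bridge -> strand
    "T": "C", "S": "C", "C": "C",  # turn, bend, coil -> coil
}


def reduce_ss8_to_ss3(ss8: str) -> str:
    """Collapse an 8-state label string into its 3-state equivalent."""
    return "".join(SS8_TO_SS3[state] for state in ss8)


def per_residue_accuracy(predicted: str, reference: str) -> float:
    """Fraction of residues whose predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Toy example (made-up labels, not data from the paper).
reference_ss8 = "CHHHHGGTEEEBSC"
predicted_ss8 = "CHHHHHHTEEEESC"

q8 = per_residue_accuracy(predicted_ss8, reference_ss8)
q3 = per_residue_accuracy(
    reduce_ss8_to_ss3(predicted_ss8), reduce_ss8_to_ss3(reference_ss8)
)
print(f"Q8 = {q8:.3f}, Q3 = {q3:.3f}")
```

In this toy case the prediction confuses G with H and B with E, which counts as an error under eight states but not after reduction to three. This is one reason ss8 accuracy is generally lower than ss3 accuracy for the same model.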

Source journal: Computational Biology and Chemistry
Category: Biology / Computer Science, Interdisciplinary Applications
CiteScore: 6.10
Self-citation rate: 3.20%
Articles published: 142
Review time: 24 days
Journal description: Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered. Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, complementary techniques such as molecular dynamics simulations with detailed free energy calculations should be used to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered. Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However, prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.