ProAttUnet: Advancing protein secondary structure prediction with deep learning via U-Net dual-pathway feature fusion and ESM2 pretrained protein language model

IF 2.6, Zone 4 (Biology), JCR Q2 (BIOLOGY)

Computational Biology and Chemistry. Pub Date: 2025-04-21. DOI: 10.1016/j.compbiolchem.2025.108429

Long Cheng, Weizhong Lu, Yiyi Xia, Yiming Lu, Jiyun Shen, Zhiqiang Hui, Yixin Xu, Hongjie Wu, Jing Chen, Qiming Fu, You Lu

Volume 118, Article 108429 (Journal Article)

Citations: 0

Abstract

Protein secondary structure prediction remains a central problem in bioinformatics. In this work, we improve a single-sequence protein secondary structure prediction model. Our key contribution is the integration of ESM2, a state-of-the-art (SOTA) general-purpose protein language model, from which we obtain per-residue embeddings and contact maps for each input sequence. Architecturally, we use a dual-pathway U-Net framework for feature fusion, complemented by a cross-attention mechanism that lets the model capture broader contextual information. In addition, to suit the characteristics of protein sequences, we incorporate a GCU_SE module into both the encoder and the decoder. With these enhancements, ProAttUnet outperforms the benchmark model SPOT-1D-Single by 1.6%, 3.5%, 1.0%, 4.6%, and 7.2% for ss3 and by 5.5%, 7.8%, 4.1%, 8.1%, and 10.1% for ss8 on five test sets (SPOT-2016, SPOT-2016-HQ, SPOT-2018, SPOT-2018-HQ, and TEST2018, respectively). These improvements demonstrate the effectiveness of the proposed model.
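The ss3 and ss8 results in the abstract refer to three-state and eight-state secondary structure labels. As a minimal sketch of how such labels are commonly reduced and scored, the Python below assumes the widely used DSSP reduction (H, G, I → H; E, B → E; T, S, C → C) and plain per-residue accuracy (Q3/Q8). The abstract does not state the exact mapping or metric ProAttUnet uses, and the helper names `ss8_to_ss3` and `accuracy` and the example strings are hypothetical.

```python
# Hedged sketch: reduce 8-state (DSSP-style) labels to 3 states and score
# per-residue accuracy (Q8 on 8-state strings, Q3 on 3-state strings).
# Assumption: the common mapping H,G,I -> H; E,B -> E; T,S,C -> C.
# This is NOT confirmed as the mapping or metric used by ProAttUnet.

SS8_TO_SS3 = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10 and pi helices -> helix
    "E": "E", "B": "E",            # strand and isolated bridge -> strand
    "T": "C", "S": "C", "C": "C",  # turn, bend and coil -> coil
}


def ss8_to_ss3(ss8: str) -> str:
    """Map an 8-state label string to the 3-state alphabet (H, E, C)."""
    return "".join(SS8_TO_SS3[state] for state in ss8)


def accuracy(pred: str, true: str) -> float:
    """Fraction of residues whose predicted state equals the reference label."""
    if len(pred) != len(true):
        raise ValueError("prediction and label must cover the same residues")
    if not true:
        raise ValueError("empty sequence")
    return sum(p == t for p, t in zip(pred, true)) / len(true)


if __name__ == "__main__":
    true8 = "CHHHHGGTEEEBSC"  # hypothetical reference labels
    pred8 = "CHHHHHHTEEEESC"  # hypothetical model output
    q8 = accuracy(pred8, true8)
    q3 = accuracy(ss8_to_ss3(pred8), ss8_to_ss3(true8))
    print(f"Q8 = {q8:.3f}, Q3 = {q3:.3f}")  # prints "Q8 = 0.786, Q3 = 1.000"
```

In this toy example, the three ss8 errors (G predicted as H twice, B predicted as E once) each stay within the same ss3 class, so Q3 is perfect while Q8 is not; this coarser grouping is one reason ss8 scores are typically lower than ss3 scores.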




Source journal: Computational Biology and Chemistry (Biology; Computer Science, Interdisciplinary Applications)

CiteScore: 6.10
Self-citation rate: 3.20%
Articles published: 142
Review time: 24 days

Journal description: Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High-quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high-quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered. Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, complementary techniques such as molecular dynamics simulations with detailed free energy calculations should be used to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered. Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However, prospective authors are welcome to send a brief (one- to three-page) synopsis, which will be evaluated by the editors.