TopoQA: a topological deep learning-based approach for protein complex structure interface quality assessment.

IF 6.8; Zone 2 (Biology); JCR Q1 (Biochemical Research Methods)
Bingqing Han, Yipeng Zhang, Longlong Li, Xinqi Gong, Kelin Xia
Briefings in Bioinformatics, 26(2). Published 2025-03-04. Journal Article.
DOI: 10.1093/bib/bbaf083 (https://doi.org/10.1093/bib/bbaf083)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11891663/pdf/

Abstract

Despite the significant advances of AlphaFold-Multimer (AF-Multimer) and AlphaFold3 (AF3) in protein complex structure prediction, their accuracy is still not comparable with that of monomer structure prediction. Efficient and effective quality assessment (QA), or estimation of model accuracy, methods that can evaluate the quality of predicted protein complexes without knowledge of their native structures are of key importance for protein structure generation and model selection. In this paper, we leverage persistent homology (PH) to capture the atomic-level topological information around residues and design a topological deep learning-based QA method, TopoQA, to assess the accuracy of protein complex interfaces. We integrate PH from topological data analysis into graph neural networks (GNNs) to characterize complex higher-order structures that GNNs might overlook, enhancing the learning of the relationship between the topological structure of complex interfaces and quality scores. TopoQA is extensively validated on the two most widely used benchmark datasets, Docking Benchmark5.5 AF2 (DBM55-AF2) and Heterodimer-AF2 (HAF2), along with our newly constructed ABAG-AF3 dataset, which facilitates comparisons with AF3. On all three datasets, TopoQA outperforms AF-Multimer-based AF2Rank and shows an advantage over AF3 on nearly half of the targets. In particular, on the DBM55-AF2 dataset, TopoQA obtains a ranking loss 73.6% lower than that of AF-Multimer-based AF2Rank. Beyond AF-Multimer and AF3, we have also compared TopoQA extensively with nearly all state-of-the-art models known to us, and find that TopoQA achieves the highest Top 10 Hit-rate on the DBM55-AF2 dataset and the lowest ranking loss on the HAF2 dataset. Ablation experiments show that our topological features significantly improve the model's performance. Our method also provides a new paradigm for protein structure representation learning.
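The abstract reports results as ranking loss and Top 10 Hit-rate without defining them. The following is a minimal sketch of how such metrics are commonly computed in complex QA benchmarks, not the authors' evaluation code. It assumes each decoy carries a predicted score and a true DockQ score, that ranking loss is the best true score minus the true score of the top-predicted decoy, and that a "hit" means at least one decoy with DockQ of at least 0.23 (the usual "acceptable" cutoff) among the top k. All function names are hypothetical.

```python
from typing import List, Tuple

# A decoy is (predicted_score, true_dockq). Both fields are assumptions
# made for illustration; the abstract does not specify the score type.
Decoy = Tuple[float, float]


def ranking_loss(decoys: List[Decoy]) -> float:
    """Best true score among all decoys minus the true score of the
    decoy that the QA method ranks first (assumed definition)."""
    best_true = max(true for _, true in decoys)
    top_ranked_true = max(decoys, key=lambda d: d[0])[1]
    return best_true - top_ranked_true


def top_k_hit(decoys: List[Decoy], k: int = 10, threshold: float = 0.23) -> bool:
    """True if any of the k best-predicted decoys reaches the quality
    threshold (0.23 is the common DockQ 'acceptable' cutoff)."""
    top_k = sorted(decoys, key=lambda d: d[0], reverse=True)[:k]
    return any(true >= threshold for _, true in top_k)


def relative_reduction(baseline_loss: float, new_loss: float) -> float:
    """Fractional reduction of new_loss relative to baseline_loss,
    the form in which a statement like '73.6% lower' is expressed."""
    return (baseline_loss - new_loss) / baseline_loss
```

Per-target values would typically be averaged (ranking loss) or counted (hits) over all targets in a dataset. A ranking loss of 0 means the method placed the truly best decoy first.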

Source journal: Briefings in Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.