Foldclass and Merizo-search: scalable structural similarity search for single- and multi-domain proteins using geometric learning.

Shaun M Kandathil, Andy M Lau, Daniel W A Buchan, David T Jones
Bioinformatics (Oxford, England). Published 2025-05-06. DOI: https://doi.org/10.1093/bioinformatics/btaf277

Abstract

Motivation: The availability of very large numbers of protein structures from accurate computational methods poses new challenges in storing, searching and detecting relationships between these structures. In particular, the new-found abundance of multi-domain structures in the AlphaFold structure database introduces challenges for traditional structure comparison methods.

Results: We address these challenges using Foldclass, a fast, embedding-based structure comparison method that detects structural similarity between protein domains. We demonstrate the accuracy of Foldclass embeddings for homology detection. By combining Foldclass with Merizo, a recently developed deep learning-based tool for automatic domain segmentation, we develop Merizo-search, which first segments multi-domain query structures into domains and then searches a Foldclass embedding database to determine the top matches for each constituent domain. Because Merizo accurately segments complete chains into domains and Foldclass embeds and detects similar domains, Merizo-search can rapidly detect per-domain similarities for complete chains, taking as little as 2 min to search all 365 million domains from the Encyclopedia of Domains. We anticipate that these tools will enable many analyses of the wealth of predicted structural data now available.
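
The per-domain search described above can be illustrated with a minimal sketch. It is not the Foldclass or Merizo-search implementation: the function names, the three-dimensional toy vectors, and the choice of cosine similarity as the ranking score are all assumptions made for illustration. In practice the embeddings would come from the Foldclass network and the query domains from Merizo segmentation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (assumed score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Rank every database embedding against one query embedding."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def search_per_domain(
    query_domains: Dict[str, Sequence[float]],
    database: Dict[str, Sequence[float]],
    k: int = 1,
) -> Dict[str, List[Tuple[str, float]]]:
    """Search each segmented query domain independently."""
    return {
        domain_id: top_k_matches(embedding, database, k)
        for domain_id, embedding in query_domains.items()
    }


# Hypothetical toy data: hand-written vectors, not real model output.
database = {
    "db_domain_A": [1.0, 0.0, 0.0],
    "db_domain_B": [0.0, 1.0, 0.0],
    "db_domain_C": [0.0, 0.0, 1.0],
}
# Stand-in for the domains that a segmentation step would produce from one chain.
query_domains = {
    "query_domain_1": [0.9, 0.1, 0.0],
    "query_domain_2": [0.0, 0.2, 0.8],
}

hits = search_per_domain(query_domains, database, k=1)
for domain_id, matches in hits.items():
    print(domain_id, matches)
```

A linear scan like this is only workable for small databases. Searching hundreds of millions of embeddings would need an indexed or batched nearest-neighbour search, which this sketch does not attempt.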

Availability and implementation: Foldclass and Merizo-search are available at https://github.com/psipred/merizo_search. The version used in this publication is archived at https://doi.org/10.5281/zenodo.15120830. Merizo-search is also available on the PSIPRED web server at http://bioinf.cs.ucl.ac.uk/psipred.

Supplementary information: Supplementary data are available at Bioinformatics online.