Protein structure alignment by Reseek improves sensitivity to remote homologs.

Robert C Edgar
Bioinformatics (Oxford, England), published 2024-11-01. DOI: 10.1093/bioinformatics/btae687. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601161/pdf/

Abstract

Motivation: Recent breakthroughs in protein fold prediction from amino acid sequences have unleashed a deluge of new structures, presenting new opportunities and challenges to bioinformatics.

Results: Reseek is a novel protein structure alignment algorithm based on sequence alignment where each residue in the protein backbone is represented by a letter in a "mega-alphabet" of 85 899 345 920 (∼10^11) distinct states. Reseek achieves substantially improved sensitivity to remote homologs compared to state-of-the-art methods including DALI, TMalign, and Foldseek, with comparable speed to Foldseek, the fastest previous method. Scaling to large databases of AI-predicted folds is analyzed. Foldseek E-values are shown to be under-estimated by several orders of magnitude, while Reseek E-values are in good agreement with measured error rates.
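
To illustrate the idea of aligning structures as sequences over a very large alphabet, here is a minimal Python sketch. It is not Reseek's implementation. It assumes that each letter is a tuple of discrete per-residue features, that a pair of letters is scored by summing per-feature match/mismatch scores, and that alignment is plain Smith-Waterman with a linear gap penalty. The abstract does not say how the mega-alphabet is built or scored, so the feature sizes, score values, and function names below are all hypothetical.

```python
from math import prod

# Hypothetical per-feature alphabet sizes, chosen only so the product is easy
# to check by hand; these are NOT Reseek's actual features.
FEATURE_SIZES = (4, 3, 2)


def alphabet_size(sizes):
    """Number of composite letters when a residue carries one state per feature."""
    return prod(sizes)


def letter_score(a, b, match=2, mismatch=-1):
    """Score two composite letters as a sum of per-feature scores (assumed scheme)."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def local_align_score(s, t, gap=-3):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(t) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for a in s:
        cur = [0]  # column 0 of the current row
        for j, b in enumerate(t, start=1):
            v = max(
                0,                                # start a new local alignment
                prev[j - 1] + letter_score(a, b),  # align a with b
                prev[j] + gap,                    # gap in t
                cur[j - 1] + gap,                 # gap in s
            )
            cur.append(v)
            best = max(best, v)
        prev = cur
    return best


if __name__ == "__main__":
    query = [(0, 0, 0), (1, 1, 1), (2, 2, 1)]
    target = [(3, 0, 1), (0, 0, 0), (1, 1, 1), (2, 2, 1)]
    print(alphabet_size(FEATURE_SIZES))
    print(local_align_score(query, target))
```

The point of the sketch is that the alphabet never has to be enumerated: with summed per-feature scores, the cost of comparing two letters grows with the number of features, not with the number of composite states.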

Availability and implementation: https://github.com/rcedgar/reseek.

Supplementary information: Supplementary data are available at Bioinformatics online.