Protein structure alignment by reseek improves sensitivity to remote homologs.

Robert C Edgar
Journal: Bioinformatics (Oxford, England)
Published: 2024-11-15
DOI: 10.1093/bioinformatics/btae687 (https://doi.org/10.1093/bioinformatics/btae687)

Abstract


Motivation: Recent breakthroughs in protein fold prediction from amino acid sequences have unleashed a deluge of new structures, presenting new opportunities and challenges to bioinformatics.

Results: Reseek is a novel protein structure alignment algorithm based on sequence alignment where each residue in the protein backbone is represented by a letter in a "mega-alphabet" of 85,899,345,920 (∼10^11) distinct states. Reseek achieves substantially improved sensitivity to remote homologs compared to state-of-the-art methods including DALI, TMalign and Foldseek, with comparable speed to Foldseek, the fastest previous method. Scaling to large databases of AI-predicted folds is analyzed. Foldseek E-values are shown to be under-estimated by several orders of magnitude, while Reseek E-values are in good agreement with measured error rates.
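
To give a feel for the scale quoted above, the following minimal sketch checks that 85,899,345,920 is on the order of 10^11 and shows one way a composite letter built from several small per-residue feature alphabets could be packed into a single integer using mixed-radix encoding. The feature sizes used here (one 20-state feature and eight 16-state features) are purely hypothetical: they were chosen only because their product equals the quoted count, and the abstract does not say how the alphabet is composed. The names `FEATURE_SIZES`, `encode`, and `decode` are illustrative and are not part of Reseek.

```python
import math

# Alphabet size quoted in the abstract.
MEGA_ALPHABET_SIZE = 85_899_345_920

# HYPOTHETICAL per-residue feature alphabet sizes (an assumption for
# illustration only, not taken from the paper): 20 * 16**8.
FEATURE_SIZES = [20] + [16] * 8


def encode(states, sizes):
    """Pack one state per feature into a single integer (mixed radix)."""
    if len(states) != len(sizes):
        raise ValueError("need exactly one state per feature")
    letter = 0
    for state, size in zip(states, sizes):
        if not 0 <= state < size:
            raise ValueError(f"state {state} out of range for size {size}")
        letter = letter * size + state
    return letter


def decode(letter, sizes):
    """Recover the per-feature states from a packed integer."""
    states = []
    for size in reversed(sizes):
        letter, state = divmod(letter, size)
        states.append(state)
    return states[::-1]


if __name__ == "__main__":
    # The product of the hypothetical feature sizes matches the quoted count.
    print(math.prod(FEATURE_SIZES) == MEGA_ALPHABET_SIZE)
    # Order of magnitude of the alphabet size.
    print(math.log10(MEGA_ALPHABET_SIZE))
    # Round trip for one arbitrary composite letter.
    example = [3, 1, 2, 3, 4, 5, 6, 7, 8]
    print(decode(encode(example, FEATURE_SIZES), FEATURE_SIZES) == example)
```

Under these assumed sizes every composite letter fits in 37 bits, so it can be stored in a single 64-bit integer.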

Availability: https://github.com/rcedgar/reseek.

Supplementary information: Supplementary data are available at Bioinformatics online.
