NCBI Orthologs: Public Resource and Scalable Method for Computing High-Precision Orthologs Across Eukaryotic Genomes.

IF 1.8; Zone 3 (Biology); JCR Q4 (Biochemistry & Molecular Biology)
Dong-Ha Oh, Alexander Astashyn, Barbara Robbertse, Nuala A O'Leary, W Ray Anderson, Laurie Breen, Eric Cox, Olga Ermolaeva, Robert Falk, Vichet Hem, J Bradley Holmes, Patrick Masterson, Kelly M McGarvey, Eyal Mozes, John P Torcivia, Mirian T N Tsuchiya, Craig Wallin, Françoise Thibaud-Nissen, Terence D Murphy, Vamsi K Kodali
Journal of Molecular Evolution. Journal article, published 2025-09-25.
DOI: 10.1007/s00239-025-10268-2 (https://doi.org/10.1007/s00239-025-10268-2)
Citations: 0

Abstract

Orthologs are fundamental for enabling comparative genomics analyses that further our understanding of eukaryotic biology. The unprecedented increase in the availability of high-quality eukaryotic genomes necessitates scalable and accurate methods for orthology inference. The National Center for Biotechnology Information (NCBI) developed "NCBI Orthologs", a resource and a computational pipeline designed to meet this challenge within the NCBI RefSeq framework. This system integrates protein similarity, nucleotide alignment, and microsynteny to achieve high-precision ortholog assignments across diverse eukaryotes. The pipeline leverages high-quality RefSeq annotations and processes genomes individually, ensuring scalability. Resulting ortholog data, organized into gene-level anchored sets, enables propagation of functional annotation information and facilitates comparative genomics. Critically, these data are integrated into the NCBI Gene resource, providing users with access from various entry points. The NCBI Datasets resource provides an intuitive interface to explore orthologous relationships on the web and allows bulk data download via the web, command-line tools, and an API. We detail the methodology, including anchor species selection and the decision tree used to arrive at high-confidence one-to-one orthology relationships. NCBI Orthologs is a valuable resource for facilitating functional annotation efforts and enhancing our understanding of eukaryotic gene evolution.
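
The abstract names three kinds of evidence (protein similarity, nucleotide alignment, and microsynteny) and a decision tree that yields one-to-one relationships, but it does not give the tree itself. The following is only a toy sketch of how such evidence could be combined; it is not the NCBI pipeline. The `Candidate` fields, the three thresholds, and the rule "protein similarity plus either synteny or nucleotide support" are all assumptions made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate pair between an anchor-species gene and a query-species gene."""
    anchor_gene: str
    query_gene: str
    protein_identity: float     # fraction of identical residues, 0..1
    nucleotide_coverage: float  # fraction of the gene covered by a nucleotide alignment, 0..1
    shared_neighbors: int       # flanking genes that are also paired (microsynteny)


# Hypothetical thresholds, chosen only to make the example concrete.
MIN_PROTEIN_IDENTITY = 0.5
MIN_NUCLEOTIDE_COVERAGE = 0.5
MIN_SHARED_NEIGHBORS = 2


def supported(c: Candidate) -> bool:
    """Toy decision rule: require protein similarity, then synteny or nucleotide support."""
    if c.protein_identity < MIN_PROTEIN_IDENTITY:
        return False
    has_synteny = c.shared_neighbors >= MIN_SHARED_NEIGHBORS
    has_nucleotide = c.nucleotide_coverage >= MIN_NUCLEOTIDE_COVERAGE
    return has_synteny or has_nucleotide


def one_to_one(candidates: list[Candidate]) -> dict[str, str]:
    """Keep supported pairs whose anchor gene and query gene each occur exactly once."""
    kept = [c for c in candidates if supported(c)]
    per_anchor = Counter(c.anchor_gene for c in kept)
    per_query = Counter(c.query_gene for c in kept)
    return {
        c.anchor_gene: c.query_gene
        for c in kept
        if per_anchor[c.anchor_gene] == 1 and per_query[c.query_gene] == 1
    }


example = [
    Candidate("A1", "Q1", 0.9, 0.8, 3),  # all three evidence types agree
    Candidate("A2", "Q2", 0.8, 0.2, 0),  # protein similarity only: rejected
    Candidate("A3", "Q3", 0.7, 0.9, 0),  # no synteny, but nucleotide support
    Candidate("A4", "Q4", 0.9, 0.9, 3),  # A4 has two supported partners,
    Candidate("A4", "Q5", 0.9, 0.9, 3),  # so it is ambiguous and dropped
]
result = one_to_one(example)
print(result)  # {'A1': 'Q1', 'A3': 'Q3'}
```

Dropping ambiguous pairs rather than picking a best hit trades recall for precision, which matches the abstract's emphasis on high-confidence one-to-one assignments; how the actual pipeline resolves such cases is not stated in the abstract.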

Source journal
Journal of Molecular Evolution (Biology: Evolutionary Biology)
CiteScore: 5.50
Self-citation rate: 2.60%
Articles published: 36
Review time: 3 months
About the journal: Journal of Molecular Evolution covers experimental, computational, and theoretical work aimed at deciphering features of molecular evolution and the processes bearing on these features, from the initial formation of macromolecular systems through their evolution at the molecular level, the co-evolution of their functions in cellular and organismal systems, and their influence on organismal adaptation, speciation, and ecology. Topics addressed include the evolution of informational macromolecules and their relation to more complex levels of biological organization, including populations and taxa, as well as the molecular basis for the evolution of ecological interactions of species and the use of molecular data to infer fundamental processes in evolutionary ecology. This coverage accommodates such subfields as new genome sequences, comparative structural and functional genomics, population genetics, the molecular evolution of development, the evolution of gene regulation and gene interaction networks, in vitro evolution of DNA and RNA, molecular evolutionary ecology, and the development of methods and theory that enable molecular evolutionary inference, including, but not limited to, phylogenetic methods.