NCBI Orthologs: Public Resource and Scalable Method for Computing High-Precision Orthologs Across Eukaryotic Genomes
Dong-Ha Oh, Alexander Astashyn, Barbara Robbertse, Nuala A O'Leary, W Ray Anderson, Laurie Breen, Eric Cox, Olga Ermolaeva, Robert Falk, Vichet Hem, J Bradley Holmes, Patrick Masterson, Kelly M McGarvey, Eyal Mozes, John P Torcivia, Mirian T N Tsuchiya, Craig Wallin, Françoise Thibaud-Nissen, Terence D Murphy, Vamsi K Kodali
Journal of Molecular Evolution, journal article, published 2025-09-25
DOI: 10.1007/s00239-025-10268-2 (https://doi.org/10.1007/s00239-025-10268-2)
Abstract
Orthologs are fundamental for enabling comparative genomics analyses that further our understanding of eukaryotic biology. The unprecedented increase in the availability of high-quality eukaryotic genomes necessitates scalable and accurate methods for orthology inference. The National Center for Biotechnology Information (NCBI) developed "NCBI Orthologs", a resource and a computational pipeline designed to meet this challenge within the NCBI RefSeq framework. This system integrates protein similarity, nucleotide alignment, and microsynteny to achieve high-precision ortholog assignments across diverse eukaryotes. The pipeline leverages high-quality RefSeq annotations and processes genomes individually, ensuring scalability. The resulting ortholog data, organized into gene-level anchored sets, enable propagation of functional annotation information and facilitate comparative genomics. Critically, these data are integrated into the NCBI Gene resource, providing users with access from various entry points. The NCBI Datasets resource provides an intuitive interface to explore orthologous relationships on the web and allows bulk data download via the web, command-line tools, and an API. We detail the methodology, including anchor species selection and the decision tree used to arrive at high-confidence one-to-one orthology relationships. NCBI Orthologs is a valuable resource for facilitating functional annotation efforts and enhancing our understanding of eukaryotic gene evolution.
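The abstract does not describe the decision tree itself, so the following is only a toy sketch of how three evidence types (protein similarity, nucleotide alignment, microsynteny) might be combined into a one-to-one call that favors precision. The `Candidate` fields, the `pick_ortholog` function, the order of the checks, and every threshold are hypothetical illustrations, not the published NCBI method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A candidate gene in the target genome for one anchor-species gene.

    All fields are hypothetical stand-ins for the evidence types named
    in the abstract; they are not NCBI data structures.
    """
    gene_id: str
    protein_identity: float     # fraction in [0, 1]
    nucleotide_coverage: float  # fraction in [0, 1]
    syntenic_neighbors: int     # flanking genes with conserved matches


def pick_ortholog(
    candidates: Sequence[Candidate],
    min_identity: float = 0.5,   # assumed threshold
    min_coverage: float = 0.5,   # assumed threshold
    min_neighbors: int = 2,      # assumed threshold
) -> Optional[str]:
    """Return a single gene_id, or None when the evidence is ambiguous."""
    # Step 1: discard candidates with weak protein similarity.
    passing = [c for c in candidates if c.protein_identity >= min_identity]
    if not passing:
        return None

    # Step 2: a unique candidate with microsynteny support wins outright.
    syntenic = [c for c in passing if c.syntenic_neighbors >= min_neighbors]
    if len(syntenic) == 1:
        return syntenic[0].gene_id

    # Step 3: otherwise use nucleotide alignment to break the tie, within
    # the syntenic candidates if there are any, else within all passing ones.
    pool = syntenic or passing
    supported = [c for c in pool if c.nucleotide_coverage >= min_coverage]
    if len(supported) == 1:
        return supported[0].gene_id

    # Still ambiguous: decline to call an ortholog rather than guess.
    return None


if __name__ == "__main__":
    example = [
        Candidate("geneA", 0.90, 0.80, 3),
        Candidate("geneB", 0.85, 0.70, 0),
    ]
    print(pick_ortholog(example))
```

Returning `None` whenever more than one candidate survives is the design choice that trades recall for precision: an uncertain gene is left unassigned instead of being given a possibly wrong ortholog.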
About the journal:
Journal of Molecular Evolution covers experimental, computational, and theoretical work aimed at deciphering features of molecular evolution and the processes bearing on these features, from the initial formation of macromolecular systems through their evolution at the molecular level, the co-evolution of their functions in cellular and organismal systems, and their influence on organismal adaptation, speciation, and ecology. Topics addressed include the evolution of informational macromolecules and their relation to more complex levels of biological organization, including populations and taxa, as well as the molecular basis for the evolution of ecological interactions of species and the use of molecular data to infer fundamental processes in evolutionary ecology. This coverage accommodates such subfields as new genome sequences, comparative structural and functional genomics, population genetics, the molecular evolution of development, the evolution of gene regulation and gene interaction networks, in vitro evolution of DNA and RNA, molecular evolutionary ecology, and the development of methods and theory that enable molecular evolutionary inference, including, but not limited to, phylogenetic methods.