Linkage-based ortholog refinement in bacterial pangenomes with CLARC
Indra González Ojeda, Samantha G Palace, Pamela P Martinez, Taj Azarian, Lindsay R Grant, Laura L Hammitt, William P Hanage, Marc Lipsitch
Nucleic Acids Research, Volume 53, Issue 12. Journal Article, published 2025-06-20.
DOI: 10.1093/nar/gkaf488
Abstract
Bacterial genomes exhibit significant variation in gene content and sequence identity. Pangenome analyses explore this diversity by classifying genes into core and accessory clusters of orthologous groups (COGs). However, strict sequence identity cutoffs can misclassify divergent alleles as different genes, inflating accessory gene counts. CLARC (Connected Linkage and Alignment Redefinition of COGs) (https://github.com/IndraGonz/CLARC) improves pangenome analyses by condensing accessory COGs using functional annotation and linkage information. Through this approach, orthologous groups are consolidated into more practical units of selection. Analyzing 8000+ Streptococcus pneumoniae genomes, CLARC reduced accessory gene estimates by >30% and improved evolutionary predictions based on accessory gene frequencies. CLARC is effective across different bacterial species, making it a broadly applicable tool for comparative genomics. By refining COG definitions, CLARC offers critical insights into bacterial evolution, aiding genetic studies across diverse populations.
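The core/accessory split and the linkage idea can be made concrete with a small sketch. This is a minimal Python illustration under stated assumptions, not CLARC's implementation. It assumes a presence/absence table (each COG id mapped to the set of genomes carrying it), hypothetical frequency cutoffs of 95% for core and 5% for accessory, and a single functional-annotation label per COG. Two accessory COGs are treated as candidate alleles of one gene when they share a label and never occur in the same genome. Linked COGs are joined with union-find, and a group is merged only if every pair in it is linked, so a merged unit never appears twice in one genome. The sequence-alignment step suggested by the tool's name is omitted.

```python
from itertools import combinations

# Hypothetical frequency cutoffs (assumptions); CLARC's actual thresholds may differ.
ACCESSORY_MIN = 0.05
CORE_MIN = 0.95


def classify_cogs(presence, n_genomes):
    """Split COGs into core and accessory by the fraction of genomes carrying them.

    presence: dict mapping COG id -> set of genome ids containing that COG.
    COGs rarer than ACCESSORY_MIN are returned in neither list.
    """
    core, accessory = [], []
    for cog, genomes in presence.items():
        freq = len(genomes) / n_genomes
        if freq >= CORE_MIN:
            core.append(cog)
        elif freq >= ACCESSORY_MIN:
            accessory.append(cog)
    return core, accessory


def condense_accessory(accessory, presence, annotation):
    """Group accessory COGs that share a (hypothetical) functional label and never co-occur.

    Linked COGs are joined into connected components with union-find. A component
    is merged only if it is a clique (every pair linked); otherwise its COGs stay
    separate, so no merged group is present twice in the same genome.
    """
    parent = {cog: cog for cog in accessory}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def linked(a, b):
        # Same label and no genome carries both COGs (mutually exclusive).
        return annotation[a] == annotation[b] and not (presence[a] & presence[b])

    for a, b in combinations(accessory, 2):
        if linked(a, b):
            parent[find(a)] = find(b)

    components = {}
    for cog in accessory:
        components.setdefault(find(cog), []).append(cog)

    groups = []
    for members in components.values():
        if all(linked(a, b) for a, b in combinations(members, 2)):
            groups.append(sorted(members))
        else:
            # Not a clique: keep each COG on its own rather than over-merging.
            groups.extend([m] for m in members)
    return sorted(groups)


if __name__ == "__main__":
    presence = {
        "cogA": set(range(1, 11)),  # in all 10 genomes -> core
        "cogB": {1, 2, 3},
        "cogC": {4, 5, 6},          # same label as cogB, never together -> merged
        "cogD": {1, 4},             # different label -> stays separate
    }
    annotation = {"cogA": "K", "cogB": "V", "cogC": "V", "cogD": "L"}
    core, accessory = classify_cogs(presence, n_genomes=10)
    print(core)                                                # ['cogA']
    print(condense_accessory(accessory, presence, annotation))  # [['cogB', 'cogC'], ['cogD']]
```

In this toy input, three accessory COGs condense into two units, which is the kind of reduction in accessory gene counts the abstract describes. The clique check is a conservative design choice: plain connected components could chain together COGs that do co-occur in some genome.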
About the journal:
Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.