Linkage-based ortholog refinement in bacterial pangenomes with CLARC.

IF 16.6 · Zone 2 (Biology) · Q1, Biochemistry & Molecular Biology
Indra González Ojeda, Samantha G Palace, Pamela P Martinez, Taj Azarian, Lindsay R Grant, Laura L Hammitt, William P Hanage, Marc Lipsitch
Nucleic Acids Research 53(12). Journal Article. Published 2025-06-20. DOI: 10.1093/nar/gkaf488 (https://doi.org/10.1093/nar/gkaf488)

Abstract

Bacterial genomes vary substantially in gene content and sequence identity. Pangenome analyses explore this diversity by classifying genes into core and accessory clusters of orthologous groups (COGs). However, strict sequence identity cutoffs can misclassify divergent alleles of the same gene as different genes, inflating accessory gene counts. CLARC (Connected Linkage and Alignment Redefinition of COGs; https://github.com/IndraGonz/CLARC) improves pangenome analyses by condensing accessory COGs using functional annotation and linkage information. This approach consolidates orthologous groups into more practical units of selection. In an analysis of more than 8000 Streptococcus pneumoniae genomes, CLARC reduced accessory gene estimates by more than 30% and improved evolutionary predictions based on accessory gene frequencies. CLARC is effective across different bacterial species, making it a broadly applicable tool for comparative genomics. By refining COG definitions, CLARC offers critical insights into bacterial evolution and aids genetic studies across diverse populations.
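The idea of condensing accessory COGs with annotation and linkage information can be illustrated with a small sketch. This is not CLARC's implementation. The abstract does not state the exact criteria, so the sketch assumes a simplified rule: two COGs are merged when they carry the same functional annotation and are never found in the same genome (complete negative linkage). The function name `condense_cogs`, the input layout, and the toy COG names are all hypothetical.

```python
def condense_cogs(presence, annotation):
    """Greedily condense COGs into units (illustrative rule, not CLARC's).

    presence:   dict mapping COG name -> set of genome IDs carrying it
    annotation: dict mapping COG name -> functional annotation label
    A COG joins an existing unit only if it shares that unit's annotation
    and occurs in none of the genomes already covered by the unit.
    """
    units = []
    for cog in sorted(presence):
        label = annotation.get(cog)
        for unit in units:
            same_function = label is not None and unit["function"] == label
            never_together = unit["genomes"].isdisjoint(presence[cog])
            if same_function and never_together:
                unit["members"].append(cog)
                unit["genomes"] |= presence[cog]
                break
        else:
            # No compatible unit found: the COG starts a unit of its own.
            units.append({
                "members": [cog],
                "genomes": set(presence[cog]),
                "function": label,
            })
    return units


# Toy input (assumed data): four genomes, four COGs called by a clustering tool.
presence = {
    "cogA_1": {"g1", "g2"},   # divergent allele 1 of a permease gene
    "cogA_2": {"g3", "g4"},   # divergent allele 2 of the same gene
    "cogB": {"g1", "g3"},     # same annotation, but co-occurs with both alleles
    "cogC": {"g2"},           # unrelated function
}
annotation = {
    "cogA_1": "ABC transporter permease",
    "cogA_2": "ABC transporter permease",
    "cogB": "ABC transporter permease",
    "cogC": "phage integrase",
}

units = condense_cogs(presence, annotation)
n_genomes = len(set().union(*presence.values()))
for unit in units:
    frequency = len(unit["genomes"]) / n_genomes
    print(unit["members"], round(frequency, 2))
```

In this toy input the two permease alleles collapse into a single unit that is present in all four genomes, so a gene that looked like two accessory COGs at 50% frequency each is reclassified as one gene at 100% frequency. `cogB` stays separate because it co-occurs with both alleles, which suggests a distinct gene despite the shared annotation.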
