Varigraph: An accurate and widely applicable pangenome graph-based variant genotyper for diploid and polyploid genomes.

IF 24.1 · Zone 1 (Biology) · JCR Q1 (Biochemistry & Molecular Biology)
Molecular Plant Pub Date : 2025-09-01 Epub Date: 2025-08-05 DOI:10.1016/j.molp.2025.08.001
Ze-Zhen Du, Jia-Bao He, Pei-Xuan Xiao, Jianbing Hu, Ning Yang, Wen-Biao Jiao
Pages: 1587-1601. Publication type: Journal Article.

Abstract

Accurate variant genotyping is crucial for genomics-assisted breeding. Graph pangenome references can reduce single-reference bias, which improves variant genotyping and strengthens downstream applications in population and quantitative genetics. However, existing pangenome-based genotyping methods handle large or complex pangenome graphs poorly, particularly for polyploid genomes. Here, we introduce Varigraph, an algorithm that genotypes both small and large variants by comparing unique and repetitive k-mers between variant sites and short reads. We evaluated Varigraph on a diverse set of representative plant genomes as well as human genomes. Varigraph outperforms current state-of-the-art linear and graph-based genotypers on non-human genomes while maintaining comparable genotyping performance on human genomes. By employing efficient data structures, including a counting Bloom filter and bitmap storage, as well as GPU models, Varigraph achieves improved precision and robustness in repetitive regions while keeping computational costs manageable for large datasets. It is applicable to highly repetitive or large genomes, such as those of maize and wheat. Varigraph can also handle extensive pangenome graphs: on a dataset containing 252 rice genomes, it achieved a precision exceeding 0.9 for both small and large variants. In addition, Varigraph can use pangenome graphs to genotype autopolyploids, enabling precise determination of allele dosage. In summary, this work provides a robust and accurate solution for genotyping plant genomes and will advance plant genomic studies and genomics-assisted breeding.
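The abstract names two ingredients, k-mer comparison between variant alleles and short reads, and a counting Bloom filter. The sketch below illustrates the general idea only and is not Varigraph's implementation: all names (`CountingBloomFilter`, `estimate_alt_dosage`), the filter size, the hash scheme, and the toy sequences are assumptions made for illustration. It counts read k-mers approximately, then compares the mean depth of k-mers specific to the reference allele with the mean depth of k-mers specific to the alternative allele to estimate the number of alternative copies in a polyploid. Repetitive k-mers, sequencing errors, and reverse complements are deliberately ignored.

```python
import hashlib
from typing import Iterator, List


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class CountingBloomFilter:
    """Approximate multiset: counts may be overestimated, never underestimated
    (until a counter saturates at 255)."""

    def __init__(self, size: int = 1 << 16, num_hashes: int = 3) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.counters = bytearray(size)  # one 8-bit counter per slot

    def _slots(self, item: str) -> List[int]:
        # Derive num_hashes slot indices by salting BLAKE2b with the hash index.
        slots = []
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            slots.append(int.from_bytes(digest, "big") % self.size)
        return slots

    def add(self, item: str) -> None:
        for slot in self._slots(item):
            if self.counters[slot] < 255:  # saturate instead of overflowing
                self.counters[slot] += 1

    def count(self, item: str) -> int:
        # The smallest counter is the tightest upper bound on the true count.
        return min(self.counters[slot] for slot in self._slots(item))


def estimate_alt_dosage(cbf: CountingBloomFilter, ref_seq: str, alt_seq: str,
                        k: int, ploidy: int) -> int:
    """Estimate how many of `ploidy` haplotypes carry the alt allele."""
    ref_set, alt_set = set(kmers(ref_seq, k)), set(kmers(alt_seq, k))
    ref_only, alt_only = ref_set - alt_set, alt_set - ref_set
    if not ref_only or not alt_only:
        raise ValueError("alleles share all k-mers; choose a different k")
    ref_depth = sum(cbf.count(x) for x in ref_only) / len(ref_only)
    alt_depth = sum(cbf.count(x) for x in alt_only) / len(alt_only)
    total = ref_depth + alt_depth
    if total == 0:
        return 0  # no read support for either allele
    return round(ploidy * alt_depth / total)


# Toy tetraploid site: three reads carry the ref allele, one carries the alt.
REF = "ACGTACGGTCAATGCCTAGA"
ALT = "ACGTACGGTCGATGCCTAGA"  # single-base change at position 10 (0-based)
K = 5

read_kmers = CountingBloomFilter()
for read in [REF, REF, REF, ALT]:
    for kmer in kmers(read, K):
        read_kmers.add(kmer)

dosage = estimate_alt_dosage(read_kmers, REF, ALT, K, ploidy=4)
print(dosage)
```

In this toy case the allele-specific depths are in a 3:1 ratio, so the estimate is one alternative copy out of four. A real genotyper would need a probabilistic model over depths rather than simple rounding.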

Source journal
Molecular Plant (Plant Sciences; Biochemistry & Molecular Biology)
CiteScore: 37.60
Self-citation rate: 2.20%
Articles published: 1784
Review time: 1 month
About the journal: Molecular Plant is dedicated to serving the plant science community by publishing novel and exciting findings with high significance in plant biology. The journal focuses broadly on cellular biology, physiology, biochemistry, molecular biology, genetics, development, plant-microbe interaction, genomics, bioinformatics, and molecular evolution. Molecular Plant publishes original research articles, reviews, Correspondence, and Spotlights on the most important developments in plant biology.