Ze-Zhen Du, Jia-Bao He, Pei-Xuan Xiao, Jianbing Hu, Ning Yang, Wen-Biao Jiao
Molecular Plant, pages 1587-1601. Published 2025-09-01 (Epub 2025-08-05). DOI: 10.1016/j.molp.2025.08.001 (https://doi.org/10.1016/j.molp.2025.08.001)
Varigraph: An accurate and widely applicable pangenome graph-based variant genotyper for diploid and polyploid genomes.
Accurate variant genotyping is crucial for genomics-assisted breeding. Graph pangenome references can address single-reference bias, thereby enhancing the performance of variant genotyping and empowering downstream applications in population genetics and quantitative genetics. However, existing pangenome-based genotyping methods are ineffective in handling large or complex pangenome graphs, particularly in polyploid genomes. Here, we introduce Varigraph, an algorithm that leverages the comparison of unique and repetitive k-mers between variant sites and short reads for genotyping both small and large variants. We evaluated Varigraph on a diverse set of representative plant genomes as well as human genomes. Varigraph outperforms current state-of-the-art linear and graph-based genotypers across non-human genomes while maintaining comparable genotyping performance in human genomes. By employing efficient data structures, including a counting Bloom filter and bitmap storage, as well as GPU models, Varigraph achieves improved precision and robustness in repetitive regions while managing computational costs for large datasets. Its wide applicability extends to highly repetitive or large genomes, such as those of maize and wheat. Significantly, Varigraph can handle extensive pangenome graphs, as demonstrated by its performance on a dataset containing 252 rice genomes, for which it achieved a precision exceeding 0.9 for both small and large variants. Notably, Varigraph can effectively use pangenome graphs for genotyping autopolyploids, enabling precise determination of allele dosage. In summary, this work provides a robust and accurate solution for genotyping plant genomes and will advance plant genomic studies and genomics-assisted breeding.
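The abstract names a counting Bloom filter as one of the data structures used to keep k-mer counting affordable. The sketch below illustrates that general technique only; it is not Varigraph's implementation. The class name `CountingBloomFilter`, the helper `kmers`, the hash choice (salted BLAKE2b), and all parameter values are assumptions made for the example.

```python
import hashlib


class CountingBloomFilter:
    """Approximate k-mer counter (illustrative sketch, not Varigraph's code).

    Below the saturation cap of 255, a reported count can be higher than
    the true count (hash collisions) but never lower.
    """

    def __init__(self, num_counters: int = 1 << 16, num_hashes: int = 3):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        # One 8-bit saturating counter per slot.
        self.counters = bytearray(num_counters)

    def _slots(self, kmer: str):
        # Derive several slot indices by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_counters

    def add(self, kmer: str) -> None:
        for slot in self._slots(kmer):
            if self.counters[slot] < 255:  # saturate instead of overflowing
                self.counters[slot] += 1

    def count(self, kmer: str) -> int:
        # The smallest counter is the least-contaminated estimate.
        return min(self.counters[slot] for slot in self._slots(kmer))


def kmers(sequence: str, k: int):
    """Yield every overlapping k-mer of a sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


# Toy short reads (hypothetical data) counted with k = 4.
reads = ["ACGTACGT", "CGTACGTA"]
cbf = CountingBloomFilter()
for read in reads:
    for kmer in kmers(read, 4):
        cbf.add(kmer)

acgt_count = cbf.count("ACGT")  # true count is 3; the estimate is >= 3
```

The appeal of this structure is fixed memory: the counter array does not grow with the number of distinct k-mers, at the cost of occasional overestimates. A production genotyper would typically also canonicalize each k-mer with its reverse complement, which this sketch omits.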
About the journal:
Molecular Plant is dedicated to serving the plant science community by publishing novel and exciting findings with high significance in plant biology. The journal focuses broadly on cellular biology, physiology, biochemistry, molecular biology, genetics, development, plant-microbe interaction, genomics, bioinformatics, and molecular evolution.
Molecular Plant publishes original research articles, reviews, Correspondence, and Spotlights on the most important developments in plant biology.