Mango Pangenome Reveals Dramatic Impacts of Reference Bias on Genomic Analyses
Authors: Bilal Ahmad, Ying Su, Yani Hao, Tayyaba Razzaq, Rida Arshad, Yi Zhang, Yingchun Zhang, Xingyi Wang, Guizhou Huang, Xiangnian Su, Ting Hou, Chaochao Li, Xuanwen Yang, Chuanning Li, Zhenzhou Chu, Qiuyan Wang, Yu Zhang, Zhongxin Jin, Qi Xu, Xiaodong Xu, Yanling Peng, Guiqi Bi, Chengjie Chen, Yeyuan Chen, Hua Xiao, Jianfeng Huang, Yongfeng Zhou, Xinmin Tian
Journal: Horticulture Research (impact factor 8.7, JCR Q1)
DOI: 10.1093/hr/uhaf166 (https://doi.org/10.1093/hr/uhaf166)
Publication date: 2025-07-01
Publication type: Journal Article
Citation count: 0
Abstract
Most genomics studies start by mapping sequencing data to a reference genome. The quality of the reference genome assembly, its genetic relatedness to the studied population, and the mapping method employed directly affect variant calling accuracy and subsequent genomic analyses, and can introduce reference bias that leads to erroneous conclusions. However, the impacts of reference bias and methods to reduce it have received limited attention. This study compared genomic analyses using four different reference genomes of mango (Mangifera indica): the two haploid assemblies of a haplotype-resolved telomere-to-telomere (T2T) genome assembly, a pangenome, and an older version of the reference genome available on NCBI. The choice of reference genome dramatically affected mapping efficiency and resulted in notable differences in the genetic variants called, particularly structural variations (SVs). Phylogenetic analysis was more sensitive to the choice of reference genome than genetic differentiation was. Results of population genomic analyses of artificial selection during domestication and of SV hotspot regions varied across reference genomes. Notably, the gene enrichment analyses showed significant differences in the top enriched biological processes depending on the reference genome used. Overall, the mango pangenome outperformed the other reference genomes across various metrics, followed by the T2T reference genomes, as they captured greater diversity and effectively reduced reference bias. Our findings highlight the role of the mango pangenome in reducing reference bias and underscore the critical role of reference genome selection, suggesting that it is one of the most important factors in genomic studies.
About the journal:
Horticulture Research, an open access journal affiliated with Nanjing Agricultural University, ranked number one in the Horticulture category of the Journal Citation Reports™ from Clarivate, 2022. The journal publishes original research articles, reviews, perspectives, comments, correspondence articles, and letters to the editor. Its scope encompasses all vital aspects of horticultural plants and disciplines, such as biotechnology, breeding, cellular and molecular biology, evolution, genetics, inter-species interactions, physiology, and the origination and domestication of crops.