Adrien Oliva, Rachel Foare, Peter Campbell, Natalie A Twine, Denis C Bauer, Angad Singh Johar
Biomolecules, volume 15, issue 4. Published 2025-04-15. DOI: 10.3390/biom15040582 (https://doi.org/10.3390/biom15040582). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12025191/pdf/
A Pangenomic Approach to Improve Population Genetics Analysis and Reference Bias in Underrepresented Middle Eastern and Horn of Africa Populations.
Genomics plays a crucial role in addressing health disparities, yet most studies rely on the hg38 linear reference genome, limiting the potential of pangenomic approaches, particularly for underrepresented populations. In this study, we focus on characterising East African populations, particularly Somalis, by constructing a variation graph using Mozabites from the Human Genome Diversity Project (HGDP) given their ancestral affinity with Somalis. We evaluated the effectiveness of this graph-based reference in estimating effective population sizes (Ne) in Bedouins compared to the hg38 reference and examined its impact on allele frequencies and genome-wide association studies (GWAS). Applying a coalescent model to the graph-based reference produced a Ne estimate of approximately 17 for the Bedouin population, which was significantly lower than the estimate from the hg38 reference (approximately 79,000). Only the graph-based estimate fell within the 95% confidence interval in simulations, indicating improved accuracy. Moreover, graph variants exhibited significantly lower allele frequencies (p-value < 2.2 × 10^-16), suggesting potential effects on the interpretation and power of GWAS. Notably, GWAS variants specific to Bedouins derived from the graph showed lower frequencies (p = 0.023) than those obtained from the linear reference. These findings suggest that a pangenomic approach, informed by populations with ancestral affinities such as the Mozabites, provides more accurate estimates of Ne and allele frequencies. This highlights the importance of pangenomic strategies to better capture genetic diversity in underrepresented populations, a critical step towards improving population genetics studies, personalised medicine, and equitable healthcare.
Biomolecules (Biochemistry, Genetics and Molecular Biology: Molecular Biology)
CiteScore: 9.40
Self-citation rate: 3.60%
Articles published: 1640
Review time: 18.28 days
Journal description:
Biomolecules (ISSN 2218-273X) is an international, peer-reviewed open access journal focusing on biogenic substances and their biological functions, structures, interactions with other molecules, and their microenvironment as well as biological systems. Biomolecules publishes reviews, regular research papers and short communications. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced.