Old vs. new local ancestry inference in HCHS/SOL: a comparative study
Xueying Chen, Hao Wang, Iris Broce, Anders Dale, Bing Yu, Laura Y Zhou, Xihao Li, Maria Argos, Martha L Daviglus, Jianwen Cai, Nora Franceschini, Tamar Sofer
Human Molecular Genetics. Published 2025-06-09. DOI: 10.1093/hmg/ddaf093 (https://doi.org/10.1093/hmg/ddaf093)
Abstract
Hispanic/Latino populations are admixed, with genetic contributions from multiple ancestral populations. To uncover genetic associations in these populations, researchers often use admixture mapping, which relies on inferred counts of "local" ancestry, i.e., the source ancestral population at a locus. Local ancestries are inferred using external reference panels that represent the ancestral populations, which makes the choice of inference method and reference panel critical. This study used a dataset of Hispanic/Latino individuals from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) to evaluate how updates in local ancestry inference (LAI) affect results. Specifically, we considered the "old" LAI, performed using the popular inference method RFMix, alongside "new" inferences performed using Fast Local Ancestry Estimation (FLARE) with an updated reference panel. We compared their performance in terms of global and local ancestry correlations, as well as admixture mapping-based associations. Overall, the old and new inferences produced highly similar global and local ancestry estimates, and FLARE-based results closely matched those from RFMix in admixture mapping analyses. However, in some genomic regions, the old and new local ancestries showed relatively lower correlations (Pearson R < 0.9). Most of these regions (86.42%) mapped to either ENCODE blacklist regions or gene clusters, compared with 7.67% of randomly matched regions with high correlations (Pearson R > 0.97). These findings show that old and new inferences largely agree, and they suggest that regions of lower agreement mostly reflect genomic sequence contexts that lead to less stable inference, rather than the LAI software or genotyping technology used.
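To make the local ancestry comparison concrete, the following is a minimal sketch of how low-agreement loci could be flagged. It is not the authors' pipeline. It assumes that, for one ancestry, each locus has a per-individual count (0, 1, or 2 copies) from both the old and the new inference, and that agreement is measured by the Pearson correlation of those counts across individuals. The function names `pearson_r` and `low_agreement_loci` and the toy data are hypothetical; only the 0.9 threshold comes from the abstract.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one input is constant
    return cov / sqrt(var_x * var_y)


def low_agreement_loci(old_counts, new_counts, threshold=0.9):
    """Return (locus_index, r) for loci whose old/new correlation is below threshold.

    old_counts and new_counts are lists of loci; each locus is a list of
    per-individual ancestry counts for a single ancestry (assumed layout).
    Loci with an undefined correlation (NaN) are flagged as well.
    """
    flagged = []
    for i, (old, new) in enumerate(zip(old_counts, new_counts)):
        r = pearson_r(old, new)
        if r != r or r < threshold:  # r != r is True only for NaN
            flagged.append((i, r))
    return flagged


# Toy data (hypothetical): 2 loci x 6 individuals.
old = [[0, 1, 2, 1, 0, 2], [0, 1, 2, 1, 0, 2]]
new = [[0, 1, 2, 1, 0, 2], [1, 1, 2, 0, 0, 2]]
flagged = low_agreement_loci(old, new)
```

In this toy input the first locus agrees perfectly and the second does not, so only the second is flagged. A real analysis would additionally need to handle multiple ancestries, aggregate loci into regions, and annotate the flagged regions (for example against ENCODE blacklist regions), none of which is attempted here.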
About the journal:
Human Molecular Genetics concentrates on full-length research papers covering a wide range of topics in all aspects of human molecular genetics. These include:
the molecular basis of human genetic disease
developmental genetics
cancer genetics
neurogenetics
chromosome and genome structure and function
therapy of genetic disease
stem cells in human genetic disease and therapy, including the application of iPS cells
genome-wide association studies
mouse and other models of human diseases
functional genomics
computational genomics
The journal also publishes research on other model systems for the analysis of genes, especially when there is an obvious relevance to human genetics.