Clade Distillation for Genome-wide Association Studies.

IF 5.1 | Zone 3, Biology | Q2 GENETICS & HEREDITY

Genetics. Pub Date: 2025-08-07. DOI: 10.1093/genetics/iyaf158

Ryan Christ, Xinxin Wang, Louis J M Aslett, David Steinsaltz, Ira Hall


Citations: 0

Abstract

Testing inferred haplotype genealogies for association with phenotypes has been a longstanding goal in human genetics given their potential to detect association signals driven by allelic heterogeneity - when multiple causal variants modulate a phenotype - in both coding and noncoding regions. Recent scalable methods for inferring locus-specific genealogical trees along the genome, or representations thereof, have made substantial progress towards this goal; however, the problem of testing these trees for association with phenotypes has remained unsolved due to the growth in the number of clades with increasing sample size. To address this issue, we introduce several practical improvements to the kalis ancestry inference engine, including a general optimal checkpointing algorithm for decoding hidden Markov models, thereby enabling efficient genome-wide analyses. We then propose LOCATER, a powerful new procedure based on the recently proposed Stable Distillation framework, to test local tree representations for trait association. Although LOCATER is demonstrated here in conjunction with kalis, it may be used for testing output from any ancestry inference engine, regardless of whether such engines return discrete tree structures, relatedness matrices, or some combination of the two at each locus. Using simulated quantitative phenotypes, our results indicate that LOCATER achieves substantial power gains over traditional single marker testing, ARG-Needle, and window-based testing in cases of allelic heterogeneity, while also improving causal region localization. These findings suggest that genealogy-based association testing will be a fruitful approach for gene discovery, especially for signals driven by multiple ultra-rare variants.
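
To make the checkpointing idea mentioned in the abstract concrete, here is a minimal sketch of checkpointed forward decoding for a generic hidden Markov model. It is an illustration only: it uses simple uniform (roughly square-root spaced) checkpoints, not the optimal checkpointing algorithm the paper describes, and it does not use the kalis API. The names `forward_step` and `checkpointed_forward`, the `every` parameter, and the array layout (`emis` with one column per locus) are assumptions made for this example.

```python
import math
import numpy as np

def forward_step(alpha, trans, emis_col):
    """One normalized forward update: alpha' is proportional to (alpha @ trans) * emission."""
    nxt = (alpha @ trans) * emis_col
    return nxt / nxt.sum()

def checkpointed_forward(init, trans, emis, every=None):
    """Yield (t, normalized forward vector) for t = T-1, ..., 0.

    Only every `every`-th vector is stored, plus one recomputed block at a
    time, so memory is about O(sqrt(T)) vectors instead of O(T).
    Hypothetical helper for illustration; not part of kalis.
    """
    T = emis.shape[1]
    every = every or max(1, math.isqrt(T))

    # Pass 1: run the forward recursion once, keeping only the checkpoints.
    checkpoints = {}
    alpha = init * emis[:, 0]
    alpha = alpha / alpha.sum()
    for t in range(T):
        if t > 0:
            alpha = forward_step(alpha, trans, emis[:, t])
        if t % every == 0:
            checkpoints[t] = alpha

    # Pass 2: visit blocks right to left, recomputing each from its checkpoint.
    for start in sorted(checkpoints, reverse=True):
        stop = min(start + every, T)
        block = [checkpoints[start]]
        for t in range(start + 1, stop):
            block.append(forward_step(block[-1], trans, emis[:, t]))
        for t in range(stop - 1, start - 1, -1):
            yield t, block[t - start]
```

Visiting the forward vectors in reverse order is what a backward pass needs when combining forward and backward quantities at each locus. The trade-off in this sketch is roughly one extra forward sweep of computation in exchange for the reduced memory footprint.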



Source journal

Genetics (GENETICS & HEREDITY)

CiteScore: 6.90
Self-citation rate: 6.10%
Articles published: 177
Review time: 1.5 months

About the journal: GENETICS is published by the Genetics Society of America, a scholarly society that seeks to deepen our understanding of the living world by advancing our understanding of genetics. Since 1916, GENETICS has published high-quality, original research presenting novel findings bearing on genetics and genomics. The journal publishes empirical studies of organisms ranging from microbes to humans, as well as theoretical work. While it has an illustrious history, GENETICS has changed along with the communities it serves: it is not your mentor's journal. The editors make decisions quickly, in around 30 days, without sacrificing the excellence and scholarship for which the journal has long been known. GENETICS is a peer-reviewed, peer-edited journal, with an international reach and increasing visibility and impact. All editorial decisions are made through collaboration of at least two editors who are practicing scientists. GENETICS is constantly innovating: expanded types of content include Reviews, Commentary (current issues of interest to geneticists), Perspectives (historical), Primers (to introduce primary literature into the classroom), Toolbox Reviews, plus YeastBook, FlyBook, and WormBook (coming spring 2016). For particularly time-sensitive results, we publish Communications. As part of our mission to serve our communities, we've published thematic collections, including Genomic Selection, Multiparental Populations, Mouse Collaborative Cross, and the Genetics of Sex.