Evaluating ARG-estimation methods in the context of estimating population-mean polygenic score histories.

IF 3.3 · Zone 3 (Biology) · Q2 GENETICS & HEREDITY

Genetics · Pub Date: 2025-04-17 · DOI: 10.1093/genetics/iyaf033

Dandan Peng, Obadiah J Mulder, Michael D Edge

Citations: 0

Abstract

Scalable methods for estimating marginal coalescent trees across the genome present new opportunities for studying evolution and have generated considerable excitement, with new methods extending scalability to thousands of samples. Benchmarking of the available methods has revealed general tradeoffs between accuracy and scalability, but performance in downstream applications has not always been easily predictable from general performance measures, suggesting that specific features of the ancestral recombination graph (ARG) may be important for specific downstream applications of estimated ARGs. To exemplify this point, we benchmark ARG estimation methods with respect to a specific set of methods for estimating the historical time course of a population-mean polygenic score (PGS) using the marginal coalescent trees encoded by the ARG. Here, we examine the performance in simulation of seven ARG estimation methods: ARGweaver, RENT+, Relate, tsinfer+tsdate, ARG-Needle, ASMC-clust, and SINGER, using their estimated coalescent trees and examining bias, mean squared error, confidence interval coverage, and Type I and Type II error rates of the downstream methods. Although it does not scale to the sample sizes attainable by other new methods, SINGER produced the most accurate estimated PGS histories in many instances, even when Relate, tsinfer+tsdate, ARG-Needle, and ASMC-clust used samples 10 or more times as large as those used by SINGER. In general, the best choice of method depends on the number of samples available and the historical time period of interest. In particular, the unprecedented sample sizes allowed by Relate, tsinfer+tsdate, ARG-Needle, and ASMC-clust are of greatest importance when the recent past is of interest; further back in time, most of the tree has coalesced, and differences in contemporary sample size are less salient.
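The closing point of the abstract, that contemporary sample size matters mainly for the recent past, can be illustrated with a standard deterministic approximation to the coalescent. This sketch is not taken from the paper. It assumes a constant-size, panmictic population, measures time in coalescent units in which each pair of lineages coalesces at rate 1, and treats the lineage count n(t) as continuous, so that dn/dt = -n(n-1)/2 and n(t) = n0 / (n0 - (n0 - 1) e^{-t/2}). The function name `expected_lineages` and the sample sizes 300 and 3000 are illustrative choices only.

```python
import math


def expected_lineages(n0: int, t: float) -> float:
    """Approximate number of ancestral lineages remaining at time t.

    Assumptions (illustrative, not from the paper): constant population
    size, time in coalescent units where each pair of lineages coalesces
    at rate 1, and the deterministic approximation dn/dt = -n(n-1)/2.
    """
    if n0 < 1:
        raise ValueError("n0 must be at least 1")
    if t < 0:
        raise ValueError("t must be non-negative")
    # Closed-form solution of dn/dt = -n(n-1)/2 with n(0) = n0.
    return n0 / (n0 - (n0 - 1) * math.exp(-t / 2.0))


if __name__ == "__main__":
    # Compare a small sample with one ten times larger at several depths.
    for t in (0.0, 0.001, 0.01, 0.1, 1.0):
        small = expected_lineages(300, t)
        large = expected_lineages(3000, t)
        print(f"t={t:<6} n=300: {small:8.1f}  n=3000: {large:8.1f}  "
              f"ratio: {large / small:5.2f}")
```

Under these assumptions the tenfold advantage of the larger sample holds only at the present and shrinks quickly going back in time, which is consistent with the abstract's remark that large-sample methods help most when the recent past is of interest.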

Source journal

Genetics (GENETICS & HEREDITY)

CiteScore: 6.90

Self-citation rate: 6.10%

Articles published: 177

Review time: 1.5 months

About the journal: GENETICS is published by the Genetics Society of America, a scholarly society that seeks to deepen our understanding of the living world by advancing our understanding of genetics. Since 1916, GENETICS has published high-quality, original research presenting novel findings bearing on genetics and genomics. The journal publishes empirical studies of organisms ranging from microbes to humans, as well as theoretical work. While it has an illustrious history, GENETICS has changed along with the communities it serves: it is not your mentor's journal. The editors make decisions quickly, in around 30 days, without sacrificing the excellence and scholarship for which the journal has long been known. GENETICS is a peer-reviewed, peer-edited journal, with an international reach and increasing visibility and impact. All editorial decisions are made through collaboration of at least two editors who are practicing scientists. GENETICS is constantly innovating: expanded types of content include Reviews, Commentary (current issues of interest to geneticists), Perspectives (historical), Primers (to introduce primary literature into the classroom), Toolbox Reviews, plus YeastBook, FlyBook, and WormBook (coming spring 2016). For particularly time-sensitive results, we publish Communications. As part of our mission to serve our communities, we've published thematic collections, including Genomic Selection, Multiparental Populations, Mouse Collaborative Cross, and the Genetics of Sex.