False discovery rates of qpAdm-based screens for genetic admixture.

bioRxiv : the preprint server for biology Pub Date : 2025-02-03 DOI:10.1101/2023.04.25.538339

Olga Flegontova, Ulaş Işıldak, Eren Yüncü, Matthew P Williams, Christian D Huber, Jan Kočí, Leonid A Vyazov, Piya Changmai, Pavel Flegontov



Abstract

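Although many methods exist for reconstructing population history from genome-wide single nucleotide polymorphism data, only a few have become popular in archaeogenetics: principal component analysis (PCA); ADMIXTURE, an algorithm that models individuals as mixtures of multiple ancestral sources represented by actual or inferred populations; formal tests for admixture such as f3-statistics and D/f4-statistics; and qpAdm, a tool for fitting two-component and more complex admixture models to groups or individuals. These methods are popular in archaeogenetics because of their modest computational requirements and their ability to analyze data of various types and qualities. Nevertheless, protocols that rely on qpAdm to screen many alternative models of varying complexity and to find "fitting" models (often using both estimated admixture proportions and p-values as a composite criterion of model fit) remain untested on complex simulated population histories in the form of admixture graphs of random topology. We analyzed genotype data extracted from such simulations and tested various types of high-throughput qpAdm protocols ("rotating" and "non-rotating", with or without temporal stratification of target groups and proxy ancestry sources, and with or without a "model competition" step). We caution that high-throughput qpAdm protocols may be inappropriate for exploratory analyses in understudied regions or periods, since their false discovery rates varied between 12% and 68% depending on the details of the protocol and on the amount and quality of simulated data (i.e., >12% of fitting two-way admixture models imply gene flows that were not simulated). We demonstrate that, to reduce the false discovery rates of qpAdm protocols to nearly 0%, it is advisable to use large SNP sets with low missing data rates, the rotating qpAdm protocol with a strictly enforced rule that target groups do not pre-date their proxy sources, and an unsupervised ADMIXTURE analysis as a way to verify feasible qpAdm models. Our study has a number of limitations: for instance, these recommendations depend on the assumption that the underlying genetic history is a complex admixture graph rather than a stepping-stone model.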
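The composite criterion of model fit mentioned above (admixture proportions together with a p-value) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `QpAdmResult` container and the `is_feasible` helper are hypothetical and are not the output format or API of any qpAdm implementation, and the 0.05 p-value cutoff and the closed interval [0, 1] for admixture fractions are placeholder choices, since thresholds differ between studies.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QpAdmResult:
    """Hypothetical container for one fitted model (not a real qpAdm format)."""
    sources: Tuple[str, ...]   # labels of the proxy sources in the model
    weights: Tuple[float, ...] # estimated admixture fractions, one per source
    p_value: float             # p-value of the model fit


def is_feasible(result: QpAdmResult,
                p_threshold: float = 0.05,   # assumed cutoff; varies by study
                lower: float = 0.0,
                upper: float = 1.0) -> bool:
    """Return True if the model passes a composite feasibility criterion.

    A model is kept when it is not rejected (p-value at or above the
    threshold) and every estimated admixture fraction lies in [lower, upper].
    """
    weights_ok = all(lower <= w <= upper for w in result.weights)
    not_rejected = result.p_value >= p_threshold
    return weights_ok and not_rejected


# Made-up example models for a single target.
models = [
    QpAdmResult(("A", "B"), (0.40, 0.60), 0.20),    # passes both conditions
    QpAdmResult(("A", "C"), (1.30, -0.30), 0.50),   # fractions outside [0, 1]
    QpAdmResult(("B", "C"), (0.50, 0.50), 0.001),   # rejected by p-value
]
feasible = [m.sources for m in models if is_feasible(m)]
print(feasible)
```

A stricter variant could also require each fraction to be more than some number of standard errors away from 0 and 1; that would need standard errors as an extra field, which this sketch leaves out.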


Performance of qpAdm-based screens for genetic admixture on admixture-graph-shaped histories and stepping-stone landscapes.

qpAdm is a statistical tool that is often used for testing large sets of alternative admixture models for a target population. Despite its popularity, qpAdm remains untested on two-dimensional stepping-stone landscapes and in situations with low pre-study odds (a low ratio of true to false models). We tested high-throughput qpAdm protocols with typical properties such as the number of source combinations per target, model complexity, and model feasibility criteria. Those protocols were applied to admixture-graph-shaped and stepping-stone simulated histories sampled randomly or systematically. We demonstrate that false discovery rates of high-throughput qpAdm protocols exceed 50% for many parameter combinations since: 1) pre-study odds are low and fall rapidly with increasing model complexity; 2) complex migration networks violate the assumptions of the method, hence there is poor correlation between qpAdm p-values and model optimality, contributing to a low but non-zero false positive rate and low power; 3) although admixture fraction estimates between 0 and 1 are largely restricted to symmetric configurations of sources around a target, a small fraction of asymmetric, highly non-optimal models have estimates in the same interval, contributing to the false positive rate. We also re-interpret large sets of qpAdm models from two studies in terms of source-target distance and symmetry and suggest improvements to qpAdm protocols: 1) temporal stratification of targets and proxy sources in the case of admixture-graph-shaped histories; 2) focused exploration of few models for increasing pre-study odds; 3) dense landscape sampling for increasing power and stringent conditions on estimated admixture fractions for decreasing the false positive rate.
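The way pre-study odds, false positive rate, and power combine into a false discovery rate can be made concrete with a small Python sketch. It uses the standard relation FDR = FP / (FP + TP), with expected false positives proportional to the false positive rate and expected true positives proportional to power times pre-study odds. The function name and all numeric inputs below are illustrative assumptions, not values reported in the study.

```python
def false_discovery_rate(pre_study_odds: float,
                         false_positive_rate: float,
                         power: float) -> float:
    """Expected FDR of a screen, given odds, false positive rate and power.

    pre_study_odds: ratio of true to false models among all models tested.
    false_positive_rate: share of false models that pass the fit criteria.
    power: share of true models that pass the fit criteria.
    """
    # Expected counts per one false model tested.
    expected_false_hits = false_positive_rate
    expected_true_hits = power * pre_study_odds
    total_hits = expected_false_hits + expected_true_hits
    if total_hits == 0:
        return 0.0  # no model passes, so there are no discoveries at all
    return expected_false_hits / total_hits


# Illustrative (made-up) scenario: 1 true model per 20 false ones,
# 2% false positive rate, 20% power.
low_odds = false_discovery_rate(0.05, 0.02, 0.20)

# Same error rates, but a focused screen with 1 true model per false one.
high_odds = false_discovery_rate(1.0, 0.02, 0.20)

print(f"low pre-study odds:  FDR = {low_odds:.3f}")
print(f"high pre-study odds: FDR = {high_odds:.3f}")
```

With these made-up inputs, a false positive rate of only 2% still yields an FDR of about two thirds when pre-study odds are 1:20, which is the mechanism behind points 1) and 2) of the abstract; raising the odds to 1:1 with unchanged error rates brings the FDR below 10%.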

Article summary: The proliferation in the archaeogenetic literature of protocols for detecting admixed groups based on the so-called qpAdm algorithm has become disconnected from performance testing: the only extensive study of qpAdm on simulated data showed that it performs well under an unrealistically simple demographic scenario. We found that false discoveries of gene flows by qpAdm are very common on a collection of random admixture-graph-shaped histories and on complex stepping-stone landscapes, and we provide guidelines for the design of qpAdm protocols in archaeogenetic studies.
