Olga Flegontova, Ulaş Işıldak, Eren Yüncü, Matthew P Williams, Christian D Huber, Jan Kočí, Leonid A Vyazov, Piya Changmai, Pavel Flegontov
bioRxiv: the preprint server for biology, 2025-02-03. DOI: 10.1101/2023.04.25.538339 (https://doi.org/10.1101/2023.04.25.538339). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10614728/pdf/
Performance of qpAdm-based screens for genetic admixture on admixture-graph-shaped histories and stepping-stone landscapes.
qpAdm is a statistical tool that is often used to test large sets of alternative admixture models for a target population. Despite its popularity, qpAdm remains untested on two-dimensional stepping-stone landscapes and in situations with low pre-study odds (a low ratio of true to false models). We tested high-throughput qpAdm protocols with typical properties such as the number of source combinations per target, model complexity, and model feasibility criteria. These protocols were applied to simulated admixture-graph-shaped and stepping-stone histories, sampled randomly or systematically. We demonstrate that the false discovery rates of high-throughput qpAdm protocols exceed 50% for many parameter combinations because: 1) pre-study odds are low and fall rapidly with increasing model complexity; 2) complex migration networks violate the assumptions of the method, so qpAdm p-values correlate poorly with model optimality, contributing to a low but non-zero false positive rate and to low power; 3) although admixture fraction estimates between 0 and 1 are largely restricted to symmetric configurations of sources around a target, a small fraction of asymmetric, highly non-optimal models also have estimates in this interval, contributing to the false positive rate. We also re-interpret large sets of qpAdm models from two studies in terms of source-target distance and symmetry, and we suggest improvements to qpAdm protocols: 1) temporal stratification of targets and proxy sources in the case of admixture-graph-shaped histories; 2) focused exploration of a few models to increase pre-study odds; 3) dense landscape sampling to increase power; and 4) stringent conditions on estimated admixture fractions to decrease the false positive rate.
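Point 1) can be made concrete with the standard relation between pre-study odds and the expected false discovery rate of a screen: FDR = α / (α + (1 − β)·R), where R is the pre-study odds (true to false models tested), α is the per-model false positive rate, and 1 − β is the power. The sketch below assumes that every model is accepted or rejected independently and, for simplicity, that exactly one candidate model per target is true. The helper names and the numbers in the demo (20 candidate sources, α = 0.05, power 0.5) are hypothetical and are not taken from the study.

```python
from math import comb


def prestudy_odds(n_true: int, n_sources: int, k: int) -> float:
    """Hypothetical helper: ratio of true to false models when every
    k-source combination drawn from n_sources candidate sources is tested."""
    total = comb(n_sources, k)  # number of distinct k-source models
    if not 0 < n_true < total:
        raise ValueError("need 0 < n_true < number of candidate models")
    return n_true / (total - n_true)


def expected_fdr(odds: float, fpr: float, power: float) -> float:
    """Expected false discovery rate, FDR = fpr / (fpr + power * odds).

    Per false model tested, `fpr` accepted false models are expected;
    per false model tested there are `odds` true models, of which
    `power * odds` are expected to be accepted.
    """
    false_pos = fpr
    true_pos = power * odds
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    # Hypothetical screen: 20 candidate sources, one true model per target,
    # 5% false positive rate and 50% power for every model tested.
    for k in (2, 3, 4):
        odds = prestudy_odds(1, 20, k)
        fdr = expected_fdr(odds, 0.05, 0.5)
        print(f"{k}-source models: odds={odds:.2e}, FDR={fdr:.3f}")
```

With these illustrative values, one true two-source model among 190 candidates already gives an expected FDR of about 0.95, and moving to three-source models (1,140 candidates) pushes it above 0.99, even though each individual model has only a 5% chance of being falsely accepted.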
Article summary: Protocols for detecting admixed groups with the so-called qpAdm algorithm have proliferated in the archaeogenetic literature, but they have become disconnected from performance testing: the only extensive study of qpAdm on simulated data showed that it performs well under an unrealistically simple demographic scenario. We found that false discoveries of gene flow by qpAdm are very common on a collection of random admixture-graph-shaped histories and on complex stepping-stone landscapes, and we provide guidelines for the design of qpAdm protocols in archaeogenetic studies.
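The model feasibility criteria mentioned in the abstract, and the suggested stringent conditions on estimated admixture fractions, amount to a filter applied to every fitted model in a screen. The sketch below shows one such filter under stated assumptions: a model is kept if its p-value is at or above a threshold and each estimated admixture fraction, widened by `n_se` standard errors, stays within [0, 1]. The function name `is_feasible`, the default threshold of 0.01, and the ±2 SE rule are hypothetical choices for illustration, not the criteria used in the study.

```python
from typing import Optional, Sequence


def is_feasible(
    p_value: float,
    weights: Sequence[float],
    std_errors: Optional[Sequence[float]] = None,
    p_threshold: float = 0.01,
    n_se: float = 2.0,
) -> bool:
    """Hypothetical qpAdm-style model filter (illustrative thresholds).

    Keeps a model if p_value >= p_threshold and every admixture fraction
    w satisfies w - n_se*se >= 0 and w + n_se*se <= 1. Without standard
    errors this reduces to requiring all fractions to lie in [0, 1].
    """
    if p_value < p_threshold:
        return False
    if std_errors is None:
        std_errors = [0.0] * len(weights)
    if len(std_errors) != len(weights):
        raise ValueError("weights and std_errors must have the same length")
    return all(
        w - n_se * se >= 0.0 and w + n_se * se <= 1.0
        for w, se in zip(weights, std_errors)
    )


if __name__ == "__main__":
    # Made-up fits for a two-source model, for illustration only.
    print(is_feasible(0.20, [0.6, 0.4], [0.1, 0.1]))  # fractions well inside [0, 1]
    print(is_feasible(0.20, [0.9, 0.1], [0.1, 0.1]))  # 0.9 + 2*0.1 exceeds 1
    print(is_feasible(0.005, [0.5, 0.5]))             # p-value below threshold
```

With `n_se=0` (or no standard errors) the filter reduces to the plain 0-to-1 rule; raising `n_se` also rejects models whose fractions are estimated close to the boundaries or imprecisely, which is one possible way to make the condition more stringent.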