The Patterson-Price-Reich's rule of population structure analysis from genetic marker data
Author: Jinliang Wang
Journal: Theoretical Population Biology, volume 163, pages 13-23
Publication date: 2025-03-08
Publication type: Journal Article
DOI: 10.1016/j.tpb.2025.03.001
URL: https://www.sciencedirect.com/science/article/pii/S0040580925000188
Impact factor: 1.2; JCR: Q4 (Ecology)
Citations: 0
Abstract
Delineating population structure from the marker genotypes of a sample of individuals is now routine in molecular ecology, evolution and conservation biology. Various Bayesian and likelihood methods, as well as more general statistical methods (e.g. PCA), have been proposed to detect population structure, to assign sampled individuals to discrete clusters (subpopulations), and to estimate the admixture proportions of each sampled individual. Regardless of the method, the power of a structure analysis depends on the strength of population structure (measured by $F_{ST}$) relative to the amount of marker information (measured by $NL$, where $N$ and $L$ are the numbers of sampled individuals and loci, respectively). Patterson, Price and Reich (2006) proposed that population structure is unidentifiable when the data size $D = NL$ is smaller than $1/F_{ST}^2$, and rapidly becomes easy to identify as $D$ or $F_{ST}$ increases once $D > 1/F_{ST}^2$. In this study, I investigated this phase-change PPR rule by analysing both simulated genomic data and empirical data with four likelihood admixture analysis methods. The results show that the PPR rule is largely valid, but the accuracy of a structure analysis is also affected by the number of subpopulations, $K$. A more complicated population structure with a larger $K$ requires a larger $NLF_{ST}^2$ to resolve accurately. For a given $NLF_{ST}^2$ above the PPR threshold value of 1, increasing $L$ and decreasing $N$ is more advantageous than increasing $N$ and decreasing $L$ for improving admixture estimation accuracy.
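As a rough illustration of the threshold described in the abstract, the following Python sketch computes the product of N, L and the square of FST and checks whether it exceeds 1. The function names and the example sampling designs are hypothetical and do not come from the paper; the check covers only the threshold itself, not the effect of the number of subpopulations K.

```python
def ppr_statistic(n_individuals: int, n_loci: int, fst: float) -> float:
    """Return N * L * F_ST^2, which the PPR rule compares against 1."""
    if n_individuals <= 0 or n_loci <= 0:
        raise ValueError("N and L must be positive")
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must lie in (0, 1]")
    return n_individuals * n_loci * fst ** 2


def above_ppr_threshold(n_individuals: int, n_loci: int, fst: float) -> bool:
    """True when D = N * L exceeds 1 / F_ST^2, i.e. N * L * F_ST^2 > 1."""
    return ppr_statistic(n_individuals, n_loci, fst) > 1.0


if __name__ == "__main__":
    # Hypothetical sampling designs (N, L) at an assumed F_ST of 0.01,
    # for which the threshold data size is 1 / 0.01^2 = 10,000 genotypes.
    for n, l in [(50, 100), (50, 1000), (20, 2500)]:
        stat = ppr_statistic(n, l, 0.01)
        print(f"N={n}, L={l}: N*L*F_ST^2 = {stat:.2f}, "
              f"above threshold: {stat > 1.0}")
```

The last two designs give the same value of the statistic. According to the abstract, the design with more loci and fewer individuals would be expected to yield more accurate admixture estimates, a distinction this simple threshold check does not capture.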
About the journal:
An interdisciplinary journal, Theoretical Population Biology presents articles on theoretical aspects of the biology of populations, particularly in the areas of demography, ecology, epidemiology, evolution, and genetics. Emphasis is on the development of mathematical theory and models that enhance the understanding of biological phenomena.
Articles highlight the motivation and significance of the work for advancing progress in biology, relying on a substantial mathematical effort to obtain biological insight. The journal also presents empirical results and computational and statistical methods directly impinging on theoretical problems in population biology.