Andy Lee, William Hemstrom, Natalie Molea, Gordon Luikart, Mark R. Christie
{"title":"“高信息量”的遗传标记可能导致结论偏差:示例和一般解决方案。","authors":"Andy Lee, William Hemstrom, Natalie Molea, Gordon Luikart, Mark R. Christie","doi":"10.1111/1755-0998.70011","DOIUrl":null,"url":null,"abstract":"<p>High-grading bias is the overestimation power in a subset of loci caused by model overfitting. Using both empirical and simulated datasets, we show that high-grading bias can cause severe overestimation of population structure, and thus mislead investigators, whenever highly informative or high-<i>F</i><sub><i>ST</i></sub> markers are chosen (i.e., ascertained) and used for subsequent assessments, a common practice in population genetic studies. This problem can occur in panmictic populations with no local adaptation<i>.</i> Biased results from choosing high-<i>F</i><sub><i>ST</i></sub> markers may have severe downstream implications for management and conservation, such as erroneous conservation unit delineation, which could squander limited conservation resources to protect incorrectly defined ‘populations’. Furthermore, we caution that high-grading is not limited to <i>F</i><sub><i>ST</i></sub> approaches; high-grading bias is a concern whenever a small subset of markers are first chosen to explain differences among groups based on their degree of difference and are subsequently reused to estimate the degree of difference among those groups. For example, selecting high <i>F</i><sub><i>ST</i></sub> loci for use in a GT-seq panel or using differentially expressed genes to plot sample membership in multivariate space can both result in spurious structure when none exists. We illustrate that using statistically based outlier tests in place of arbitrary <i>F</i><sub><i>ST</i></sub> cut-offs can reduce bias. Alternatively, permutation tests or cross-evaluation can be used to detect high-grading bias. We provide an R package, PCAssess, to help researchers detect and prevent high-grading bias in genetic datasets by automating permutation tests and principal component analyses (https://github.com/hemstrow/PCAssess).</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 7","pages":""},"PeriodicalIF":5.5000,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70011","citationCount":"0","resultStr":"{\"title\":\"‘Highly-Informative’ Genetic Markers Can Bias Conclusions: Examples and General Solutions\",\"authors\":\"Andy Lee, William Hemstrom, Natalie Molea, Gordon Luikart, Mark R. Christie\",\"doi\":\"10.1111/1755-0998.70011\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>High-grading bias is the overestimation power in a subset of loci caused by model overfitting. Using both empirical and simulated datasets, we show that high-grading bias can cause severe overestimation of population structure, and thus mislead investigators, whenever highly informative or high-<i>F</i><sub><i>ST</i></sub> markers are chosen (i.e., ascertained) and used for subsequent assessments, a common practice in population genetic studies. This problem can occur in panmictic populations with no local adaptation<i>.</i> Biased results from choosing high-<i>F</i><sub><i>ST</i></sub> markers may have severe downstream implications for management and conservation, such as erroneous conservation unit delineation, which could squander limited conservation resources to protect incorrectly defined ‘populations’. Furthermore, we caution that high-grading is not limited to <i>F</i><sub><i>ST</i></sub> approaches; high-grading bias is a concern whenever a small subset of markers are first chosen to explain differences among groups based on their degree of difference and are subsequently reused to estimate the degree of difference among those groups. For example, selecting high <i>F</i><sub><i>ST</i></sub> loci for use in a GT-seq panel or using differentially expressed genes to plot sample membership in multivariate space can both result in spurious structure when none exists. We illustrate that using statistically based outlier tests in place of arbitrary <i>F</i><sub><i>ST</i></sub> cut-offs can reduce bias. Alternatively, permutation tests or cross-evaluation can be used to detect high-grading bias. We provide an R package, PCAssess, to help researchers detect and prevent high-grading bias in genetic datasets by automating permutation tests and principal component analyses (https://github.com/hemstrow/PCAssess).</p>\",\"PeriodicalId\":211,\"journal\":{\"name\":\"Molecular Ecology Resources\",\"volume\":\"25 7\",\"pages\":\"\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2025-07-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.70011\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Molecular Ecology Resources\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.70011\",\"RegionNum\":1,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMISTRY & MOLECULAR BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular Ecology Resources","FirstCategoryId":"99","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.70011","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
‘Highly-Informative’ Genetic Markers Can Bias Conclusions: Examples and General Solutions
High-grading bias is the overestimation power in a subset of loci caused by model overfitting. Using both empirical and simulated datasets, we show that high-grading bias can cause severe overestimation of population structure, and thus mislead investigators, whenever highly informative or high-FST markers are chosen (i.e., ascertained) and used for subsequent assessments, a common practice in population genetic studies. This problem can occur in panmictic populations with no local adaptation. Biased results from choosing high-FST markers may have severe downstream implications for management and conservation, such as erroneous conservation unit delineation, which could squander limited conservation resources to protect incorrectly defined ‘populations’. Furthermore, we caution that high-grading is not limited to FST approaches; high-grading bias is a concern whenever a small subset of markers are first chosen to explain differences among groups based on their degree of difference and are subsequently reused to estimate the degree of difference among those groups. For example, selecting high FST loci for use in a GT-seq panel or using differentially expressed genes to plot sample membership in multivariate space can both result in spurious structure when none exists. We illustrate that using statistically based outlier tests in place of arbitrary FST cut-offs can reduce bias. Alternatively, permutation tests or cross-evaluation can be used to detect high-grading bias. We provide an R package, PCAssess, to help researchers detect and prevent high-grading bias in genetic datasets by automating permutation tests and principal component analyses (https://github.com/hemstrow/PCAssess).
期刊介绍:
Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines.
In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.