‘Highly-Informative’ Genetic Markers Can Bias Conclusions: Examples and General Solutions

IF 5.5 · Zone 1, Biology · Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY
Andy Lee, William Hemstrom, Natalie Molea, Gordon Luikart, Mark R. Christie
Molecular Ecology Resources 25(7), published 2025-07-11. DOI: 10.1111/1755-0998.70011
https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.70011

Abstract

High-grading bias is the overestimation of power in a subset of loci caused by model overfitting. Using both empirical and simulated datasets, we show that high-grading bias can cause severe overestimation of population structure, and thus mislead investigators, whenever highly informative or high-F_ST markers are chosen (i.e., ascertained) and used for subsequent assessments, a common practice in population genetic studies. This problem can occur in panmictic populations with no local adaptation. Biased results from choosing high-F_ST markers may have severe downstream implications for management and conservation, such as erroneous conservation unit delineation, which could squander limited conservation resources to protect incorrectly defined ‘populations’. Furthermore, we caution that high-grading is not limited to F_ST approaches; high-grading bias is a concern whenever a small subset of markers is first chosen to explain differences among groups based on their degree of difference and is subsequently reused to estimate the degree of difference among those groups. For example, selecting high-F_ST loci for use in a GT-seq panel or using differentially expressed genes to plot sample membership in multivariate space can both result in spurious structure when none exists. We illustrate that using statistically based outlier tests in place of arbitrary F_ST cut-offs can reduce bias. Alternatively, permutation tests or cross-evaluation can be used to detect high-grading bias. We provide an R package, PCAssess, to help researchers detect and prevent high-grading bias in genetic datasets by automating permutation tests and principal component analyses (https://github.com/hemstrow/PCAssess).
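
The mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the PCAssess implementation and does not use its API; it is a minimal, standard-library Python illustration under stated assumptions: two groups are sampled from one panmictic population (true F_ST = 0), F_ST is an uncorrected (HT - HS) / HT estimate, and all function names, sample sizes, and the top-25 cut-off are hypothetical choices made for this example. It compares the genome-wide mean F_ST, the mean F_ST of loci chosen for high F_ST and then reused on the same individuals (high-graded), a cross-evaluated estimate (loci chosen on one half of the individuals, scored on the other half), and a label-permutation null that repeats the selection step.

```python
import random

def simulate_panmictic(n_per_group, n_loci, rng):
    """Return (genos, labels). genos[locus][individual] is an alt-allele count
    (0, 1 or 2). Both groups share the same allele frequencies: true FST = 0."""
    labels = [0] * n_per_group + [1] * n_per_group
    genos = []
    for _ in range(n_loci):
        p = rng.uniform(0.1, 0.9)
        genos.append([(rng.random() < p) + (rng.random() < p) for _ in labels])
    return genos, labels

def locus_fst(row, labels, keep):
    """Uncorrected FST = (HT - HS) / HT at one locus, using individuals in keep."""
    count = [0, 0]  # alt alleles observed in each group
    size = [0, 0]   # gene copies sampled in each group
    for i in keep:
        count[labels[i]] += row[i]
        size[labels[i]] += 2
    p_total = sum(count) / sum(size)
    ht = 2 * p_total * (1 - p_total)
    if ht == 0:
        return 0.0  # monomorphic in this sample
    hs = sum(s / sum(size) * 2 * (c / s) * (1 - c / s)
             for c, s in zip(count, size))
    return (ht - hs) / ht

def top_loci(genos, labels, keep, k):
    """Indices of the k loci with the highest FST (the high-grading step)."""
    scores = [locus_fst(row, labels, keep) for row in genos]
    return sorted(range(len(genos)), key=scores.__getitem__, reverse=True)[:k]

def mean_fst(genos, labels, keep, loci):
    """Mean per-locus FST over the given loci and individuals."""
    return sum(locus_fst(genos[j], labels, keep) for j in loci) / len(loci)

rng = random.Random(1)
genos, labels = simulate_panmictic(n_per_group=50, n_loci=1000, rng=rng)
everyone = range(len(labels))
k = 25  # hypothetical panel size

# Baseline: all loci, no selection.
fst_all = mean_fst(genos, labels, everyone, range(len(genos)))

# High-graded: select on FST, then re-estimate FST on the same individuals.
fst_top = mean_fst(genos, labels, everyone, top_loci(genos, labels, everyone, k))

# Cross-evaluation: select loci on one half, estimate on the held-out half.
train = [i for i in everyone if i % 2 == 0]
test = [i for i in everyone if i % 2 == 1]
fst_cross = mean_fst(genos, labels, test, top_loci(genos, labels, train, k))

# Permutation test: shuffle group labels and repeat the whole selection step.
n_perm = 20
null = []
for _ in range(n_perm):
    shuffled = labels[:]
    rng.shuffle(shuffled)
    picked = top_loci(genos, shuffled, everyone, k)
    null.append(mean_fst(genos, shuffled, everyone, picked))
p_value = (1 + sum(v >= fst_top for v in null)) / (n_perm + 1)

print(f"all loci: {fst_all:.4f}  high-graded: {fst_top:.4f}  "
      f"cross-evaluated: {fst_cross:.4f}  permutation p: {p_value:.2f}")
```

In this setup the high-graded estimate is expected to be several times larger than the genome-wide mean even though no structure exists, while the cross-evaluated estimate should stay near the no-selection baseline for the held-out individuals. The key design choice in the permutation step is that locus selection is repeated inside every permutation; permuting labels only after the loci have been chosen would leave the selection bias out of the null distribution.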

Source journal: Molecular Ecology Resources (Biology: Evolutionary Biology)
CiteScore: 15.60
Self-citation rate: 5.20%
Articles published: 170
Review time: 3 months
Journal description: Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines. In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.