Study design and the sampling of deleterious rare variants in biobank-scale datasets.

Margaret C Steiner, Daniel P Rice, Arjun Biddanda, Mariadaria K Ianni-Ravn, Christian Porras, John Novembre
Journal: bioRxiv : the preprint server for biology
DOI: 10.1101/2024.12.02.626424 (https://doi.org/10.1101/2024.12.02.626424)
Publication date: 2025-01-29
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11642817/pdf/

Abstract

One key component of study design in population genetics is the "geographic breadth" of a sample (i.e., how broad the region is across which individuals are sampled). How the geographic breadth of a sample impacts observations of rare, deleterious variants is unclear, even though such variants are of particular interest for biomedical and evolutionary applications. Here, to gain insight into the effects of sample design on ascertained genetic variants, we formulate a stochastic model of dispersal, genetic drift, selection, mutation, and geographically concentrated sampling. We use this model to understand the effects of the geographic breadth of sampling effort on the discovery of negatively selected variants. We find that geographically broader samples discover a greater number of variants than geographically narrow samples (an effect we label "discovery"), though the variants are detected at lower average frequency than in narrow samples (e.g., as singletons; an effect we label "dilution"). Importantly, these effects are amplified for larger sample sizes and moderated by the magnitude of fitness effects. We validate these results using both population genetic simulations and empirical analyses in the UK Biobank. Our results are particularly important in two contexts: the association of large-effect rare variants with particular phenotypes and the inference of negative selection from allele frequency data. Overall, our findings emphasize the importance of considering geographic breadth when designing and carrying out genetic studies, especially at biobank scale.
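The "discovery" and "dilution" effects described in the abstract can be illustrated with a deliberately simplified toy calculation. This is not the authors' stochastic model: it has no drift, selection, or mutation, and it assumes a fixed frequency landscape in which each rare variant is most common in one origin deme and decays exponentially with distance. The deme count, sample size, peak frequency, and spread below are hypothetical values chosen only for illustration.

```python
import math

# All values below are illustrative assumptions, not taken from the paper.
N_DEMES = 21              # demes arranged on a line
TOTAL_INDIVIDUALS = 2100  # diploid individuals in either sampling design
PEAK_FREQ = 1e-3          # variant frequency in its origin deme
SPREAD = 1.0              # decay length of frequency with distance (in demes)


def local_frequency(deme, origin):
    """Assumed frequency of a variant in `deme`, given its origin deme."""
    return PEAK_FREQ * math.exp(-abs(deme - origin) / SPREAD)


def summarize(sample_sizes):
    """Return (expected number of variants discovered,
    expected sampled copies per discovered variant),
    with one variant originating in each deme."""
    expected_discovered = 0.0
    expected_copies = 0.0
    for origin in range(N_DEMES):
        log_p_unseen = 0.0  # log-probability that no sampled chromosome carries it
        for deme, n_individuals in enumerate(sample_sizes):
            freq = local_frequency(deme, origin)
            chromosomes = 2 * n_individuals
            log_p_unseen += chromosomes * math.log1p(-freq)
            expected_copies += chromosomes * freq
        expected_discovered += 1.0 - math.exp(log_p_unseen)
    # Ratio of expectations: total expected copies over expected discoveries.
    return expected_discovered, expected_copies / expected_discovered


# Narrow design: everyone sampled from the central deme.
narrow_design = [0] * N_DEMES
narrow_design[N_DEMES // 2] = TOTAL_INDIVIDUALS

# Broad design: the same number of individuals spread evenly over all demes.
broad_design = [TOTAL_INDIVIDUALS // N_DEMES] * N_DEMES

narrow_discovered, narrow_copies = summarize(narrow_design)
broad_discovered, broad_copies = summarize(broad_design)

print(f"narrow: {narrow_discovered:.2f} variants, {narrow_copies:.2f} copies each")
print(f"broad:  {broad_discovered:.2f} variants, {broad_copies:.2f} copies each")
```

Under these assumptions the broad design is expected to find more of the variants, but with fewer sampled copies per discovered variant. The sketch says nothing about how selection strength or sample size modulate the effects; those questions need the full model.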

As genetic studies expand, researchers increasingly seek to identify rare genetic variants with large effects on traits. In this paper, we combine theoretical approaches and data analysis to show how differences in the geographic location of sampling affect the number and frequency of the genetic variants discovered. Our results show that, compared with geographically narrow samples, geographically broad samples contain more distinct genetic variants, although each variant is found at lower frequency. Our results can help researchers consider how study design affects expected outcomes when assembling new genetic samples.