A novel application of data-consistent inversion to overcome spurious inference in genome-wide association studies

IF 1.7 · Zone 4 (Medicine) · JCR Q3 · GENETICS & HEREDITY
Negar Janani, Kendra A. Young, Greg Kinney, Matthew Strand, John E. Hokanson, Yaning Liu, Troy Butler, Erin Austin
Genetic Epidemiology, vol. 48, no. 6, pp. 270-288. Published 2024-04-21. DOI: 10.1002/gepi.22563 (https://onlinelibrary.wiley.com/doi/10.1002/gepi.22563)
Citations: 0

Abstract

Genome-wide association studies (GWAS) typically use linear or logistic regression models to identify associations between phenotypes (traits) and genotypes (genetic variants) of interest. However, regression under the additive assumption has potential limitations. First, the assumption that residuals are normally distributed rarely holds in practice, and deviation from normality increases the Type-I error rate. Second, a model built on the additive assumption ignores other genetic structures, such as dominant, recessive, and protective-risk cases. Ignoring these genetic structures may result in spurious conclusions about the association between a variant and a trait. We propose an assumption-free model built upon data-consistent inversion (DCI), a recently developed measure-theoretic framework for uncertainty quantification. The proposed DCI-derived model builds a nonparametric distribution on model inputs that propagates to the distribution of the observed data, without requiring the normality assumption on residuals that the regression model imposes. This characteristic enables the DCI-derived model to cover all genetic variants without relying on the additivity of the classic GWAS model. Simulations and a replication GWAS with data from the COPDGene study demonstrate that this model controls the Type-I error rate at least as well as the classic GWAS (additive linear model) approach while having similar or greater power to discover variants under different genetic modes of transmission.
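
The second limitation named in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' DCI method; it only shows, on made-up data, how an additive genotype coding misfits a purely recessive effect. The function names `encode_genotype` and `linear_fit_rss` and the six-sample data set are assumptions introduced for illustration.

```python
import numpy as np


def encode_genotype(minor_allele_counts, mode):
    """Recode minor-allele counts (0, 1, 2) under a genetic mode (hypothetical helper)."""
    g = np.asarray(minor_allele_counts)
    if mode == "additive":
        return g.astype(float)            # each copy of the allele adds the same effect
    if mode == "dominant":
        return (g >= 1).astype(float)     # one copy is enough to show the effect
    if mode == "recessive":
        return (g == 2).astype(float)     # two copies are needed to show the effect
    raise ValueError(f"unknown mode: {mode}")


def linear_fit_rss(x, y):
    """Simple least-squares fit y ~ a + b*x; returns (slope, residual sum of squares)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    slope = (xc @ (y - y.mean())) / (xc @ xc)
    intercept = y.mean() - slope * x.mean()
    residuals = y - (intercept + slope * x)
    return float(slope), float(residuals @ residuals)


# Toy data (an assumption, not from the paper): the trait is shifted
# only in individuals carrying two copies of the minor allele.
genotypes = [0, 0, 1, 1, 2, 2]
phenotype = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]

for mode in ("additive", "dominant", "recessive"):
    slope, rss = linear_fit_rss(encode_genotype(genotypes, mode), phenotype)
    print(f"{mode:9s} slope={slope:.3f} rss={rss:.3f}")
```

On this toy data the additive coding leaves a residual sum of squares of 1/3, whereas the recessive coding fits the phenotype exactly. A model fixed to the additive coding therefore describes a recessive signal less well, which is the kind of mismatch the abstract refers to.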

Source journal
Genetic Epidemiology (Medicine: Public, Environmental & Occupational Health)
CiteScore: 4.40
Self-citation rate: 9.50%
Articles published: 49
Review time: 6-12 weeks
About the journal: Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations. The journal primarily publishes papers in statistical genetics, a research field concerned with the development of statistical, bioinformatic, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.