A novel application of data-consistent inversion to overcome spurious inference in genome-wide association studies

IF 3.8 | Zone 4, Medicine | JCR Q3, Genetics & Heredity

Genetic Epidemiology Pub Date : 2024-04-21 DOI:10.1002/gepi.22563

Negar Janani, Kendra A. Young, Greg Kinney, Matthew Strand, John E. Hokanson, Yaning Liu, Troy Butler, Erin Austin

Genetic Epidemiology, volume 48, issue 6, pages 270-288. Journal Article.

Citations: 0

Abstract

Genome-wide association studies (GWAS) typically use linear or logistic regression models to identify associations between phenotypes (traits) and genotypes (genetic variants) of interest. However, regression under the additive assumption has potential limitations. First, the assumption that residuals are normally distributed rarely holds in practice, and deviation from normality increases the Type-I error rate. Second, building a model on the additive assumption ignores genetic structures such as dominant, recessive, and protective-risk cases. Ignoring genetic variants may result in spurious conclusions about the association between a variant and a trait. We propose an assumption-free model built upon data-consistent inversion (DCI), a recently developed measure-theoretic framework for uncertainty quantification. The proposed DCI-derived model builds a nonparametric distribution on model inputs that propagates to the distribution of the observed data, without requiring the normality assumption on residuals that the regression model needs. This characteristic enables the proposed DCI-derived model to cover all genetic variants without emphasizing the additivity of the classic-GWAS model. Simulations and a replication GWAS with data from COPDGene demonstrate that this model controls the Type-I error rate at least as well as the classic-GWAS (additive linear model) approach while having similar or greater power to discover variants under different genetic modes of transmission.
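To make the point about the additive assumption concrete, the following is a minimal sketch, not the authors' method or their DCI model. It assumes genotypes are coded as minor-allele counts (0, 1, 2) and uses toy, noise-free data invented for illustration; the names `encode` and `ols_slope` are hypothetical helpers. It shows how a trait generated under a recessive mode is only partly captured by an additive covariate. The protective-risk case mentioned in the abstract is not covered here.

```python
# Hypothetical illustration: genotype codings for common modes of inheritance.
# Genotypes are counts of the minor (effect) allele: 0, 1, or 2.

def encode(genotype: int, mode: str) -> int:
    """Map a minor-allele count to a regression covariate under a given mode."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    if mode == "additive":
        return genotype              # each allele copy adds the same effect
    if mode == "dominant":
        return int(genotype >= 1)    # one copy is enough for the full effect
    if mode == "recessive":
        return int(genotype == 2)    # effect appears only with two copies
    raise ValueError(f"unknown mode: {mode}")


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy data (assumed, noise-free): the trait shifts only for genotype 2,
# i.e. it was generated under a purely recessive effect of size 1.0.
genotypes = [0, 0, 1, 1, 2, 2]
trait = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]

additive_x = [encode(g, "additive") for g in genotypes]
recessive_x = [encode(g, "recessive") for g in genotypes]

print("additive slope:", ols_slope(additive_x, trait))
print("recessive slope:", ols_slope(recessive_x, trait))
```

In this toy setting the recessive coding recovers the true effect size, while the additive coding spreads it across both allele copies and fits the heterozygotes poorly. This is the kind of mismatch the abstract attributes to the classic additive model.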


Source journal

Genetic Epidemiology (Medicine: Public, Environmental & Occupational Health)

CiteScore

4.40

Self-citation rate

9.50%

Articles published

Review time

6-12 weeks

About the journal: Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations. Genetic Epidemiology primarily publishes papers in statistical genetics, a research field that is primarily concerned with development of statistical, bioinformatical, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The Journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.