Hierarchical joint analysis of marginal summary statistics—Part II: High-dimensional instrumental analysis of omics data

Impact factor: 1.7; CAS journal ranking: Zone 4 (Medicine); JCR: Q3 (GENETICS & HEREDITY)

Genetic Epidemiology. Published: 2024-06-17. DOI: 10.1002/gepi.22577

Lai Jiang, Jiayi Shen, Burcu F. Darst, Christopher A. Haiman, Nicholas Mancuso, David V. Conti

Volume 48, Issue 7, pages 291-309

Abstract

Instrumental variable (IV) analysis has been widely applied in epidemiology to infer causal relationships from observational data. Genetic variants can also be viewed as valid IVs in Mendelian randomization and transcriptome-wide association studies. However, most multivariate IV approaches cannot scale to high-throughput experimental data. Here, we leverage the flexibility of our previous work, a hierarchical model that jointly analyzes marginal summary statistics (hJAM), and extend it into a scalable framework (SHA-JAM) that can be applied to a large number of intermediates and a large number of correlated genetic variants, situations often encountered in modern experiments that leverage omic technologies. SHA-JAM aims to estimate the conditional effects of high-dimensional risk factors on an outcome by incorporating estimates from single-nucleotide polymorphism (SNP)-intermediate or SNP-gene expression association analyses as prior information in a hierarchical model. Results from extensive simulation studies demonstrate that, compared with an existing approach for similar analyses, SHA-JAM yields a higher area under the receiver operating characteristic curve (AUC), a lower mean-squared error of the estimates, and much faster computation. In two applied examples for prostate cancer, we investigated metabolite and transcriptome associations, respectively, using summary statistics from a prostate cancer GWAS of more than 140,000 men and high-dimensional publicly available summary data for metabolites and transcriptomes.
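The abstract describes the hierarchy only in words. As a rough illustration of the hJAM-style layer it builds on, the sketch below assumes standardized genotypes and outcome, so that marginal SNP-outcome effects relate to conditional SNP effects through a reference LD matrix R as b_m = R b_c, and that conditional SNP effects depend on the intermediates through b_c = A alpha, where A holds the SNP-to-intermediate estimates. A generalized least squares fit then gives alpha_hat = (A^T R A)^(-1) A^T b_m. The ridge term `lam` is a hypothetical stand-in for the shrinkage a high-dimensional model needs; it is not SHA-JAM's actual prior, and all function and variable names are illustrative.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matvec(a: Matrix, v: Vector) -> Vector:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy so the caller's matrices are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hjam_style_estimate(b_marginal: Vector, ld: Matrix, a: Matrix,
                        lam: float = 0.0) -> Vector:
    """Hypothetical sketch: intermediate-to-outcome effects from summary data.

    b_marginal: J marginal SNP-outcome effects (standardized scale).
    ld:         J x J SNP correlation (LD) matrix R from a reference panel.
    a:          J x K SNP-to-intermediate effect estimates.
    lam:        ridge penalty, an illustrative stand-in for shrinkage.

    Returns (A^T R A + lam * I)^(-1) A^T b_marginal.
    """
    at = transpose(a)
    lhs = matmul(at, matmul(ld, a))  # A^T R A, a K x K matrix
    for k in range(len(lhs)):
        lhs[k][k] += lam  # hypothetical ridge shrinkage, not SHA-JAM's prior
    rhs = matvec(at, b_marginal)  # A^T b_m
    return solve(lhs, rhs)


# Toy example: 3 correlated SNPs, 2 intermediates, noise-free summary data.
ld_matrix = [[1.0, 0.3, 0.1],
             [0.3, 1.0, 0.2],
             [0.1, 0.2, 1.0]]
snp_to_intermediate = [[0.5, 0.0],
                       [0.2, 0.4],
                       [0.0, 0.3]]
alpha_true = [0.2, -0.1]
b_conditional = matvec(snp_to_intermediate, alpha_true)  # b_c = A alpha
b_marginal = matvec(ld_matrix, b_conditional)            # b_m = R b_c

estimate = hjam_style_estimate(b_marginal, ld_matrix, snp_to_intermediate)
print([round(x, 6) for x in estimate])  # → [0.2, -0.1]
```

With `lam=0` and noise-free inputs the sketch recovers the simulated effects; increasing `lam` pulls the estimates toward zero, trading some bias for stability when the number of intermediates is large or the columns of A are nearly collinear.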

Source journal

Genetic Epidemiology (Medicine: Public, Environmental & Occupational Health)

CiteScore: 4.40

Self-citation rate: 9.50%

Review time: 6-12 weeks

About the journal: Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations. Genetic Epidemiology primarily publishes papers in statistical genetics, a research field concerned mainly with the development of statistical, bioinformatical, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The Journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.