DeLIVR：在全转录组关联研究中测试非线性因果效应的深度学习 IV 回归方法。

IF 1.8 3区数学 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biostatistics Pub Date : 2024-04-15 DOI:10.1093/biostatistics/kxac051

Ruoyu He, Mingyang Liu, Zhaotong Lin, Zhong Zhuang, Xiaotong Shen, Wei Pan

{"title":"DeLIVR：在全转录组关联研究中测试非线性因果效应的深度学习 IV 回归方法。","authors":"Ruoyu He, Mingyang Liu, Zhaotong Lin, Zhong Zhuang, Xiaotong Shen, Wei Pan","doi":"10.1093/biostatistics/kxac051","DOIUrl":null,"url":null,"abstract":"Transcriptome-wide association studies (TWAS) have been increasingly applied to identify (putative) causal genes for complex traits and diseases. TWAS can be regarded as a two-sample two-stage least squares method for instrumental variable (IV) regression for causal inference. The standard TWAS (called TWAS-L) only considers a linear relationship between a gene's expression and a trait in stage 2, which may lose statistical power when not true. Recently, an extension of TWAS (called TWAS-LQ) considers both the linear and quadratic effects of a gene on a trait, which however is not flexible enough due to its parametric nature and may be low powered for nonquadratic nonlinear effects. On the other hand, a deep learning (DL) approach, called DeepIV, has been proposed to nonparametrically model a nonlinear effect in IV regression. However, it is both slow and unstable due to the ill-posed inverse problem of solving an integral equation with Monte Carlo approximations. Furthermore, in the original DeepIV approach, statistical inference, that is, hypothesis testing, was not studied. Here, we propose a novel DL approach, called DeLIVR, to overcome the major drawbacks of DeepIV, by estimating a related but different target function and including a hypothesis testing framework. We show through simulations that DeLIVR was both faster and more stable than DeepIV. We applied both parametric and DL approaches to the GTEx and UK Biobank data, showcasing that DeLIVR detected additional 8 and 7 genes nonlinearly associated with high-density lipoprotein (HDL) cholesterol and low-density lipoprotein (LDL) cholesterol, respectively, all of which would be missed by TWAS-L, TWAS-LQ, and DeepIV; these genes include BUD13 associated with HDL, SLC44A2 and GMIP with LDL, all supported by previous studies.","PeriodicalId":55357,"journal":{"name":"Biostatistics","volume":" ","pages":"468-485"},"PeriodicalIF":1.8000,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017120/pdf/","citationCount":"0","resultStr":"{\"title\":\"DeLIVR: a deep learning approach to IV regression for testing nonlinear causal effects in transcriptome-wide association studies.\",\"authors\":\"Ruoyu He, Mingyang Liu, Zhaotong Lin, Zhong Zhuang, Xiaotong Shen, Wei Pan\",\"doi\":\"10.1093/biostatistics/kxac051\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Transcriptome-wide association studies (TWAS) have been increasingly applied to identify (putative) causal genes for complex traits and diseases. TWAS can be regarded as a two-sample two-stage least squares method for instrumental variable (IV) regression for causal inference. The standard TWAS (called TWAS-L) only considers a linear relationship between a gene's expression and a trait in stage 2, which may lose statistical power when not true. Recently, an extension of TWAS (called TWAS-LQ) considers both the linear and quadratic effects of a gene on a trait, which however is not flexible enough due to its parametric nature and may be low powered for nonquadratic nonlinear effects. On the other hand, a deep learning (DL) approach, called DeepIV, has been proposed to nonparametrically model a nonlinear effect in IV regression. However, it is both slow and unstable due to the ill-posed inverse problem of solving an integral equation with Monte Carlo approximations. Furthermore, in the original DeepIV approach, statistical inference, that is, hypothesis testing, was not studied. Here, we propose a novel DL approach, called DeLIVR, to overcome the major drawbacks of DeepIV, by estimating a related but different target function and including a hypothesis testing framework. We show through simulations that DeLIVR was both faster and more stable than DeepIV. We applied both parametric and DL approaches to the GTEx and UK Biobank data, showcasing that DeLIVR detected additional 8 and 7 genes nonlinearly associated with high-density lipoprotein (HDL) cholesterol and low-density lipoprotein (LDL) cholesterol, respectively, all of which would be missed by TWAS-L, TWAS-LQ, and DeepIV; these genes include BUD13 associated with HDL, SLC44A2 and GMIP with LDL, all supported by previous studies.\",\"PeriodicalId\":55357,\"journal\":{\"name\":\"Biostatistics\",\"volume\":\" \",\"pages\":\"468-485\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2024-04-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017120/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biostatistics\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1093/biostatistics/kxac051\",\"RegionNum\":3,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biostatistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/biostatistics/kxac051","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

全转录组关联研究（TWAS）越来越多地被用于识别复杂性状和疾病的（推定）因果基因。TWAS 可被视为用于因果推断的工具变量（IV）回归的两抽样两阶段最小二乘法。标准的 TWAS（称为 TWAS-L）在第二阶段只考虑基因表达与性状之间的线性关系，当这种关系不真实时，可能会失去统计能力。最近，TWAS 的一种扩展（称为 TWAS-LQ）同时考虑了基因对性状的线性和二次方效应，但由于其参数化的性质，这种方法不够灵活，而且对于非二次方的非线性效应的统计能力可能较低。另一方面，有人提出了一种名为 DeepIV 的深度学习（DL）方法，用于对 IV 回归中的非线性效应进行非参数建模。然而，由于需要用蒙特卡罗近似法求解积分方程的逆问题，该方法既慢又不稳定。此外，在最初的 DeepIV 方法中，没有研究统计推断，即假设检验。在这里，我们提出了一种名为 DeLIVR 的新型 DL 方法，通过估计一个相关但不同的目标函数并包含一个假设检验框架，克服了 DeepIV 的主要缺点。我们通过仿真表明，DeLIVR 比 DeepIV 更快、更稳定。我们在 GTEx 和英国生物库数据中应用了参数法和 DL 法，结果表明 DeLIVR 分别检测出了额外的 8 个和 7 个与高密度脂蛋白胆固醇和低密度脂蛋白胆固醇非线性相关的基因，而 TWAS-L、TWAS-LQ 和 DeepIV 会漏掉所有这些基因；这些基因包括与高密度脂蛋白相关的 BUD13、与低密度脂蛋白相关的 SLC44A2 和 GMIP，所有这些都得到了以前研究的支持。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

DeLIVR: a deep learning approach to IV regression for testing nonlinear causal effects in transcriptome-wide association studies.

Transcriptome-wide association studies (TWAS) have been increasingly applied to identify (putative) causal genes for complex traits and diseases. TWAS can be regarded as a two-sample two-stage least squares method for instrumental variable (IV) regression for causal inference. The standard TWAS (called TWAS-L) only considers a linear relationship between a gene's expression and a trait in stage 2, which may lose statistical power when not true. Recently, an extension of TWAS (called TWAS-LQ) considers both the linear and quadratic effects of a gene on a trait, which however is not flexible enough due to its parametric nature and may be low powered for nonquadratic nonlinear effects. On the other hand, a deep learning (DL) approach, called DeepIV, has been proposed to nonparametrically model a nonlinear effect in IV regression. However, it is both slow and unstable due to the ill-posed inverse problem of solving an integral equation with Monte Carlo approximations. Furthermore, in the original DeepIV approach, statistical inference, that is, hypothesis testing, was not studied. Here, we propose a novel DL approach, called DeLIVR, to overcome the major drawbacks of DeepIV, by estimating a related but different target function and including a hypothesis testing framework. We show through simulations that DeLIVR was both faster and more stable than DeepIV. We applied both parametric and DL approaches to the GTEx and UK Biobank data, showcasing that DeLIVR detected additional 8 and 7 genes nonlinearly associated with high-density lipoprotein (HDL) cholesterol and low-density lipoprotein (LDL) cholesterol, respectively, all of which would be missed by TWAS-L, TWAS-LQ, and DeepIV; these genes include BUD13 associated with HDL, SLC44A2 and GMIP with LDL, all supported by previous studies.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Biostatistics 生物-数学与计算生物学

CiteScore

5.10

自引率

4.80%

发文量

审稿时长

6-12 weeks

期刊介绍： Among the important scientific developments of the 20th century is the explosive growth in statistical reasoning and methods for application to studies of human health. Examples include developments in likelihood methods for inference, epidemiologic statistics, clinical trials, survival analysis, and statistical genetics. Substantive problems in public health and biomedical research have fueled the development of statistical methods, which in turn have improved our ability to draw valid inferences from data. The objective of Biostatistics is to advance statistical science and its application to problems of human health and disease, with the ultimate goal of advancing the public''s health.