DeLIVR: a deep learning approach to IV regression for testing nonlinear causal effects in transcriptome-wide association studies.
Ruoyu He, Mingyang Liu, Zhaotong Lin, Zhong Zhuang, Xiaotong Shen, Wei Pan
Biostatistics, pp. 468-485, published 2024-04-15. DOI: 10.1093/biostatistics/kxac051
Transcriptome-wide association studies (TWAS) have been increasingly applied to identify (putative) causal genes for complex traits and diseases. TWAS can be regarded as a two-sample two-stage least squares method for instrumental variable (IV) regression in causal inference. The standard TWAS (called TWAS-L) considers only a linear relationship between a gene's expression and a trait in stage 2, which may lose statistical power when the true relationship is nonlinear. A recent extension of TWAS (called TWAS-LQ) considers both linear and quadratic effects of a gene on a trait; however, being parametric, it is not flexible enough and may have low power for nonlinear effects that are not quadratic. On the other hand, a deep learning (DL) approach called DeepIV has been proposed to model nonlinear effects in IV regression nonparametrically. However, DeepIV is both slow and unstable because it must solve an ill-posed inverse problem, an integral equation, using Monte Carlo approximations. Furthermore, the original DeepIV work did not study statistical inference, that is, hypothesis testing. Here, we propose a novel DL approach, called DeLIVR, that overcomes these major drawbacks of DeepIV by estimating a related but different target function and by including a hypothesis testing framework. We show through simulations that DeLIVR is both faster and more stable than DeepIV. We applied both parametric and DL approaches to the GTEx and UK Biobank data, showing that DeLIVR detected an additional 8 and 7 genes nonlinearly associated with high-density lipoprotein (HDL) cholesterol and low-density lipoprotein (LDL) cholesterol, respectively, all of which were missed by TWAS-L, TWAS-LQ, and DeepIV; these genes include BUD13, associated with HDL, and SLC44A2 and GMIP, associated with LDL, all supported by previous studies.
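The two-sample two-stage least squares reading of TWAS, and the difference between the TWAS-L and TWAS-LQ stage-2 models, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: stage 1 is plain OLS (practical TWAS weights are usually fit with penalized regression), the stage-2 test is a standard F test against an intercept-only model, and the simulation settings (10 SNPs, MAF 0.3, sample sizes, effect sizes) and helper names (`twas_f_stat`, `simulate`, `run`) are hypothetical. DeLIVR's own deep-learning stage 2 and its test are not reproduced here.

```python
import numpy as np


def ols_rss(design, y):
    """Fit y on the design matrix by least squares; return (RSS, coefficients)."""
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid), beta


def with_intercept(x):
    """Prepend a column of ones to a 1-D or 2-D covariate array."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    return np.column_stack([np.ones(x.shape[0]), x])


def twas_f_stat(g_eqtl, x_eqtl, g_gwas, y_gwas, quadratic=False):
    """Two-sample 2SLS TWAS sketch returning a stage-2 F statistic.

    Stage 1 (eQTL sample): regress expression on SNP genotypes (plain OLS here).
    Stage 2 (GWAS sample): impute expression from genotypes, then regress the
    trait on imputed expression (TWAS-L) or on it and its square (TWAS-LQ).
    The F statistic compares that model with an intercept-only null.
    """
    _, weights = ols_rss(with_intercept(g_eqtl), x_eqtl)
    x_hat = with_intercept(g_gwas) @ weights
    covs = np.column_stack([x_hat, x_hat ** 2]) if quadratic else x_hat
    rss_full, _ = ols_rss(with_intercept(covs), y_gwas)
    rss_null = float(np.sum((y_gwas - y_gwas.mean()) ** 2))
    q = 2 if quadratic else 1          # number of tested stage-2 slopes
    df_resid = len(y_gwas) - q - 1     # residual degrees of freedom
    return ((rss_null - rss_full) / q) / (rss_full / df_resid)


def simulate(n, effect, rng, n_snps=10, maf=0.3):
    """Hypothetical data: additive SNP effects on expression x, effect(x) on trait y."""
    g = rng.binomial(2, maf, size=(n, n_snps)).astype(float)
    x = g @ np.full(n_snps, 0.5) + rng.normal(size=n)
    y = effect(x) + rng.normal(size=n)
    return g, x, y


def linear(x):
    return 0.3 * x


def quad(x):
    # Centered at E[x] = 10 SNPs * (2 * 0.3) * 0.5 = 3.0, so its linear component is small.
    return 0.5 * (x - 3.0) ** 2


def run(effect, quadratic, seed=0):
    rng = np.random.default_rng(seed)
    g1, x1, _ = simulate(500, effect, rng)    # eQTL sample: genotypes + expression
    g2, _, y2 = simulate(2000, effect, rng)   # GWAS sample: genotypes + trait only
    return twas_f_stat(g1, x1, g2, y2, quadratic=quadratic)


if __name__ == "__main__":
    for name, effect in [("linear", linear), ("quadratic", quad)]:
        f_l = run(effect, quadratic=False)
        f_lq = run(effect, quadratic=True)
        print(f"{name} effect: TWAS-L F = {f_l:.1f}, TWAS-LQ F = {f_lq:.1f}")
```

For the simulated quadratic effect, the linear-only stage 2 has little signal to pick up, while adding the squared term recovers it. An effect that is neither linear nor quadratic would call for a more flexible stage 2, which is the gap the abstract says DeLIVR is designed to fill.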
About the journal:
Among the important scientific developments of the 20th century is the explosive growth in statistical reasoning and methods for application to studies of human health. Examples include developments in likelihood methods for inference, epidemiologic statistics, clinical trials, survival analysis, and statistical genetics. Substantive problems in public health and biomedical research have fueled the development of statistical methods, which in turn have improved our ability to draw valid inferences from data. The objective of Biostatistics is to advance statistical science and its application to problems of human health and disease, with the ultimate goal of advancing the public's health.