Jiacong Du, Xiang Zhou, Dylan Clark-Boucher, Wei Hao, Yongmei Liu, Jennifer A. Smith, Bhramar Mukherjee
{"title":"大规模单一中介假设检验的方法:可能的选择和比较","authors":"Jiacong Du, Xiang Zhou, Dylan Clark-Boucher, Wei Hao, Yongmei Liu, Jennifer A. Smith, Bhramar Mukherjee","doi":"10.1002/gepi.22510","DOIUrl":null,"url":null,"abstract":"<p>Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, <math>\n <semantics>\n <mrow>\n <msub>\n <mi>H</mi>\n \n <mn>0</mn>\n </msub>\n \n <mo>:</mo>\n \n <mi>α</mi>\n \n <mi>β</mi>\n \n <mo>=</mo>\n \n <mn>0</mn>\n </mrow>\n <annotation> ${H}_{0}:\\alpha \\beta =0$</annotation>\n </semantics></math> (<math>\n <semantics>\n <mrow>\n <mi>α</mi>\n </mrow>\n <annotation> $\\alpha $</annotation>\n </semantics></math>: effect of the exposure on the mediator after adjusting for confounders; <math>\n <semantics>\n <mrow>\n <mi>β</mi>\n </mrow>\n <annotation> $\\beta $</annotation>\n </semantics></math>: effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we reviewed three classes of methods for large-scale one at a time mediation hypothesis testing. These methods are commonly used for continuous outcomes and continuous mediators assuming there is no exposure-mediator interaction so that the product <math>\n <semantics>\n <mrow>\n <mi>α</mi>\n \n <mi>β</mi>\n </mrow>\n <annotation> $\\alpha \\beta $</annotation>\n </semantics></math> has a causal interpretation as the indirect effect. The first class of methods ignores the impact of different structures under the composite null hypothesis, namely, (1) <math>\n <semantics>\n <mrow>\n <mi>α</mi>\n \n <mo>=</mo>\n \n <mn>0</mn>\n \n <mo>,</mo>\n \n <mi>β</mi>\n \n <mo>≠</mo>\n \n <mn>0</mn>\n </mrow>\n <annotation> $\\alpha =0,\\beta \\ne 0$</annotation>\n </semantics></math>; (2) <math>\n <semantics>\n <mrow>\n <mi>α</mi>\n \n <mo>≠</mo>\n \n <mn>0</mn>\n \n <mo>,</mo>\n \n <mi>β</mi>\n \n <mo>=</mo>\n \n <mn>0</mn>\n </mrow>\n <annotation> $\\alpha \\ne 0,\\beta =0$</annotation>\n </semantics></math>; and (3) <math>\n <semantics>\n <mrow>\n <mi>α</mi>\n \n <mo>=</mo>\n \n <mi>β</mi>\n \n <mo>=</mo>\n \n <mn>0</mn>\n </mrow>\n <annotation> $\\alpha =\\beta =0$</annotation>\n </semantics></math>. The second class of methods weights the reference distribution under each case of the null to form a mixture reference distribution. The third class constructs a composite test statistic using the three <i>p</i> values obtained under each case of the null so that the reference distribution of the composite statistic is approximately <math>\n <semantics>\n <mrow>\n <mi>U</mi>\n \n <mrow>\n <mo>(</mo>\n \n <mrow>\n <mn>0</mn>\n \n <mo>,</mo>\n \n <mn>1</mn>\n </mrow>\n \n <mo>)</mo>\n </mrow>\n </mrow>\n <annotation> $U(0,1)$</annotation>\n </semantics></math>. In addition to these existing methods, we developed the Sobel-comp method belonging to the second class, which uses a corrected mixture reference distribution for Sobel's test statistic. We performed extensive simulation studies to compare all six methods belonging to these three classes in terms of the false positive rates (FPRs) under the null hypothesis and the true positive rates under the alternative hypothesis. We found that the second class of methods which uses a mixture reference distribution could best maintain the FPRs at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level using data from the Multi-Ethnic Study of Atherosclerosis (MESA). We provide guidelines for choosing the optimal mediation hypothesis testing method in practice and develop an R package <i>medScan</i> available on the CRAN for implementing all the six methods.</p>","PeriodicalId":12710,"journal":{"name":"Genetic Epidemiology","volume":"47 2","pages":"167-184"},"PeriodicalIF":1.7000,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22510","citationCount":"2","resultStr":"{\"title\":\"Methods for large-scale single mediator hypothesis testing: Possible choices and comparisons\",\"authors\":\"Jiacong Du, Xiang Zhou, Dylan Clark-Boucher, Wei Hao, Yongmei Liu, Jennifer A. Smith, Bhramar Mukherjee\",\"doi\":\"10.1002/gepi.22510\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, <math>\\n <semantics>\\n <mrow>\\n <msub>\\n <mi>H</mi>\\n \\n <mn>0</mn>\\n </msub>\\n \\n <mo>:</mo>\\n \\n <mi>α</mi>\\n \\n <mi>β</mi>\\n \\n <mo>=</mo>\\n \\n <mn>0</mn>\\n </mrow>\\n <annotation> ${H}_{0}:\\\\alpha \\\\beta =0$</annotation>\\n </semantics></math> (<math>\\n <semantics>\\n <mrow>\\n <mi>α</mi>\\n </mrow>\\n <annotation> $\\\\alpha $</annotation>\\n </semantics></math>: effect of the exposure on the mediator after adjusting for confounders; <math>\\n <semantics>\\n <mrow>\\n <mi>β</mi>\\n </mrow>\\n <annotation> $\\\\beta $</annotation>\\n </semantics></math>: effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we reviewed three classes of methods for large-scale one at a time mediation hypothesis testing. These methods are commonly used for continuous outcomes and continuous mediators assuming there is no exposure-mediator interaction so that the product <math>\\n <semantics>\\n <mrow>\\n <mi>α</mi>\\n \\n <mi>β</mi>\\n </mrow>\\n <annotation> $\\\\alpha \\\\beta $</annotation>\\n </semantics></math> has a causal interpretation as the indirect effect. The first class of methods ignores the impact of different structures under the composite null hypothesis, namely, (1) <math>\\n <semantics>\\n <mrow>\\n <mi>α</mi>\\n \\n <mo>=</mo>\\n \\n <mn>0</mn>\\n \\n <mo>,</mo>\\n \\n <mi>β</mi>\\n \\n <mo>≠</mo>\\n \\n <mn>0</mn>\\n </mrow>\\n <annotation> $\\\\alpha =0,\\\\beta \\\\ne 0$</annotation>\\n </semantics></math>; (2) <math>\\n <semantics>\\n <mrow>\\n <mi>α</mi>\\n \\n <mo>≠</mo>\\n \\n <mn>0</mn>\\n \\n <mo>,</mo>\\n \\n <mi>β</mi>\\n \\n <mo>=</mo>\\n \\n <mn>0</mn>\\n </mrow>\\n <annotation> $\\\\alpha \\\\ne 0,\\\\beta =0$</annotation>\\n </semantics></math>; and (3) <math>\\n <semantics>\\n <mrow>\\n <mi>α</mi>\\n \\n <mo>=</mo>\\n \\n <mi>β</mi>\\n \\n <mo>=</mo>\\n \\n <mn>0</mn>\\n </mrow>\\n <annotation> $\\\\alpha =\\\\beta =0$</annotation>\\n </semantics></math>. The second class of methods weights the reference distribution under each case of the null to form a mixture reference distribution. The third class constructs a composite test statistic using the three <i>p</i> values obtained under each case of the null so that the reference distribution of the composite statistic is approximately <math>\\n <semantics>\\n <mrow>\\n <mi>U</mi>\\n \\n <mrow>\\n <mo>(</mo>\\n \\n <mrow>\\n <mn>0</mn>\\n \\n <mo>,</mo>\\n \\n <mn>1</mn>\\n </mrow>\\n \\n <mo>)</mo>\\n </mrow>\\n </mrow>\\n <annotation> $U(0,1)$</annotation>\\n </semantics></math>. In addition to these existing methods, we developed the Sobel-comp method belonging to the second class, which uses a corrected mixture reference distribution for Sobel's test statistic. We performed extensive simulation studies to compare all six methods belonging to these three classes in terms of the false positive rates (FPRs) under the null hypothesis and the true positive rates under the alternative hypothesis. We found that the second class of methods which uses a mixture reference distribution could best maintain the FPRs at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level using data from the Multi-Ethnic Study of Atherosclerosis (MESA). We provide guidelines for choosing the optimal mediation hypothesis testing method in practice and develop an R package <i>medScan</i> available on the CRAN for implementing all the six methods.</p>\",\"PeriodicalId\":12710,\"journal\":{\"name\":\"Genetic Epidemiology\",\"volume\":\"47 2\",\"pages\":\"167-184\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2022-12-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/gepi.22510\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Genetic Epidemiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/gepi.22510\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"GENETICS & HEREDITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genetic Epidemiology","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/gepi.22510","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}
Methods for large-scale single mediator hypothesis testing: Possible choices and comparisons
Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, (: effect of the exposure on the mediator after adjusting for confounders; : effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we reviewed three classes of methods for large-scale one at a time mediation hypothesis testing. These methods are commonly used for continuous outcomes and continuous mediators assuming there is no exposure-mediator interaction so that the product has a causal interpretation as the indirect effect. The first class of methods ignores the impact of different structures under the composite null hypothesis, namely, (1) ; (2) ; and (3) . The second class of methods weights the reference distribution under each case of the null to form a mixture reference distribution. The third class constructs a composite test statistic using the three p values obtained under each case of the null so that the reference distribution of the composite statistic is approximately . In addition to these existing methods, we developed the Sobel-comp method belonging to the second class, which uses a corrected mixture reference distribution for Sobel's test statistic. We performed extensive simulation studies to compare all six methods belonging to these three classes in terms of the false positive rates (FPRs) under the null hypothesis and the true positive rates under the alternative hypothesis. We found that the second class of methods which uses a mixture reference distribution could best maintain the FPRs at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level using data from the Multi-Ethnic Study of Atherosclerosis (MESA). We provide guidelines for choosing the optimal mediation hypothesis testing method in practice and develop an R package medScan available on the CRAN for implementing all the six methods.
期刊介绍:
Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations.
Genetic Epidemiology primarily publishes papers in statistical genetics, a research field that is primarily concerned with development of statistical, bioinformatical, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The Journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.