Jonathan Boss, Wei Hao, Amber Cathey, Barrett M Welch, Kelly K Ferguson, John D Meeker, Xiang Zhou, Jian Kang, Bhramar Mukherjee
Biostatistics, vol. 26, no. 1. Published 2024-12-31. DOI: 10.1093/biostatistics/kxaf020. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12302958/pdf/
Mediation with External Summary Statistic Information.
Environmental health studies are increasingly measuring endogenous omics data ($ \boldsymbol{M} $) to study intermediary biological pathways by which an exogenous exposure ($ \boldsymbol{A} $) affects a health outcome ($ \boldsymbol{Y} $), given confounders ($ \boldsymbol{C} $). Mediation analysis is frequently performed to understand such mechanisms. If intermediary pathways are of interest, then there is likely literature establishing statistical and biological significance of the total effect, defined as the effect of $ \boldsymbol{A} $ on $ \boldsymbol{Y} $ given $ \boldsymbol{C} $. For mediation models with continuous outcomes and mediators, we show that leveraging external summary-level information on the total effect can improve estimation efficiency of the direct and indirect effects. Moreover, the efficiency gain depends on the asymptotic partial $ R^{2} $ between the outcome ($ \boldsymbol{Y}\mid\boldsymbol{M},\boldsymbol{A},\boldsymbol{C} $) and total effect ($ \boldsymbol{Y}\mid\boldsymbol{A},\boldsymbol{C} $) models, with smaller (larger) values benefiting direct (indirect) effect estimation. We propose a robust data-adaptive estimation procedure, Mediation with External Summary Statistic Information, to improve estimation efficiency in settings with congenial external information, while simultaneously protecting against bias in settings with incongenial external information. In congenial simulation scenarios, we observe relative efficiency gains for mediation effect estimation of up to 40%. We illustrate our methodology using data from the Puerto Rico Testsite for Exploring Contamination Threats, where cytochrome P450 metabolites are hypothesized to mediate the effect of phthalate exposure on gestational age at delivery. External summary information on the total effect comes from a recently published pooled analysis of 16 studies. The proposed framework blends mediation analysis with emerging data integration techniques.
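The abstract does not spell out the estimator, so the following is only a rough illustration of why an external total-effect estimate can sharpen direct and indirect effect estimates in the linear, continuous setting. It uses the product-of-coefficients decomposition (direct effect $\beta_A$ from $Y\mid M,A,C$; indirect effect $\alpha_A\beta_M$, where $\alpha_A$ comes from $M\mid A,C$), for which direct + indirect equals the total effect $\theta$ from $Y\mid A,C$ exactly under OLS. It then applies a generic control-variate adjustment toward an external $\tilde\theta$. This is a swapped-in textbook technique, not the authors' Mediation with External Summary Statistic Information procedure, and unlike that procedure it offers no protection against incongenial external information. The helper names (`simulate_data`, `mediation_estimates`, `adjust_with_external`), the bootstrap covariance, and the toy data are all hypothetical.

```python
import numpy as np


def simulate_data(n, seed=0):
    """Hypothetical toy data: true direct effect 0.4, indirect 0.5 * 0.6 = 0.3, total 0.7."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=(n, 2))                                   # confounders C
    a = rng.normal(size=n) + 0.3 * c[:, 0]                        # exposure A
    m = 0.5 * a + c @ np.array([0.2, -0.1]) + rng.normal(size=n)  # mediator M
    y = 0.4 * a + 0.6 * m + c @ np.array([0.1, 0.3]) + rng.normal(size=n)  # outcome Y
    return y, m, a, c


def ols(X, y):
    """Ordinary least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mediation_estimates(y, m, a, c):
    """Product-of-coefficients estimates (direct, indirect, total) from three linear models."""
    ones = np.ones(len(y))
    theta = ols(np.column_stack([ones, a, c]), y)[1]     # total-effect model: Y | A, C
    alpha_a = ols(np.column_stack([ones, a, c]), m)[1]   # mediator model:     M | A, C
    b = ols(np.column_stack([ones, m, a, c]), y)         # outcome model:      Y | M, A, C
    beta_m, beta_a = b[1], b[2]
    return beta_a, alpha_a * beta_m, theta               # direct + indirect == theta under OLS


def adjust_with_external(y, m, a, c, theta_ext, var_ext, n_boot=500, seed=0):
    """Control-variate adjustment of (direct, indirect) toward an external total effect.

    Assumes (hypothetically) that theta_ext is unbiased for the same total effect and
    independent of the internal sample; var_ext is its squared standard error.
    """
    rng = np.random.default_rng(seed)
    est = np.array(mediation_estimates(y, m, a, c))
    n = len(y)
    boot = np.empty((n_boot, 3))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)                 # nonparametric bootstrap resample
        boot[i] = mediation_estimates(y[idx], m[idx], a[idx], c[idx])
    cov = np.cov(boot, rowvar=False)                     # 3x3 covariance of (direct, indirect, total)
    # Variance-minimizing weight when theta_ext is congenial:
    # Cov(effect, theta_int) / Var(theta_int - theta_ext).
    w = cov[:2, 2] / (cov[2, 2] + var_ext)
    adjusted = est[:2] - w * (est[2] - theta_ext)
    return est, adjusted


if __name__ == "__main__":
    y, m, a, c = simulate_data(400, seed=1)
    # Hypothetical external summary statistic: estimate 0.7 with standard error 0.02.
    est, adj = adjust_with_external(y, m, a, c, theta_ext=0.7, var_ext=0.02**2)
    print("internal (direct, indirect, total):", est.round(3))
    print("adjusted (direct, indirect):", adj.round(3))
```

Because direct + indirect equals the internal total effect in every bootstrap resample, the two weights sum to $\mathrm{Var}(\hat\theta)/(\mathrm{Var}(\hat\theta)+\mathrm{Var}(\tilde\theta))$, so the adjusted direct and indirect effects add up to an inverse-variance-weighted blend of the internal and external total effects. If $\tilde\theta$ targets a different quantity (incongenial information), this naive adjustment becomes biased, which is the failure mode the paper's robust procedure is designed to guard against.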
Journal description:
Among the important scientific developments of the 20th century is the explosive growth in statistical reasoning and methods for application to studies of human health. Examples include developments in likelihood methods for inference, epidemiologic statistics, clinical trials, survival analysis, and statistical genetics. Substantive problems in public health and biomedical research have fueled the development of statistical methods, which in turn have improved our ability to draw valid inferences from data. The objective of Biostatistics is to advance statistical science and its application to problems of human health and disease, with the ultimate goal of advancing the public's health.