Mediation with External Summary Statistic Information.

IF 2.0 | Zone 3, Mathematics | JCR Q3, Mathematical & Computational Biology
Jonathan Boss, Wei Hao, Amber Cathey, Barrett M Welch, Kelly K Ferguson, John D Meeker, Xiang Zhou, Jian Kang, Bhramar Mukherjee
Biostatistics, vol. 26, no. 1. Journal Article. Published 2024-12-31. DOI: https://doi.org/10.1093/biostatistics/kxaf020. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12302958/pdf/

Abstract

Environmental health studies are increasingly measuring endogenous omics data ($ \boldsymbol{M} $) to study intermediary biological pathways by which an exogenous exposure ($ \boldsymbol{A} $) affects a health outcome ($ \boldsymbol{Y} $), given confounders ($ \boldsymbol{C} $). Mediation analysis is frequently performed to understand such mechanisms. If intermediary pathways are of interest, then there is likely literature establishing statistical and biological significance of the total effect, defined as the effect of $ \boldsymbol{A} $ on $ \boldsymbol{Y} $ given $ \boldsymbol{C} $. For mediation models with continuous outcomes and mediators, we show that leveraging external summary-level information on the total effect can improve estimation efficiency of the direct and indirect effects. Moreover, the efficiency gain depends on the asymptotic partial $ R^{2} $ between the outcome ($ \boldsymbol{Y}\mid\boldsymbol{M},\boldsymbol{A},\boldsymbol{C} $) and total effect ($ \boldsymbol{Y}\mid\boldsymbol{A},\boldsymbol{C} $) models, with smaller (larger) values benefiting direct (indirect) effect estimation. We propose a robust data-adaptive estimation procedure, Mediation with External Summary Statistic Information, to improve estimation efficiency in settings with congenial external information, while simultaneously protecting against bias in settings with incongenial external information. In congenial simulation scenarios, we observe relative efficiency gains for mediation effect estimation of up to 40%. We illustrate our methodology using data from the Puerto Rico Testsite for Exploring Contamination Threats, where cytochrome P450 metabolites are hypothesized to mediate the effect of phthalate exposure on gestational age at delivery. External summary information on the total effect comes from a recently published pooled analysis of 16 studies. The proposed framework blends mediation analysis with emerging data integration techniques.
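
For readers less familiar with the quantities named in the abstract, the following is a minimal sketch of how the total, direct, and indirect effects relate in a linear mediation model with a continuous outcome and continuous mediators. It is not the authors' estimator and uses no external summary information; it only fits the outcome model ($ \boldsymbol{Y}\mid\boldsymbol{M},\boldsymbol{A},\boldsymbol{C} $), the mediator model, and the total effect model ($ \boldsymbol{Y}\mid\boldsymbol{A},\boldsymbol{C} $) by ordinary least squares on simulated data. The function names, the simulated coefficients, and the sample size are all assumptions chosen for illustration.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mediation_effects(A, M, C, Y):
    """Product-of-coefficients estimates for a linear mediation model.

    A: exposure (n,), M: mediators (n, p), C: confounders (n, q), Y: outcome (n,).
    Illustrative helper only; not the procedure proposed in the paper.
    """
    n = len(Y)
    ones = np.ones((n, 1))
    A = A.reshape(n, 1)
    M = M.reshape(n, -1)
    C = C.reshape(n, -1)
    p = M.shape[1]

    # Outcome model: Y ~ 1 + A + M + C. The coefficient on A is the direct effect.
    theta = ols(np.hstack([ones, A, M, C]), Y)
    direct = theta[1]
    beta_m = theta[2:2 + p]

    # Mediator models: each column of M ~ 1 + A + C, fitted jointly.
    alpha = ols(np.hstack([ones, A, C]), M)  # shape (2 + q, p)
    alpha_a = alpha[1]                       # coefficient on A for each mediator

    # Indirect effect: sum over mediators of (A -> M_j) * (M_j -> Y).
    indirect = float(alpha_a @ beta_m)

    # Total effect model: Y ~ 1 + A + C. The coefficient on A is the total effect.
    total = ols(np.hstack([ones, A, C]), Y)[1]

    return {"direct": float(direct), "indirect": indirect, "total": float(total)}


# Simulated data with assumed true effects: direct 0.5, indirect 0.8*0.6 + (-0.4)*0.2 = 0.4.
rng = np.random.default_rng(0)
n = 2000
C = rng.normal(size=(n, 2))
A = 0.5 * C[:, 0] + rng.normal(size=n)
M = np.column_stack([
    0.8 * A + 0.3 * C[:, 1] + rng.normal(size=n),
    -0.4 * A + rng.normal(size=n),
])
Y = 0.5 * A + M @ np.array([0.6, 0.2]) + 0.3 * C[:, 0] + rng.normal(size=n)

effects = mediation_effects(A, M, C, Y)
print(effects)
```

With least squares and the same confounders in every model, the estimated total effect equals the estimated direct effect plus the estimated indirect effect exactly. This identity suggests why accurate outside knowledge of the total effect could, in principle, constrain the other two quantities; how the paper actually exploits it is described in the full text, not here.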

Source journal: Biostatistics
Category: Biology, Mathematical & Computational Biology
CiteScore: 5.10
Self-citation rate: 4.80%
Articles published: 45
Review time: 6-12 weeks
About the journal: Among the important scientific developments of the 20th century is the explosive growth in statistical reasoning and methods for application to studies of human health. Examples include developments in likelihood methods for inference, epidemiologic statistics, clinical trials, survival analysis, and statistical genetics. Substantive problems in public health and biomedical research have fueled the development of statistical methods, which in turn have improved our ability to draw valid inferences from data. The objective of Biostatistics is to advance statistical science and its application to problems of human health and disease, with the ultimate goal of advancing the public's health.