Integrating Incomplete Data for Mediation Analysis
Andriy Derkach, Joshua N Sampson, Ruth M Pfeiffer
Statistica Sinica, pp. 1045-1066, published 2024-04-01
DOI: 10.5705/ss.202021.0373 (https://doi.org/10.5705/ss.202021.0373)
Abstract
Mediation analysis examines the relationships between an exposure, a mediator, and an outcome. Although many approaches are available for performing such analyses, they all require access to a single complete data set that contains the three key variables: outcome, exposure, and mediator. Here, we propose semiparametric methods for mediation analysis to estimate the standard causal parameters (direct and indirect effects) by combining information from several incomplete data sets, each containing only two of the three key variables. Importantly, our methods also handle scenarios in which only summary statistics based on those data sets are available. The resulting estimates of the causal parameters are asymptotically unbiased and normally distributed. We evaluate the performance of our methods in finite samples using simulations, and quantify the loss in efficiency from the lack of a complete data set with all three variables. We then apply the proposed method to determine whether the number of terminal duct lobular units in the breast mediates the relationship between a polygenic risk score and breast cancer risk.
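To make the idea of combining pairwise data sets concrete, the following is a minimal sketch under a much simpler setting than the paper's: it assumes fully linear models with no confounding, and it is not the authors' semiparametric estimator. Three hypothetical data sets each hold two of the three variables (exposure X, mediator M, outcome Y), all assumed to be drawn from the same population. Their pairwise moments are assembled into the normal equations for the regression of Y on X and M, which gives the direct effect and the product-of-coefficients indirect effect. The function names `pairwise_mediation` and `simulate_pairs` and the simulation coefficients are illustrative assumptions.

```python
import numpy as np


def pairwise_mediation(xm, xy, my):
    """Linear-model mediation effects from three two-variable data sets.

    xm: (n1, 2) array with columns (X, M)
    xy: (n2, 2) array with columns (X, Y)
    my: (n3, 2) array with columns (M, Y)
    Assumption: all three samples come from the same population.
    """
    # Variances are pooled over every data set that contains the variable.
    var_x = np.var(np.concatenate([xm[:, 0], xy[:, 0]]), ddof=1)
    var_m = np.var(np.concatenate([xm[:, 1], my[:, 0]]), ddof=1)
    # Each covariance can only come from the one data set holding that pair.
    cov_xm = np.cov(xm[:, 0], xm[:, 1])[0, 1]
    cov_xy = np.cov(xy[:, 0], xy[:, 1])[0, 1]
    cov_my = np.cov(my[:, 0], my[:, 1])[0, 1]

    # Slope of the mediator on the exposure.
    a = cov_xm / var_x
    # Normal equations for Y ~ X + M, built from pairwise moments only.
    gram = np.array([[var_x, cov_xm], [cov_xm, var_m]])
    direct, b = np.linalg.solve(gram, np.array([cov_xy, cov_my]))
    indirect = a * b
    return {
        "direct": float(direct),
        "indirect": float(indirect),
        "total": float(direct + indirect),
    }


def simulate_pairs(n, seed=0):
    """Draw three independent samples and keep only two variables from each.

    Illustrative truth: M = 0.5 X + e, Y = 0.3 X + 0.8 M + e,
    so the direct effect is 0.3 and the indirect effect is 0.5 * 0.8 = 0.4.
    """
    rng = np.random.default_rng(seed)

    def draw():
        x = rng.normal(size=n)
        m = 0.5 * x + rng.normal(size=n)
        y = 0.3 * x + 0.8 * m + rng.normal(size=n)
        return x, m, y

    x1, m1, _ = draw()
    x2, _, y2 = draw()
    _, m3, y3 = draw()
    return (
        np.column_stack([x1, m1]),
        np.column_stack([x2, y2]),
        np.column_stack([m3, y3]),
    )


if __name__ == "__main__":
    print(pairwise_mediation(*simulate_pairs(200_000)))
```

Because only sample variances and covariances enter the calculation, the same arithmetic could be run from summary statistics alone in this linear setting. Standard errors and the efficiency loss relative to a complete data set are not addressed by this sketch.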
About the journal:
Statistica Sinica aims to meet the needs of statisticians in a rapidly changing world. It provides a forum for the publication of innovative work of high quality in all areas of statistics, including theory, methodology and applications. The journal encourages the development and principled use of statistical methodology that is relevant for society, science and technology.