{"title":"通过交叉拟合,加快基于 R2 的高维中介效应的区间估计。","authors":"Zhichao Xu, Chunlin Li, Sunyi Chi, Tianzhong Yang, Peng Wei","doi":"10.1093/biostatistics/kxae037","DOIUrl":null,"url":null,"abstract":"<p><p>Mediation analysis is a useful tool in investigating how molecular phenotypes such as gene expression mediate the effect of exposure on health outcomes. However, commonly used mean-based total mediation effect measures may suffer from cancellation of component-wise mediation effects in opposite directions in the presence of high-dimensional omics mediators. To overcome this limitation, we recently proposed a variance-based R-squared total mediation effect measure that relies on the computationally intensive nonparametric bootstrap for confidence interval estimation. In the work described herein, we formulated a more efficient two-stage, cross-fitted estimation procedure for the R2 measure. To avoid potential bias, we performed iterative Sure Independence Screening (iSIS) in two subsamples to exclude the non-mediators, followed by ordinary least squares regressions for the variance estimation. We then constructed confidence intervals based on the newly derived closed-form asymptotic distribution of the R2 measure. Extensive simulation studies demonstrated that this proposed procedure is much more computationally efficient than the resampling-based method, with comparable coverage probability. Furthermore, when applied to the Framingham Heart Study, the proposed method replicated the established finding of gene expression mediating age-related variation in systolic blood pressure and identified the role of gene expression profiles in the relationship between sex and high-density lipoprotein cholesterol level. The proposed estimation procedure is implemented in R package CFR2M.</p>","PeriodicalId":1,"journal":{"name":"Accounts of Chemical Research","volume":null,"pages":null},"PeriodicalIF":16.4000,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Speeding up interval estimation for R2-based mediation effect of high-dimensional mediators via cross-fitting.\",\"authors\":\"Zhichao Xu, Chunlin Li, Sunyi Chi, Tianzhong Yang, Peng Wei\",\"doi\":\"10.1093/biostatistics/kxae037\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Mediation analysis is a useful tool in investigating how molecular phenotypes such as gene expression mediate the effect of exposure on health outcomes. However, commonly used mean-based total mediation effect measures may suffer from cancellation of component-wise mediation effects in opposite directions in the presence of high-dimensional omics mediators. To overcome this limitation, we recently proposed a variance-based R-squared total mediation effect measure that relies on the computationally intensive nonparametric bootstrap for confidence interval estimation. In the work described herein, we formulated a more efficient two-stage, cross-fitted estimation procedure for the R2 measure. To avoid potential bias, we performed iterative Sure Independence Screening (iSIS) in two subsamples to exclude the non-mediators, followed by ordinary least squares regressions for the variance estimation. We then constructed confidence intervals based on the newly derived closed-form asymptotic distribution of the R2 measure. Extensive simulation studies demonstrated that this proposed procedure is much more computationally efficient than the resampling-based method, with comparable coverage probability. Furthermore, when applied to the Framingham Heart Study, the proposed method replicated the established finding of gene expression mediating age-related variation in systolic blood pressure and identified the role of gene expression profiles in the relationship between sex and high-density lipoprotein cholesterol level. The proposed estimation procedure is implemented in R package CFR2M.</p>\",\"PeriodicalId\":1,\"journal\":{\"name\":\"Accounts of Chemical Research\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":16.4000,\"publicationDate\":\"2024-10-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Accounts of Chemical Research\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1093/biostatistics/kxae037\",\"RegionNum\":1,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Accounts of Chemical Research","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/biostatistics/kxae037","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
摘要
中介分析是研究基因表达等分子表型如何介导暴露对健康结果影响的有用工具。然而,常用的基于均值的总中介效应测量方法可能会在存在高维表观中介因子的情况下,出现分量-分量-分量的反向中介效应抵消的问题。为了克服这一局限性,我们最近提出了一种基于方差的 R 平方总中介效应测量方法,它依赖于计算密集型非参数自举法进行置信区间估计。在本文所述的工作中,我们为 R2 测量制定了更有效的两阶段交叉拟合估计程序。为了避免潜在的偏差,我们在两个子样本中进行了迭代确定独立性筛选(iSIS),以排除非调解人,然后用普通最小二乘法回归进行方差估计。然后,我们根据新推导出的 R2 测量的闭式渐近分布构建置信区间。广泛的模拟研究表明,与基于重采样的方法相比,我们提出的方法在计算上更有效率,而且覆盖概率相当。此外,当应用于弗雷明汉心脏研究时,所提出的方法复制了基因表达介导收缩压年龄相关变化的既定结论,并确定了基因表达谱在性别与高密度脂蛋白胆固醇水平之间关系中的作用。拟议的估计程序在 R 软件包 CFR2M 中实现。
Speeding up interval estimation for R2-based mediation effect of high-dimensional mediators via cross-fitting.
Mediation analysis is a useful tool in investigating how molecular phenotypes such as gene expression mediate the effect of exposure on health outcomes. However, commonly used mean-based total mediation effect measures may suffer from cancellation of component-wise mediation effects in opposite directions in the presence of high-dimensional omics mediators. To overcome this limitation, we recently proposed a variance-based R-squared total mediation effect measure that relies on the computationally intensive nonparametric bootstrap for confidence interval estimation. In the work described herein, we formulated a more efficient two-stage, cross-fitted estimation procedure for the R2 measure. To avoid potential bias, we performed iterative Sure Independence Screening (iSIS) in two subsamples to exclude the non-mediators, followed by ordinary least squares regressions for the variance estimation. We then constructed confidence intervals based on the newly derived closed-form asymptotic distribution of the R2 measure. Extensive simulation studies demonstrated that this proposed procedure is much more computationally efficient than the resampling-based method, with comparable coverage probability. Furthermore, when applied to the Framingham Heart Study, the proposed method replicated the established finding of gene expression mediating age-related variation in systolic blood pressure and identified the role of gene expression profiles in the relationship between sex and high-density lipoprotein cholesterol level. The proposed estimation procedure is implemented in R package CFR2M.
期刊介绍:
Accounts of Chemical Research presents short, concise and critical articles offering easy-to-read overviews of basic research and applications in all areas of chemistry and biochemistry. These short reviews focus on research from the author’s own laboratory and are designed to teach the reader about a research project. In addition, Accounts of Chemical Research publishes commentaries that give an informed opinion on a current research problem. Special Issues online are devoted to a single topic of unusual activity and significance.
Accounts of Chemical Research replaces the traditional article abstract with an article "Conspectus." These entries synopsize the research affording the reader a closer look at the content and significance of an article. Through this provision of a more detailed description of the article contents, the Conspectus enhances the article's discoverability by search engines and the exposure for the research.