Performance of Cross-Validated Targeted Maximum Likelihood Estimation

Matthew J. Smith, Rachael V. Phillips, Camille Maringe, Miguel Angel Luque Fernandez
Source: arXiv - STAT - Methodology
Publication date: 2024-09-17
DOI: arxiv-2409.11265 (https://doi.org/arxiv-2409.11265)
Citations: 0

Abstract

Background: Advanced methods for causal inference, such as targeted maximum likelihood estimation (TMLE), require certain conditions for valid statistical inference. When data sparsity or near-positivity violations lead to a lack of differentiability, the Donsker class condition is violated. In such situations, inference based on the TMLE variance estimate can suffer from inflated type I error and poor coverage, that is, anti-conservative confidence intervals. Cross-validated TMLE (CVTMLE) has been suggested as a way to improve on TMLE in settings with positivity or Donsker class violations. We aim to compare the performance of CVTMLE with that of TMLE in a range of settings.

Methods: We used the data-generating mechanism described in Leger et al. (2022) to run a Monte Carlo experiment under different Donsker class violations. We then evaluated the statistical performance of TMLE and CVTMLE with different super learner libraries, with and without regression tree methods.

Results: CVTMLE vastly improves confidence interval coverage without adversely affecting bias, particularly in settings with small sample sizes and near-positivity violations. Furthermore, including regression trees in the super learner used for the initial estimates of standard TMLE increases bias and variance, leading to invalid statistical inference.

Conclusions: With CVTMLE, the Donsker class condition is no longer necessary to obtain valid statistical inference when using regression trees under either data sparsity or near-positivity violations. Our simulations show that CVTMLE is much less sensitive to the choice of super learner library and therefore provides better estimation and inference when the library includes more flexible candidates that are prone to overfitting.
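
The abstract reports bias and confidence interval coverage from a Monte Carlo experiment. As a rough illustration of how such metrics are typically computed, here is a minimal Python sketch. It is not the authors' simulation: the data-generating mechanism (a toy randomised trial), the placeholder estimator (a difference in means rather than TMLE or CVTMLE), and all function names are assumptions made for this example only.

```python
import math
import random
import statistics


def simulate_dataset(n, true_effect, rng):
    """Toy randomised trial (an assumption, not the Leger et al. mechanism):
    A ~ Bernoulli(0.5), Y = true_effect * A + standard normal noise."""
    data = []
    for _ in range(n):
        a = 1 if rng.random() < 0.5 else 0
        y = true_effect * a + rng.gauss(0.0, 1.0)
        data.append((a, y))
    return data


def difference_in_means(data):
    """Placeholder estimator (NOT TMLE): returns (point estimate, standard error)."""
    treated = [y for a, y in data if a == 1]
    control = [y for a, y in data if a == 0]
    estimate = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return estimate, se


def monte_carlo_performance(estimator, n, true_effect, n_reps, seed=0):
    """Repeat simulate -> estimate -> check whether the 95% Wald interval
    contains the true effect, then summarise bias, spread, and coverage."""
    rng = random.Random(seed)
    estimates = []
    covered = 0
    for _ in range(n_reps):
        est, se = estimator(simulate_dataset(n, true_effect, rng))
        estimates.append(est)
        lower, upper = est - 1.96 * se, est + 1.96 * se
        covered += lower <= true_effect <= upper
    return {
        "bias": statistics.mean(estimates) - true_effect,
        "empirical_se": statistics.stdev(estimates),
        "coverage": covered / n_reps,
    }


result = monte_carlo_performance(
    difference_in_means, n=200, true_effect=1.0, n_reps=500
)
print(result)
```

Coverage well below the nominal 95% would indicate intervals that are too narrow, which is the failure mode the abstract attributes to standard TMLE with overfitting learners. To compare estimators, any function returning a point estimate and a standard error could be passed in place of `difference_in_means`.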