{"title":"Causal Methods Madness: Lessons Learned from the 2022 ACIC Competition to Estimate Health Policy Impacts","authors":"Daniel Thal, M. Finucane","doi":"10.1353/obs.2023.0023","DOIUrl":null,"url":null,"abstract":"Abstract:Introducing novel causal estimators usually involves simulation studies run by the statistician developing the estimator, but this traditional approach can be fraught: simulation design is often favorable to the new method, unfavorable results might never be published, and comparison across estimators is difficult. The American Causal Inference Conference (ACIC) data challenges offer an alternative. As organizers of the 2022 challenge, we generated thousands of data sets similar to real-world policy evaluations and baked in true causal impacts unknown to participants. Participating teams then competed on an even playing field, using their cutting-edge methods to estimate those effects. In total, 20 teams submitted results from 58 estimators that used a range of approaches. We found several important factors driving performance that are not commonly used in business-as-usual applied policy evaluations, pointing to ways future evaluations could achieve more precise and nuanced estimates of policy impacts. Top-performing methods used flexible modeling of outcome-covariate and outcome-participation relationships as well as regularization of subgroup estimates. Furthermore, we found that model-based uncertainty intervals tended to outperform bootstrap-based ones. Lastly, and counter to our expectations, we found that analyzing large-n patient-level data does not improve performance relative to analyzing smaller-n data aggregated to the primary care practice level, given that in our simulated data sets practices (not individual patients) decided whether to join the intervention. Ultimately, we hope this competition helped identify methods that are best suited for evaluating which social policies move the needle for the individuals and communities they serve.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Observational studies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1353/obs.2023.0023","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Introducing novel causal estimators usually involves simulation studies run by the statistician developing the estimator, but this traditional approach can be fraught: simulation design is often favorable to the new method, unfavorable results might never be published, and comparison across estimators is difficult. The American Causal Inference Conference (ACIC) data challenges offer an alternative. As organizers of the 2022 challenge, we generated thousands of data sets similar to real-world policy evaluations and baked in true causal impacts unknown to participants. Participating teams then competed on a level playing field, using their cutting-edge methods to estimate those effects. In total, 20 teams submitted results from 58 estimators that used a range of approaches. We found several important factors driving performance that are not commonly used in business-as-usual applied policy evaluations, pointing to ways future evaluations could achieve more precise and nuanced estimates of policy impacts. Top-performing methods used flexible modeling of outcome-covariate and outcome-participation relationships as well as regularization of subgroup estimates. Furthermore, we found that model-based uncertainty intervals tended to outperform bootstrap-based ones. Lastly, and counter to our expectations, we found that analyzing large-n patient-level data does not improve performance relative to analyzing smaller-n data aggregated to the primary care practice level, given that in our simulated data sets practices (not individual patients) decided whether to join the intervention. Ultimately, we hope this competition helped identify methods that are best suited for evaluating which social policies move the needle for the individuals and communities they serve.