Estimating Treatment Effects over Time with Causal Forests: An application to the ACIC 2022 Data Challenge
Shu Wan, Guanghui Zhang
Observational Studies (journal article), published 2023-05-11
DOI: 10.1353/obs.2023.0026
Citation count: 0
Abstract
In this paper, we present DiConfounder, our winning modeling approach for the Atlantic Causal Inference Conference (ACIC) 2022 Data Challenge. Among the 58 submissions, our method ranks 1st in RMSE and 5th in coverage. We propose a transformed outcome estimator that connects the difference-in-differences and conditional average treatment effect (CATE) estimation problems. Our multistage pipeline comprises feature engineering, missing-value imputation, outcome and propensity score modeling, treatment effect modeling, and estimation of the sample average treatment effect on the treated (SATT) together with its uncertainty. The model achieves an overall RMSE of 11 with 84.5% coverage. We further compare several methods for constructing confidence intervals and analyze the limitations of our approach under different data-generating process settings. We provide evidence that the clustered data structure is the key to success. The source code is released on GitHub so that practitioners can adopt and adapt our methods.
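The link between difference-in-differences and CATE estimation can be illustrated with a generic inverse-propensity transformed outcome applied to the before/after change in outcome, Y* = (Y_post - Y_pre)(Z - e(X)) / (e(X)(1 - e(X))). When parallel trends hold conditional on X and 0 < e(X) < 1, E[Y* | X = x] equals the conditional effect on the outcome change. The following is a minimal sketch under stated assumptions, not the authors' DiConfounder implementation (which, per the title, uses causal forests and estimated propensity scores). It assumes two periods, a *known* propensity score, and a linear CATE fitted by least squares. The simulated data-generating process and all function names (`transformed_outcome`, `fit_linear_cate`, `simulate`, `demo`) are hypothetical.

```python
import numpy as np


def transformed_outcome(y_pre, y_post, z, e):
    """Hypothetical helper: Y* = (Y_post - Y_pre) * (Z - e) / (e * (1 - e)).

    Under conditional parallel trends and overlap (0 < e < 1),
    E[Y* | X = x] equals the conditional treatment effect tau(x).
    """
    delta_y = y_post - y_pre  # differencing removes time-invariant unit levels
    return delta_y * (z - e) / (e * (1.0 - e))


def fit_linear_cate(x, y_star):
    """Hypothetical helper: least-squares fit of Y* on [1, X]; returns coefficients."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y_star, rcond=None)
    return coef


def simulate(n, rng):
    """Assumed toy DGP: treated units differ in baseline level, trends depend on X."""
    x = rng.uniform(-1.0, 1.0, size=n)
    e = 1.0 / (1.0 + np.exp(-x))             # known propensity, roughly in [0.27, 0.73]
    z = rng.binomial(1, e)                   # treatment assignment
    level = 2.0 * x + 1.5 * z                # time-invariant confounding in levels
    trend = 0.3 * x                          # covariate-dependent common trend
    tau = 1.0 + 0.5 * x                      # true conditional effect
    y_pre = level + rng.normal(size=n)
    y_post = level + trend + z * tau + rng.normal(size=n)
    return x, e, z, y_pre, y_post, tau


def demo(seed=0, n=200_000):
    rng = np.random.default_rng(seed)
    x, e, z, y_pre, y_post, tau = simulate(n, rng)
    y_star = transformed_outcome(y_pre, y_post, z, e)
    coef = fit_linear_cate(x, y_star)
    tau_hat = coef[0] + coef[1] * x
    treated = z == 1
    satt_hat = tau_hat[treated].mean()       # average estimated effect over treated units
    satt_true = tau[treated].mean()
    naive = y_post[treated].mean() - y_post[~treated].mean()  # ignores baseline gap
    return coef, satt_hat, satt_true, naive


if __name__ == "__main__":
    coef, satt_hat, satt_true, naive = demo()
    print("CATE coefficients:", coef)        # should be close to [1.0, 0.5]
    print("SATT estimate:", satt_hat, "truth:", satt_true)
    print("naive post-period difference:", naive)
```

In this toy setting the naive post-period comparison absorbs the treated group's baseline shift, while the transformed outcome of the differenced outcome recovers the conditional effect. Any flexible regressor (for example, a causal or random forest) could replace the linear fit of Y* on X, at the cost of the high variance that inverse-propensity weighting introduces.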