To compare the performance of difference-in-differences estimators fit to data aggregated to different time scales.
In simulations, we generated monthly observations for 50–100 units over 6 years from both a parametric model and a resampling simulation. The simulation scenarios varied panel balance, treatment timing, and true treatment effects. Our target parameters were static and dynamic average effects of treatment on the treated (ATT), estimated via linear regression (for common timing scenarios) and Callaway and Sant'Anna (2021) estimators (for staggered timing scenarios). We compared estimates from monthly, quarterly, and yearly data using bias, standard error, root mean squared error (RMSE), power, and Type I error. We also conducted a case study to illustrate the real-world impact of the choice of time aggregation.
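As a minimal sketch of the aggregation step, the monthly outcomes can be summed into quarterly or yearly periods before estimation. The function name `aggregate_counts`, the dictionary layout, and the toy panel are hypothetical and chosen for illustration only. Summing, rather than averaging, is an assumption suited to count outcomes.

```python
from collections import defaultdict


def aggregate_counts(monthly, months_per_period):
    """Sum monthly counts into coarser periods.

    monthly: dict mapping (unit_id, month_index) -> count,
             with month_index starting at 0 (hypothetical layout).
    months_per_period: 3 for quarterly data, 12 for yearly data.
    """
    totals = defaultdict(int)
    for (unit, month), count in monthly.items():
        # Integer division assigns each month to its coarser period.
        totals[(unit, month // months_per_period)] += count
    return dict(totals)


# Hypothetical toy panel: 2 units observed for 12 months each.
monthly = {(u, m): (m % 3) + u for u in (0, 1) for m in range(12)}

quarterly = aggregate_counts(monthly, 3)
yearly = aggregate_counts(monthly, 12)
```

In an unbalanced panel, a unit that is missing some months would contribute a partial sum to the coarser period, which may be one reason aggregation choices matter more in that setting.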
We used data from a study of police retraining for the resampling simulations and case study. These data included monthly counts of use-of-force incidents and dates of training enrollment for 8614 officers from 2011 to 2016.
Results from the simulation varied across performance metrics, estimation methods, target estimands, and data structures. In general, the choice of time aggregation was more consequential when estimating dynamic (versus static) treatment effects, in unbalanced (versus balanced) panel data, and in the resampling simulations (where data had less autocorrelation). Although time aggregation mattered little in many scenarios, coarser aggregation was preferable in resampling simulations of staggered timing scenarios. The re-analysis of police training data was sensitive to time aggregation.
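The performance metrics used to compare time aggregations can be computed from simulation replicates roughly as follows. This is a sketch under stated assumptions, not the study's code: the function name `summarize`, the normal critical value of 1.96, and the toy inputs are all hypothetical.

```python
import math
from statistics import mean, stdev


def summarize(estimates, std_errors, true_effect, z_crit=1.96):
    """Summarize estimator performance across simulation replicates.

    estimates:   point estimates of the ATT, one per replicate.
    std_errors:  estimated standard errors, one per replicate.
    true_effect: the effect used to generate the data.
    z_crit:      assumed two-sided critical value (normal approximation).
    """
    bias = mean(estimates) - true_effect
    # Empirical standard error: spread of estimates across replicates.
    empirical_se = stdev(estimates)
    rmse = math.sqrt(mean([(e - true_effect) ** 2 for e in estimates]))
    # Share of replicates that reject the null of no effect.
    rejections = [abs(e / s) > z_crit for e, s in zip(estimates, std_errors)]
    rejection_rate = sum(rejections) / len(rejections)
    return {
        "bias": bias,
        "empirical_se": empirical_se,
        "rmse": rmse,
        "rejection_rate": rejection_rate,
    }


# Hypothetical toy replicates with a true effect of 2.0.
result = summarize([1.0, 2.0, 3.0], [0.4, 1.5, 1.0], true_effect=2.0)
```

The rejection rate is read as power when the true effect is nonzero and as the Type I error rate when the true effect is zero.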
In many scenarios, time aggregation has little impact on difference-in-differences estimators. However, when estimating dynamic effects, especially in staggered timing settings and unbalanced data, we found a tradeoff between precision and power, with finer aggregations being more powerful but less precise. In addition, estimators that use a single reference time point are more sensitive to noise in data measured at finer time scales.