{"title":"Size-biased sensitivity analysis for matched pairs design to assess the impact of healthcare-associated infections","authors":"David Watson","doi":"10.1353/obs.2023.a906628","DOIUrl":"https://doi.org/10.1353/obs.2023.a906628","url":null,"abstract":"Abstract:Healthcare-associated infections are serious adverse events that occur during a hospital admission. Quantifying the impact of these infections on inpatient length of stay and cost has important policy implications due to the Hospital-Acquired Conditions Reduction Program in the United States. However, most studies on this topic are flawed because they do not account for when a healthcare-associated infection occurred during a hospital admission. Such an approach leads to selection bias because patients with longer hospital stays are more likely to experience an infection due to their increased exposure time. Time of infection is often not incorporated into the estimation strategy because this information is unknown, yet there are no methods that account for the selection bias in this scenario. To address this problem, we propose a sensitivity analysis for matched pairs designs for assessing the effect of healthcare-associated infections on length of stay and cost when time of infection is unknown. The approach models the probability of infection, or the assignment mechanism, as proportional to a power function of the uninfected length of stay, where the sensitivity parameter is the value of the power. The general idea is to incorporate the degree of exposure into the probability of an infection occurring. Under this size-biased assignment mechanism, we develop hypothesis tests under a sharp null hypothesis of constant multiplicative effects. The approach is demonstrated on a pediatric cohort of inpatient encounters and compared to benchmark estimates that properly account for time of infection. The results reaffirm the severe degree of bias when not accounting for time of infection and also show that the proposed sensitivity analysis captures the benchmark estimates for plausible and theoretically justified values of the sensitivity parameter.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"9 1","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42324694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Software Tutorial for Matching in Clustered Observational Studies","authors":"Luke Keele, Matthew Lenard, Luke Miratrix, Lindsay Page","doi":"10.1353/obs.2023.a906624","DOIUrl":"https://doi.org/10.1353/obs.2023.a906624","url":null,"abstract":"Abstract:Many interventions occur in settings where treatments are applied to groups. For example, a math intervention may be implemented for all students in some schools and withheld from students in other schools. When such treatments are non-randomly allocated, researchers can use statistical adjustment to make treated and control groups similar in terms of observed characteristics. Recent work in statistics has developed a form of matching, known as multilevel matching, that is designed for contexts where treatments are clustered. In this article, we provide a tutorial on how to analyze clustered treatment using multilevel matching. We use a real data application to explain the full set of steps for the analysis of a clustered observational study.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"9 1","pages":"73 - 96"},"PeriodicalIF":0.0,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45559753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Doubly Robust Estimation of Average Treatment Effects on the Treated through Marginal Structural Models","authors":"M. Schomaker, Philipp F. M. Baumann","doi":"10.1353/obs.2023.0025","DOIUrl":"https://doi.org/10.1353/obs.2023.0025","url":null,"abstract":"Abstract:Some causal parameters are defined on subgroups of the observed data, such as the average treatment effect on the treated and variations thereof. We explain how such parameters can be defined through parameters in a marginal structural (working) model. We illustrate how existing software can be used for doubly robust effect estimation of those parameters. Our proposal for confidence interval estimation is based on the delta method. All concepts are illustrated by estimands and data from the data challenge of the 2022 American Causal Inference Conference.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"9 1","pages":"43 - 57"},"PeriodicalIF":0.0,"publicationDate":"2023-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41487639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal Methods Madness: Lessons Learned from the 2022 ACIC Competition to Estimate Health Policy Impacts","authors":"Daniel Thal, M. Finucane","doi":"10.1353/obs.2023.0023","DOIUrl":"https://doi.org/10.1353/obs.2023.0023","url":null,"abstract":"Abstract:Introducing novel causal estimators usually involves simulation studies run by the statistician developing the estimator, but this traditional approach can be fraught: simulation design is often favorable to the new method, unfavorable results might never be published, and comparison across estimators is difficult. The American Causal Inference Conference (ACIC) data challenges offer an alternative. As organizers of the 2022 challenge, we generated thousands of data sets similar to real-world policy evaluations and baked in true causal impacts unknown to participants. Participating teams then competed on an even playing field, using their cutting-edge methods to estimate those effects. In total, 20 teams submitted results from 58 estimators that used a range of approaches. We found several important factors driving performance that are not commonly used in business-as-usual applied policy evaluations, pointing to ways future evaluations could achieve more precise and nuanced estimates of policy impacts. Top-performing methods used flexible modeling of outcome-covariate and outcome-participation relationships as well as regularization of subgroup estimates. Furthermore, we found that model-based uncertainty intervals tended to outperform bootstrap-based ones. Lastly, and counter to our expectations, we found that analyzing large-n patient-level data does not improve performance relative to analyzing smaller-n data aggregated to the primary care practice level, given that in our simulated data sets practices (not individual patients) decided whether to join the intervention. Ultimately, we hope this competition helped identify methods that are best suited for evaluating which social policies move the needle for the individuals and communities they serve.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"9 1","pages":"27 - 3"},"PeriodicalIF":0.0,"publicationDate":"2023-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44338192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Treatment Effect with Propensity Score Weighted Regression and Double Machine Learning","authors":"Jun Xue, Wei Zhong Goh, Dana Rotz","doi":"10.1353/obs.2023.0028","DOIUrl":"https://doi.org/10.1353/obs.2023.0028","url":null,"abstract":"Abstract:We applied propensity score weighted regression and double machine learning in the 2022 American Causal Inference Conference Data Challenge. Our double machine learning method achieved the second lowest overall RMSE among all official submissions, but performed less well on heterogeneous treatment effect estimation due to lack of regularization.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"10 6","pages":"83 - 90"},"PeriodicalIF":0.0,"publicationDate":"2023-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41291815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Treatment Effects over Time with Causal Forests: An application to the ACIC 2022 Data Challenge","authors":"Shu Wan, Guanghui Zhang","doi":"10.1353/obs.2023.0026","DOIUrl":"https://doi.org/10.1353/obs.2023.0026","url":null,"abstract":"Abstract:In this paper, we present our winning modeling approach, DiConfounder, for the Atlantic Causal Inference Conference (ACIC) 2022 Data Science data challenge. Our method ranks 1st in RMSE and 5th in coverage among the 58 submissions. We propose a transformed outcome estimator by connecting the difference-in-difference and conditional average treatment effect estimation problems. Our comprehensive multistage pipeline encompasses feature engineering, missing value imputation, outcome and propensity score modeling, treatment effects modeling, and SATT and uncertainty estimations. Our model achieves remarkably accurate predictions, with an overall RMSE as low as 11 and 84.5% coverage. Further discussions explore various methods for constructing confidence intervals and analyzing the limitations of our approach under different data generating process settings. We provide evidence that the clustered data structure is the key to success. We also release the source code on GitHub for practitioners to adopt and adapt our methods.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"9 1","pages":"59 - 71"},"PeriodicalIF":0.0,"publicationDate":"2023-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43810955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inverse Probability Weighting Difference-in-Differences (IPWDID)","authors":"Yuqin Wei, M. Epland, Jingyuan Liu","doi":"10.1353/obs.2023.0027","DOIUrl":"https://doi.org/10.1353/obs.2023.0027","url":null,"abstract":"Abstract:In this American Causal Inference Conference (ACIC) 2022 challenge submission, the canonical difference-in-differences (DID) estimator has been used with inverse probability weighting (IPW) and strong simplifying assumptions to produce a benchmark model of the sample average treatment effect on the treated (SATT). Despite the restrictive assumptions and simple model, satisfactory performance in both point estimate and confidence intervals was observed, ranking in the top half of the competition.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"9 1","pages":"73 - 81"},"PeriodicalIF":0.0,"publicationDate":"2023-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49451652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"lmtp: An R Package for Estimating the Causal Effects of Modified Treatment Policies","authors":"Nicholas T Williams, I. Díaz","doi":"10.1353/obs.2023.0019","DOIUrl":"https://doi.org/10.1353/obs.2023.0019","url":null,"abstract":"Abstract:We present the lmtp R package for causal inference from longitudinal observational or randomized studies. This package implements the estimators of Díaz et al. (2021) for estimating general non-parametric causal effects based on modified treatment policies. Modified treatment policies generalize static and dynamic interventions, making lmtp and all-purpose package for non-parametric causal inference in observational studies. The methods provided can be applied to both point-treatment and longitudinal settings, and can account for time-varying exposure, covariates, and right censoring thereby providing a very general tool for causal inference. Additionally, two of the provided estimators are based on flexible machine learning regression algorithms, and avoid bias due to parametric model misspecification while maintaining valid statistical inference.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"9 1","pages":"103 - 122"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47362691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Doubly-Robust Inference in R using drtmle","authors":"D. Benkeser, N. Hejazi","doi":"10.1353/obs.2023.0017","DOIUrl":"https://doi.org/10.1353/obs.2023.0017","url":null,"abstract":"Abstract:Inverse probability of treatment weighted estimators and doubly robust estimators (including augmented inverse probability of treatment weight and targeted minimum loss estimators) are widely used in causal inference to estimate and draw inference about the average effect of a treatment. As an intermediate step, these estimators require estimation of key nuisance parameters, which are often regression functions. Typically, regressions are estimated using maximum likelihood and parametric models. Confidence intervals and p-values may be computed based on standard asymptotic results, such as the central limit theorem, the delta method, and the nonparametric bootstrap. However, in high-dimensional settings, maximum likelihood estimation often breaks down and standard procedures no longer yield correct inference. Instead, we may rely on adaptive estimators of nuisance parameters to construct flexible regression estimators. However, use of adaptive estimators poses a challenge for performing statistical inference about an estimated treatment effect. While doubly robust estimators facilitate inference when all relevant regression functions are consistently estimated, the same cannot be said when at least one nuisance estimator is inconsistent. drtmle implements doubly robust confidence intervals and hypothesis tests for targeted minimum loss estimates of the average treatment effect, in addition to several other recently proposed estimators of the average treatment effect.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"9 1","pages":"43 - 78"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41508466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of dimension reduction methods for the identification of heart-healthy dietary patterns","authors":"Natalie C. Gasca, R. McClelland","doi":"10.1353/obs.2023.0020","DOIUrl":"https://doi.org/10.1353/obs.2023.0020","url":null,"abstract":"Abstract:Most nutritional epidemiology studies investigating diet-disease trends use unsupervised dimension reduction methods, like principal component regression (PCR) and sparse PCR (SPCR), to create dietary patterns. Supervised methods, such as partial least squares (PLS), sparse PLS (SPLS), and Lasso, offer the possibility of more concisely summarizing the foods most related to a disease. In this study we evaluate these five methods for interpretable reduction of food frequency questionnaire (FFQ) data when analyzing a univariate continuous cardiac-related outcome via a simulation study and data application. We also demonstrate that to control for covariates, various scientific premises require different adjustment approaches when using PLS. To emulate food groups, we generated blocks of normally distributed predictors with varying intra-block covariances; only nine of 24 predictors contributed to the normal response. When block covariances were informed by FFQ data, the only methods that performed variable selection were Lasso and SPLS, which selected two and four irrelevant variables, respectively. SPLS had the lowest prediction error, and both PLS-based methods constructed four patterns, while PCR and SPCR created 24 patterns. These methods were applied to 120 FFQ variables and baseline body mass index (BMI) from the Multi-Ethnic Study of Atherosclerosis, which includes 6814 participants aged 45-84, and we adjusted for age, gender, race/ethnicity, exercise, and total energy intake. From 120 variables, PCR created 17 BMI-related patterns and PLS selected one pattern; SPLS only used five variables to create two patterns. All methods exhibited similar predictive performance. Specifically, SPLS’s first pattern highlighted hamburger and diet soda intake (positive associations with BMI), reflecting a fast food diet. By selecting fewer patterns and foods, SPLS can create interpretable dietary patterns while maintaining predictive ability.","PeriodicalId":74335,"journal":{"name":"Observational studies","volume":"9 1","pages":"123 - 156"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49570747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}