Applying difference-in-differences design in quality improvement and health systems research

IF 4.3 | Medicine (CAS Zone 2) | Q1 GERIATRICS & GERONTOLOGY
Yucheng Hou PhD, MPP, Abdelaziz Alsharawy PhD

Abstract

Assessing the effectiveness of a health system intervention when randomized controlled trials (RCTs) are infeasible has long been a challenge for clinicians, health economists, and health service researchers alike. Difference-in-differences (DID) is a quasi-experimental study design that can be particularly appealing in addressing this challenge using observational data. Other nonexperimental study designs, such as regression adjustment or propensity score matching, attempt to examine the impact of an intervention by accounting only for the observed differences between groups. In contrast, an appropriately designed DID study aims to exploit randomness in intervention timing to identify the causal effects of the intervention. The number of published papers applying DID designs in the medical field has been increasing in recent years.1, 2 Following this trend, the Journal of the American Geriatrics Society (JAGS) published 18 studies, mostly since 2018 (original data from the authors), that apply DID designs to examine a wide range of health system interventions that pertain to geriatric care (Figure 1).

DID designs assess the effect of an intervention (e.g., a health policy or program) applied to one or more groups (treated) by comparing their outcomes with those of a group that has never, or not yet, received the intervention (control) in terms of two differences.3, 4 The first set of differences compares outcomes before and after the timing of the intervention for the treated and control groups, respectively. This process removes the observed and unobserved group-specific factors that do not change over time. Subtracting these differences (i.e., the second difference, or difference-in-differences) removes the time-varying trends that are common to both groups. Together, DID identifies the causal effect of the intervention under the assumption that the treated group would have experienced the same trend as the control group in the absence of the intervention (parallel trends).
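In the canonical two-group, two-period case, this logic can be written compactly. As a brief sketch in our own notation (not drawn from the original article), the DID estimate is

$$
\hat{\tau}_{\mathrm{DID}} = \left(\bar{Y}_{\mathrm{treated}}^{\mathrm{post}} - \bar{Y}_{\mathrm{treated}}^{\mathrm{pre}}\right) - \left(\bar{Y}_{\mathrm{control}}^{\mathrm{post}} - \bar{Y}_{\mathrm{control}}^{\mathrm{pre}}\right),
$$

where each parenthesized term is a within-group pre/post difference (the first differences) and their subtraction is the second difference. Equivalently, it is the coefficient $\tau$ on the interaction term in the regression

$$
Y_{it} = \alpha + \beta\,\mathrm{Treated}_i + \gamma\,\mathrm{Post}_t + \tau\,(\mathrm{Treated}_i \times \mathrm{Post}_t) + \varepsilon_{it},
$$

which, under parallel trends, identifies the average treatment effect on the treated.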

In a recent issue of JAGS, a study by Burke and colleagues5 used a DID design to evaluate changes in patient care outcomes following the Age-Friendly health systems recognition in the Veterans Health Administration. The authors incorporated recent advances in DID with staggered treatment timing developed by Sun and Abraham,6 which is appropriate as the receipt of recognition across the medical sites happened at different times. This approach addresses potential biases in traditional DID estimation (often referred to as two-way fixed effects) when treatment effects are not constant over time and differ between late and early treated sites.6-8 Given the absence of an RCT in this setting, one of the notable strengths of this study stems from using observational data to measure the effect of recognition for implementing evidence-based care transformations (4Ms: what Matters, Medication, Mentation, and Mobility) on geriatric care outcomes. While the findings clearly describe a positive association between Age-Friendly recognition and facility-free days, readers are met with a typical conundrum when DID designs are adopted: Can we interpret these relationships as causal effects?
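To make the staggered-adoption setting concrete, the sketch below estimates a simplified interaction-weighted event study in the spirit of Sun and Abraham, using simulated site-level panel data. The data-generating process, column names, and aggregation weights are our illustrative assumptions, not the estimator code or data used by Burke and colleagues.

```python
# Minimal sketch: interaction-weighted event study (Sun-Abraham spirit)
# on a simulated panel with staggered adoption. Illustrative only.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_sites, n_periods = 40, 10

# Staggered adoption: sites adopt at t = 4, at t = 6, or never (np.inf).
cohort = rng.choice([4.0, 6.0, np.inf], size=n_sites, p=[0.3, 0.3, 0.4])
rows = []
for i in range(n_sites):
    site_effect = rng.normal()
    for t in range(n_periods):
        rel = t - cohort[i]                              # time since adoption
        effect = 0.5 * (rel + 1) if rel >= 0 else 0.0    # dynamic effect
        rows.append({"site": i, "t": t, "cohort": cohort[i], "rel": rel,
                     "y": site_effect + 0.2 * t + effect + rng.normal(0, 0.5)})
df = pd.DataFrame(rows)

# Cohort-by-relative-period dummies (reference period: rel = -1).
# Never-treated sites keep all dummies at zero and serve as controls.
dummies = []
for g in (4, 6):
    for l in range(-g, n_periods - g):
        if l == -1:
            continue
        col = f"g{g}_l{l}"
        df[col] = ((df["cohort"] == g) & (df["rel"] == l)).astype(float)
        dummies.append(col)

# Saturated regression with site and period fixed effects.
X = pd.concat([df[dummies],
               pd.get_dummies(df["site"], prefix="s", drop_first=True),
               pd.get_dummies(df["t"], prefix="t", drop_first=True)],
              axis=1).astype(float)
fit = sm.OLS(df["y"], sm.add_constant(X)).fit(
    cov_type="cluster", cov_kwds={"groups": df["site"]})

# Aggregate cohort-specific coefficients at each post-adoption lag,
# weighting each cohort by its share of treated observations at that lag.
for l in range(4):
    cols = [f"g{g}_l{l}" for g in (4, 6)]
    w = [((df["cohort"] == g) & (df["rel"] == l)).sum() for g in (4, 6)]
    print(f"lag {l}: ATT estimate = {np.average(fit.params[cols], weights=w):.2f}")
```

The key design choice is that each cohort's effect is estimated only against never-treated and not-yet-treated sites before being averaged, avoiding the "forbidden comparisons" between early- and late-treated groups that bias two-way fixed effects when effects are dynamic.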

Answering this question requires making an explicit argument for the plausibility of core DID assumptions, both statistically and conceptually. Typically, studies using DID devote substantial attention to arguing for the validity of the parallel trend assumption by demonstrating similar trends in outcomes between treated and control groups prior to the intervention (hereinafter, pre-trend tests). If a healthcare outcome prior to the intervention (e.g., number of facility-free days) was already increasing (or decreasing) for the treated relative to the control, then observed differences after the intervention may merely be a continuation of the pre-trend, not the treatment effect. Although most publications applying DID in JAGS discussed or visually assessed parallel pre-trends, only a handful reported statistical tests (Figure 1). Visual inspections, however intuitive, may mask differential trends leading up to the intervention or may be too noisy to support a compelling assessment of pre-trends. Statistical tests that are adequately powered to detect differences in pre-trends between treated and control groups would be more transparent; recent literature has focused on diagnosing the power of pre-trend testing.9-11 If, however, nonparallel pre-trends are evident, adjusting for or matching on time-invariant observed characteristics measured at baseline that are associated with treatment status and the outcome trends may be justified.12
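As one concrete form such a test can take, the sketch below fits an event-study regression on a simulated two-group panel and jointly tests that all pre-intervention "lead" coefficients are zero. The simulated data and variable names are our illustrative assumptions.

```python
# Minimal sketch: joint Wald test of pre-intervention lead coefficients.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_units, n_periods, adopt = 60, 8, 5   # all treated units adopt at t = 5

d = pd.DataFrame([{"unit": i, "t": t, "treated": int(i < 30)}
                  for i in range(n_units) for t in range(n_periods)])
d["y"] = (0.3 * d["t"] + 1.0 * d["treated"] * (d["t"] >= adopt)
          + rng.normal(0, 0.5, len(d)))

# Treated-by-period interaction dummies, omitting t = adopt - 1 (the
# period just before adoption) as the reference.
lead_lag = []
for k in range(n_periods):
    if k == adopt - 1:
        continue
    d[f"d{k}"] = (d["treated"] * (d["t"] == k)).astype(float)
    lead_lag.append(f"d{k}")

# Event-study regression with unit and period fixed effects.
X = pd.concat([d[lead_lag],
               pd.get_dummies(d["unit"], prefix="u", drop_first=True),
               pd.get_dummies(d["t"], prefix="t", drop_first=True)],
              axis=1).astype(float)
fit = sm.OLS(d["y"], sm.add_constant(X)).fit(
    cov_type="cluster", cov_kwds={"groups": d["unit"]})

# Jointly test that all pre-adoption leads (d0..d3) are zero. A large
# p-value is consistent with, but does not prove, parallel pre-trends.
constraint = ", ".join(f"d{k} = 0" for k in range(adopt - 1))
print(fit.wald_test(constraint, scalar=True))
```

Note that a "pass" here can also reflect low power, which is exactly why the diagnostics cited above matter.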

Nonetheless, even if the parallel trend assumption appears to be satisfied prior to the intervention, does meeting this criterion guarantee that a DID design can identify the causal impact of an intervention? Not necessarily. First, we want to draw a distinction between the parallel trend assumption and pre-trend tests.10 This distinction is important to highlight because the parallel trend assumption involves a counterfactual concept that is inherently untestable: What would have happened had the treated group not received the treatment? Pre-trend tests, if passed, do lend credibility to a DID design. Yet, a more critical question emerges when assessing the overall validity of the parallel trend assumption: What time-varying unobserved confounding factors may have resulted in, or coincided with, the intervention taking place? This question fundamentally pertains to the nature and timing of the intervention. Indeed, this core aspect of the DID assumption is more subtle and is prone to fail in many practical applications.1 Beyond pre-trend tests, conceptual discussions with context-specific examinations are necessary for establishing the rationale for causal inference using DID designs.10 We next focus on three potential sources of bias that are commonly discussed in the recent DID literature and can be particularly relevant in health systems research (Figure 2).

DID designs assume that the effect of an intervention begins only after it has been implemented. Health system interventions, however, often involve intrinsic or extrinsic incentives that can be anticipated by the treated group prior to the intervention. In particular, healthcare accreditations (e.g., Age-Friendly recognition) can be sought in response to changes in practices that are already in place and that precede receiving such recognitions. The treatment effect in these settings can be biased because of changes leading up to the intervention (rather than beginning at intervention onset). Conceptual arguments for the absence of anticipation can describe context-specific challenges that limit the ability of participants or organizations to predict and influence future outcomes. Empirical falsification tests can be applied to examine the extent of anticipation, such as shifting the timing of the intervention to a hypothetical period prior to the actual intervention onset.13 Anticipation in the pre-periods can also be removed from the main treatment effect by including an additional interaction term between treated groups and indicators for a washout period leading up to the intervention.14
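To illustrate the falsification idea, the sketch below re-estimates a simple DID on pre-intervention data only, with adoption artificially shifted earlier. It assumes a long panel with columns unit, t, treated, and y (such as the simulated d in the pre-trend sketch above); the function name and shift are our own.

```python
# Minimal sketch: placebo-timing test for anticipation. Keep only the
# pre-intervention periods, pretend adoption happened `shift` periods
# early, and re-estimate the DID interaction. Illustrative only.
import statsmodels.formula.api as smf

def placebo_timing_test(panel, actual_adopt, shift=2):
    pre = panel[panel["t"] < actual_adopt].copy()
    pre["fake_post"] = (pre["t"] >= actual_adopt - shift).astype(int)
    fit = smf.ols("y ~ C(unit) + C(t) + treated:fake_post", data=pre).fit(
        cov_type="cluster", cov_kwds={"groups": pre["unit"]})
    # A placebo "effect" near zero is consistent with no anticipation.
    return fit.params["treated:fake_post"], fit.pvalues["treated:fake_post"]

# Example, reusing the simulated panel d from the pre-trend sketch:
# placebo_timing_test(d, actual_adopt=5)   # expect an estimate near zero
```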

Time-varying unobserved factors may influence program entry and exit (e.g., selective participation or dropout) or lead to differential behaviors for the treated relative to the control. In healthcare settings, such selection can occur at multiple levels. Many health system interventions are voluntary in nature. At the organization level, high-performing organizations may voluntarily select into the program out of a motivation to improve performance or an expected financial return, which could lead to upward-biased estimates of effectiveness. On the other hand, organizations that underperform at baseline could also be selected to participate in the program, which may result in an observed short-term improvement that is likely driven by regression toward the mean. At the clinician level, patient selection can occur because clinicians may have private knowledge about their patients' risks that influences care outcomes. For studies that use disease diagnosis as the intervention, hidden biological mechanisms or unobserved comorbidities can often be the underlying driving force behind the effects following the diagnosis. The conceptual validity of the DID design is improved when accompanied by a precise discussion of how the treated are selected to receive an intervention, together with a demonstration of equivalent patient compositions in the treated and control groups before and after the timing of the intervention. Statistical falsification exercises on populations or services not intended for treatment may also help rule out unobserved changes occurring at a medical site around the same time as the intervention (e.g., focusing on non-elderly Veterans for Age-Friendly recognition).
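The sketch below illustrates one such negative-control exercise: re-running the same DID on a subgroup the intervention is not intended to affect. The assumed columns (unit, t, treated, post, y, and a target_population flag) and the function itself are our illustrative assumptions.

```python
# Minimal sketch: negative-control (falsification) analysis on a subgroup
# the intervention should not affect, e.g., non-elderly patients at
# recognized sites. Column names are illustrative assumptions.
import statsmodels.formula.api as smf

def negative_control_test(panel):
    sub = panel[panel["target_population"] == 0]   # patients not targeted
    fit = smf.ols("y ~ C(unit) + C(t) + treated:post", data=sub).fit(
        cov_type="cluster", cov_kwds={"groups": sub["unit"]})
    # A sizable "effect" here would flag confounding site-level changes
    # occurring around the same time as the intervention.
    return fit.params["treated:post"], fit.pvalues["treated:post"]
```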

Intervention spillover across treated and control groups is another factor that can compromise the validity of DID designs. The control group in studies assessing quality improvement may observe and learn from the treated group and change their behavior accordingly, leading DID to underestimate the total effect attributable to the intervention. A more subtle form of spillover can also occur when the control group implements some, if not all, elements of the intervention while not being categorized as treated (e.g., applying aspects of geriatric care transformations without seeking recognition status). Statistical evidence that geographic distance or physical separation limits learning between groups, or conceptual arguments that spillover channels are unlikely, can help bolster the credibility of the parallel trend assumption.

Causal inference is not binary: it moves from nonexperimental designs that can only account for observed characteristics, to DID and other quasi-experimental designs that instead exploit a source of randomness and may account for unobserved differences, and eventually to RCTs that deliberately eliminate both observed and unobserved confounders (but can sometimes be infeasible to conduct). When appropriate, using quasi-experimental designs to approach questions in health systems research provides strong and actionable evidence. The level of credibility in causal inference designs, however, rests on the plausibility of the underlying assumptions, which need to be explicitly outlined, statistically assessed when possible, and contextualized given the specific intervention or setting. This is especially important as many medical journals that used to reserve causal language for RCT frameworks are now moving toward facilitating the introduction of causal claims in appropriately designed observational studies.15, 16 Advancements in DID methods are rapidly evolving and paving the way for more credible statistical estimation of intervention effects.11 Yet, presenting conceptual arguments that motivate the use of DID designs in health systems research is perhaps even more important for enhancing the quality of evaluations. Clinicians and healthcare professionals are uniquely positioned within health systems to spot contexts that can deliver promising DID designs for causal inference.

All authors contributed equally to the manuscript, including conceptualization, drafting, editing, and final approval.

The authors report no conflicts of interest to disclose.

None.
