{"title":"Section Editor’s Note: Insights into the Generalizability of Findings from Experimental Evaluations","authors":"Laura R. Peck","doi":"10.1177/10982140221075092","DOIUrl":null,"url":null,"abstract":"As noted in my Editor’s Note to the Experimental Methodology Section of the American Journal of Evaluation’s (2020) Volume 40, Issue 4, experimental evaluations—where research units, such as people, schools, classrooms, and neighborhoods are randomly assigned to a program or to a control group—are often criticized for having limited external validity. In evaluation parlance, external validity refers to the ability to generalize results to other people, places, contexts, or times beyond those on which the evaluation focused. Evaluations—whether using an experimental design or not—are commonly conducted in a single site or a selected set of sites, either because that site is of particular interest or for convenience. Those special circumstances can mean that those sites—or the people within them—are not representative of a broader population of interest. In turn, the evaluation results may be useful only for assessing those people and places and not for predicting how a similar intervention might generate similar results for other people in other places. The good news, however, is that research and design innovations over the past several years have focused on how to overcome this criticism, making experimental evaluations’ results more useful for informing policy and program decisions (e.g., Bell & Stuart, 2016; Tipton & Olsen, 2018). Efforts for improving the external validity of experiments fall into two camps: design and analysis. Improving external validity through design means explicitly engaging a sample that is representative of a clearly identified target population. Although doing so is not common, particularly at the national level, some experiments have been successful at engaging a representative set of sites. The U.S. Department of Labor’s National Job Corps Study (e.g., Schochet, Burghardt & McConnell, 2006), the U.S. Department of Health and Human Services’ Head Start Impact Study (Puma et al., 2010), and the U.S. Social Security Administration’s Benefit Offset National Evaluation (Gubits et al., 2018) are three major evaluations that successfully recruited a nationally representative sample so that the evaluation results would be nationally generalizable. A simple, random selection of sites is the most straightforward way to ensure this representativeness and the generalizability of an evaluation’s results. In practice, however, that can be anything but simple. Even if an evaluation team randomly samples a site to participate, that site still needs to agree to participate; and if it does not, then the sample is no longer random.","PeriodicalId":51449,"journal":{"name":"American Journal of Evaluation","volume":null,"pages":null},"PeriodicalIF":1.1000,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Evaluation","FirstCategoryId":"90","ListUrlMain":"https://doi.org/10.1177/10982140221075092","RegionNum":3,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SOCIAL SCIENCES, INTERDISCIPLINARY","Score":null,"Total":0}
Abstract
As noted in my Editor’s Note to the Experimental Methodology Section of the American Journal of Evaluation’s (2020) Volume 40, Issue 4, experimental evaluations—where research units, such as people, schools, classrooms, and neighborhoods, are randomly assigned to a program or to a control group—are often criticized for having limited external validity. In evaluation parlance, external validity refers to the ability to generalize results to other people, places, contexts, or times beyond those on which the evaluation focused. Evaluations—whether using an experimental design or not—are commonly conducted in a single site or a selected set of sites, either because that site is of particular interest or for convenience. Those special circumstances can mean that those sites—or the people within them—are not representative of a broader population of interest. In turn, the evaluation results may be useful only for assessing those people and places, and not for predicting how a similar intervention might generate similar results for other people in other places. The good news, however, is that research and design innovations over the past several years have focused on how to overcome this criticism, making experimental evaluations’ results more useful for informing policy and program decisions (e.g., Bell & Stuart, 2016; Tipton & Olsen, 2018). Efforts to improve the external validity of experiments fall into two camps: design and analysis. Improving external validity through design means explicitly engaging a sample that is representative of a clearly identified target population. Although doing so is not common, particularly at the national level, some experiments have been successful at engaging a representative set of sites. The U.S. Department of Labor’s National Job Corps Study (e.g., Schochet, Burghardt, & McConnell, 2006), the U.S. Department of Health and Human Services’ Head Start Impact Study (Puma et al., 2010), and the U.S. Social Security Administration’s Benefit Offset National Evaluation (Gubits et al., 2018) are three major evaluations that successfully recruited a nationally representative sample so that the evaluation results would be nationally generalizable. A simple random selection of sites is the most straightforward way to ensure this representativeness and the generalizability of an evaluation’s results. In practice, however, that can be anything but simple. Even if an evaluation team randomly samples a site to participate, that site still needs to agree to participate; and if it does not, then the sample is no longer random.
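As a purely illustrative aside, the simple random selection of sites described above can be sketched in a few lines of code. The sketch below assumes a hypothetical sampling frame of 200 candidate sites and a draw of 30 sites; the site names and counts are assumptions for illustration and do not come from the evaluations cited in the note.

```python
# A minimal sketch of simple random site selection, assuming a hypothetical
# sampling frame; all names and numbers here are illustrative only.
import random

# Hypothetical sampling frame: every candidate site in the target population.
sampling_frame = [f"site_{i:03d}" for i in range(1, 201)]  # 200 candidate sites

random.seed(42)  # fixed seed so the draw is reproducible

# Draw a simple random sample of sites to recruit into the evaluation.
n_sites = 30
selected_sites = random.sample(sampling_frame, n_sites)

# In practice, a selected site may decline to participate. Replacing refusers
# with convenient substitutes, rather than following a pre-specified random
# procedure, is what undermines the randomness of the final sample.
print(selected_sites[:5])
```

The sketch only illustrates the selection step; the recruitment problem noted in the text—securing each selected site’s agreement to participate—is what makes representativeness hard to achieve in practice.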
About the Journal
The American Journal of Evaluation (AJE) publishes original papers about the methods, theory, practice, and findings of evaluation. The general goal of AJE is to present the best work in and about evaluation, in order to improve the knowledge base and practice of its readers. Because the field of evaluation is diverse, with different intellectual traditions, approaches to practice, and domains of application, the papers published in AJE will reflect this diversity. Nevertheless, preference is given to papers that are likely to be of interest to a wide range of evaluators and that are written to be accessible to most readers.