{"title":"Is there bias in alternatives to standardized tests? An investigation into letters of recommendation","authors":"Dev K. Dalal, Jason G. Randall, Ho Kwan Cheung, Brandon Gorman, Sylvia G. Roch, K. Williams","doi":"10.1080/15305058.2021.2019751","DOIUrl":"https://doi.org/10.1080/15305058.2021.2019751","url":null,"abstract":"Abstract Individuals concerned with subgroup differences on standardized tests suggest replacing these tests with holistic evaluations of unstructured application materials, such as letters of recommendation (LORs), which they posit show less bias. We empirically investigate this proposition that LORs are bias-free, and argue that LORs might actually invite systematic, race and gender subgroup differences in the content and evaluation of LORs. We text analyzed over 37,000 LORs submitted on behalf of over 10,000 graduate school applicants. Results showed that LOR content does differ across applicants. Furthermore, we see some systematic gender, race, and gender-race intersection differences in LOR content. Content of LORs also systematically differed between degree programs (S.T.E.M. vs. non-S.T.E.M.) and degree sought (doctoral vs. masters). Finally, LOR content alone did not predict an appreciable amount of variance in offers of admission (the first barrier to increasing diversity and inclusion in graduate programs). Our results, combined with past research on LOR content bias, highlight concerns that LORs can be biased against marginalized groups. We conclude with suggestions for reducing potential bias in LOR and for increasing diversity in graduate programs.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2022-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42116647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to International Journal of Testing special issue on equity and fairness in testing and assessment in school admissions","authors":"S. E. Woo, B. Wille, S. Sireci","doi":"10.1080/15305058.2021.2019753","DOIUrl":"https://doi.org/10.1080/15305058.2021.2019753","url":null,"abstract":"Across the globe, educational tests are used for admissions decisions to competitive colleges, universities, high schools, and other programs. The high stakes associated with these tests have important consequences for students, and performance on them can determine whether students reach their academic and career aspirations. For this reason, their use is both widespread and contentious. Recently, the debates over the use of standardized tests in college and graduate admissions has increased, due in large part to concerns about score disparities resulting in disparate admissions outcomes. The International Journal of Testing has published many examples of criticisms and research with respect to admissions testing around the world, including in Chile (Ramirez et al., 2020), Israel (Rapp & Allalouf, 2003), Saudi Arabia (Tsaousis et al., 2018), Sweden (Wiberg & von Davier, 2017), and the United States (e.g., TalentoMiller, 2008). In the U.S. and Chile, public outcry against disparate outcomes for certain groups of students have marshaled in changes in admissions testing programs and the policies associated with them (Koljatic et al., 2021). In the U.S., several colleges and universities have suspended the SAT and ACT requirements for their applicants, which generated a number of heated discussions both within and outside academia. The use of the Graduate Record Examinations (GREs) in graduate admissions is also being hotly debated for similar reasons, and a number of graduate programs in the U.S. have opted to remove the GRE requirement from https://doi.org/10.1080/15305058.2021.2019753","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2022-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48665709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using third-party evaluations to assess socioemotional skills in graduate and professional school admissions","authors":"David Klieger, Jennifer L. Bochenek, Chelsea Ezzo, Steven Holtzman, Frederick Cline, Margarita Olivera-Aguilar","doi":"10.1080/15305058.2021.2019748","DOIUrl":"https://doi.org/10.1080/15305058.2021.2019748","url":null,"abstract":"Abstract Consideration of socioemotional skills in admissions potentially can increase representation of racial and ethnic minorities and women in graduate and professional education as well as identify candidates more likely to succeed in graduate and professional school. Research on one such assessment, the ETS Personal Potential Index (PPI), showed that the PPI produced much smaller racial/ethnic-gender group mean score differences than undergraduate grade point average (UGPA) and the Graduate Record Examinations (GRE) did. Across levels of institutional selectivity, the PPI can promote racial/ethnic and gender diversity in graduate and professional school in ways that UGPA and GRE scores do not. Predictive validity analyses showed that for doctoral STEM programs the PPI dimensions of (1) Planning and Organization and (2) Communication Skills positively predict school grade point average as well as a lower risk of academic probation, a determinant of degree progress, both alone and incrementally over UGPA and GRE scores. Supplemental data for this article is available online at https://doi.org/10.1080/15305058.2021.2019748 .","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2022-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43931141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Test efficacy: Refocusing validation from college exams to candidates","authors":"Alvaro J. Arce, M. J. Young","doi":"10.1080/15305058.2021.2019752","DOIUrl":"https://doi.org/10.1080/15305058.2021.2019752","url":null,"abstract":"Abstract The paper argues that contemporary test validity theory places the consequences of testing on the lives of all college applicants at the back of the test validation argument. It introduces the notion of test efficacy as a process to gather evidence on claims on consequences of testing on all college applicants that can be traced back to validity. The paper proposes a test efficacy framework to evaluate test efficacy claims on the impact of admission examinations on all college applicants (not just those attaining the admission standard).","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2022-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43977564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using personal statements in college admissions: An investigation of gender bias and the effects of increased structure","authors":"Susan Niessen, Marvin Neumann","doi":"10.1080/15305058.2021.2019749","DOIUrl":"https://doi.org/10.1080/15305058.2021.2019749","url":null,"abstract":"Abstract Personal statements are among the most commonly used instruments in college admissions procedures. Yet, little research on their reliability, validity, and fairness exists. The first aim of this paper was to investigate hypotheses about adverse impact and underprediction for female applicants, which could result from lower tendencies to use agentic language compared to male applicants. Second, we examined if rating personal statements in a more structured manner would increase reliability and validity. Using personal statements (250 words) from a large cohort of applicants to an undergraduate psychology program at a Dutch University, we found no evidence for adverse impact for female applicants or more agentic language use by male applicants, and no relationship between agentic language use and personal statement ratings. In contrast, we found that personal statements of female applicants were rated slightly more positively than those of males. Exploratory analyses suggest that female applicants’ better writing skills might explain this difference. A more structured approach to rating personal statements yielded higher, but still only ‘moderate’ inter-rater reliability, and virtually identical, negligible predictive validity for first year GPA and dropout.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2021-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44509062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metacognitive skills inventory (MSI): development and validation","authors":"Haja Hameed, Reena Cheruvalath","doi":"10.1080/15305058.2021.1986051","DOIUrl":"https://doi.org/10.1080/15305058.2021.1986051","url":null,"abstract":"Abstract Metacognitive skills help to control and regulate negative thoughts, emotions, beliefs and sad memories. The objective of the study was to develop and validate an inventory-Metacognitive Skills Inventory (MSI) to assess the variance in adopting metacognitive strategies between those who have depressive symptoms and those who have not. Two studies were carried out among Indian youth (study 1—N = 269, MeanAge= 21.1 and study 2—N = 745, MeanAge= 20.9). They completed the MSI as well as measures of depression and negative emotions. Item response theory (IRT) analysis, and exploratory (EFA) and confirmatory factor analysis (CFA) were carried out for the scale development. The analyses derived a meaningful four-factor structure [(i) Navigation of negative thoughts by adopting metacognitive strategies, (ii) Channelizing negative emotions constructively, (iii) Recognizing ruminative tendencies, (iv) Knowledge of strengths and weaknesses in regulating emotions] of a 12-item MSI. An MSI could be used to identify patient-specific metacognitive skills in people with depressive symptoms, which need to be improved while doing Metacognitive Therapy (MCT) after validating clinical samples.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2021-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44737070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-country comparability of a social-emotional skills assessment designed for youth in low-resource environments","authors":"Nina Menezes Cunha, Andres Martinez, P. Kyllonen, Sarah Gates","doi":"10.1080/15305058.2021.1995867","DOIUrl":"https://doi.org/10.1080/15305058.2021.1995867","url":null,"abstract":"Abstract We evaluate the measurement invariance of a 48-item instrument designed to measure general social and emotional skills of youth in low resource environments. We refer to the skills measured as positive self-concept, negative self-concept, higher order thinking skills, and social and communication skills. These skills are often associated with economic development and can be used to evaluate programs designed to enhance economic development. Our evaluation is based on a sample of 1,794 in and out-of-school youth from Uganda and Guatemala’s Western Highlands. We conduct the analyses using a multiple group confirmatory factor analysis approach, breaking the sample by country, gender, and socio-economic status (high vs. low). Overall, our analysis points to strong invariance for all four measures across the different groups being compared. These findings contribute to the validity of the instrument as a tool for better understanding youth in diverse, developing economies.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2021-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43334250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Examining severity and centrality effects in TestDaF writing and speaking assessments: An extended Bayesian many-facet Rasch analysis","authors":"T. Eckes, K. Jin","doi":"10.1080/15305058.2021.1963260","DOIUrl":"https://doi.org/10.1080/15305058.2021.1963260","url":null,"abstract":"Abstract Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang’s (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing and speaking assessments using Bayesian MCMC methods. The findings revealed that (a) the extended facets model had a better data–model fit than models that ignored either or both kinds of rater effects, (b) rating scale and partial credit versions of the extended model differed in terms of data–model fit for writing and speaking, (c) rater severity and centrality estimates were not significantly correlated with each other, and (d) centrality effects had a demonstrable impact on examinee rank orderings. The discussion focuses on implications for the analysis and evaluation of rating quality in performance assessments.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2021-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49282149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring task features that predict psychometric quality of test items: the case for the Dutch driving theory exam","authors":"E. Roelofs, Wilco H M Emons, Angela J. Verschoor","doi":"10.1080/15305058.2021.1916506","DOIUrl":"https://doi.org/10.1080/15305058.2021.1916506","url":null,"abstract":"Abstract This study reports on an Evidence Centered Design (ECD) project in the Netherlands, involving the theory exam for prospective car drivers. In particular, we illustrate how cognitive load theory, task-analysis, response process models, and explanatory item-response theory can be used to systematically develop and refine task models. Based on a cognitive model for driving, 353 existing items involving rules of priority at intersections, were coded on intrinsic task features and task presentation features. Hierarchical regression analyses were carried out to determine the contribution of task features to item difficulty and item discrimination. A substantial proportion of variance in both item difficulty and item discrimination parameters could be explained by intrinsic task-features, including rules and signs (25%, 18.6%), task-intersection features (13.4%, 14.1%), and a smaller small proportion to item presentation features (3.5%, 7.1%) of the total variance. It is concluded that the systematic approach of discerning task features and determining the impact on item parameters has added value as an ECD-tool for evaluating existing assessments that are planned to be innovated. The paper concludes with a discussion of practical implications.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2021-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2021.1916506","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47064095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Validating theoretical assumptions about reading with cognitive diagnosis models","authors":"A. George, A. Robitzsch","doi":"10.1080/15305058.2021.1931238","DOIUrl":"https://doi.org/10.1080/15305058.2021.1931238","url":null,"abstract":"Abstract Modern large-scale studies such as the Progress in International Reading Literacy Study (PIRLS) do not only report reading competence of students on a global reading scale but also report reading on the level of reading subskills. However, the number of and the dependencies between the subskills are frequently discussed. In this study, different theoretical assumptions regarding the subskills describing the reading competence “acquiring and using information” in PIRLS are deduced from accompanying official materials. The different assumptions are then translated into empirical cognitive diagnosis models (CDMs). By evaluating and comparing the CDMs in terms of empirical fit criteria in each country participating in PIRLS 2016, the underlying theoretical assumptions are validated. Results show that in all but one country, a model proposing four reading subskills with no order between the subskills shows the best fit. This selected model could be simplified in order to facilitate practical derivations as, for example, the evaluation of skill classes and the analysis of learning paths.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2021-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2021.1931238","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46664133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}