Rudolf Debelak, S. Appelbaum, Dries Debeer, M. Tomasik
{"title":"二级物流多阶段评估中差异项目功能的检测","authors":"Rudolf Debelak, S. Appelbaum, Dries Debeer, M. Tomasik","doi":"10.3390/psych5020031","DOIUrl":null,"url":null,"abstract":"The detection of differential item functioning is crucial for the psychometric evaluation of multistage tests. This paper discusses five approaches presented in the literature: logistic regression, SIBTEST, analytical score-based tests, bootstrap score-based tests, and permutation score-based tests. First, using an simulation study inspired by a real-life large-scale educational assessment, we compare the five approaches with respect to their type I error rate and their statistical power. Then, we present an application to an empirical data set. We find that all approaches show type I error rates close to the nominal alpha level. Furthermore, all approaches are shown to be sensitive to uniform and non-uniform DIF effects, with the score-based tests showing the highest power.","PeriodicalId":93139,"journal":{"name":"Psych","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Detecting Differential Item Functioning in 2PL Multistage Assessments\",\"authors\":\"Rudolf Debelak, S. Appelbaum, Dries Debeer, M. Tomasik\",\"doi\":\"10.3390/psych5020031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The detection of differential item functioning is crucial for the psychometric evaluation of multistage tests. This paper discusses five approaches presented in the literature: logistic regression, SIBTEST, analytical score-based tests, bootstrap score-based tests, and permutation score-based tests. First, using an simulation study inspired by a real-life large-scale educational assessment, we compare the five approaches with respect to their type I error rate and their statistical power. Then, we present an application to an empirical data set. We find that all approaches show type I error rates close to the nominal alpha level. Furthermore, all approaches are shown to be sensitive to uniform and non-uniform DIF effects, with the score-based tests showing the highest power.\",\"PeriodicalId\":93139,\"journal\":{\"name\":\"Psych\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Psych\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/psych5020031\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Psych","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/psych5020031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Detecting Differential Item Functioning in 2PL Multistage Assessments
The detection of differential item functioning is crucial for the psychometric evaluation of multistage tests. This paper discusses five approaches presented in the literature: logistic regression, SIBTEST, analytical score-based tests, bootstrap score-based tests, and permutation score-based tests. First, using an simulation study inspired by a real-life large-scale educational assessment, we compare the five approaches with respect to their type I error rate and their statistical power. Then, we present an application to an empirical data set. We find that all approaches show type I error rates close to the nominal alpha level. Furthermore, all approaches are shown to be sensitive to uniform and non-uniform DIF effects, with the score-based tests showing the highest power.