Maintaining Score Scales Over Time: A Comparison of Five Scoring Methods
Authors: S. Y. Kim, Won‐Chan Lee
Journal: Applied Measurement in Education (Impact Factor 1.1; JCR Q3, Education & Educational Research)
Publication date: 2023-01-02
DOI: 10.1080/08957347.2023.2172015 (https://doi.org/10.1080/08957347.2023.2172015)
ABSTRACT This study evaluates several scoring methods, including number-correct scoring, IRT theta scoring, and hybrid scoring, in terms of scale-score stability over time. A simulation study examined the relative performance of five scoring methods in preserving the first two moments of scale scores for a population across a chain of linking with multiple test forms. Simulation factors included (1) the number of forms linked back to the initial form, (2) the pattern of mean shift, and (3) the proportion of common items. Results showed that scoring methods that operate on number-correct scores generally outperform those based on IRT proficiency estimators (θ̂) in reproducing the mean and standard deviation of scale scores. The scoring methods performed differently depending on the pattern of change in group proficiency.
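To make the evaluation criterion concrete, the following is a minimal sketch of how linking error could accumulate along a chain of forms and show up as drift in the first two moments (mean and standard deviation). It assumes each form-to-form link is a linear transformation, which the abstract does not state; the function names `compose_chain` and `moment_drift` and all numeric values are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def compose_chain(links):
    """Compose linear links [(A1, B1), (A2, B2), ...] into one (A, B).

    Assumption: link k maps a score x on form k to the scale of form k-1
    as A_k * x + B_k, so links[0] maps form 1 onto the base form.
    The returned pair maps the last form directly onto the base form.
    """
    slope, intercept = 1.0, 0.0
    for a, b in links:
        # The composite f(x) = slope * x + intercept maps form k-1 to base.
        # Applying the new link first gives f(a * x + b).
        intercept = slope * b + intercept
        slope = slope * a
    return slope, intercept


def moment_drift(base_scores, linked_scores):
    """Return (mean difference, SD difference) of linked vs. base scores."""
    return (
        mean(linked_scores) - mean(base_scores),
        pstdev(linked_scores) - pstdev(base_scores),
    )


if __name__ == "__main__":
    # Hypothetical scores for one population on the base scale.
    base = [10.0, 12.0, 15.0, 18.0, 20.0]
    # Hypothetical chain of five links, each slightly off from identity.
    slope, intercept = compose_chain([(1.02, 0.1)] * 5)
    linked = [slope * x + intercept for x in base]
    print(moment_drift(base, linked))
```

Under these assumptions, a small error in every link compounds multiplicatively in the slope and additively in the intercept, which is why the number of forms linked back to the initial form is a natural simulation factor.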
About the journal:
Because interaction between the domains of research and application is critical to the evaluation and improvement of new educational measurement practices, the prime objective of Applied Measurement in Education is to improve communication between academicians and practitioners. To help bridge the gap between theory and practice, articles in this journal describe original research studies, innovative strategies for solving educational measurement problems, and integrative reviews of current approaches to contemporary measurement issues. Peer Review Policy: All review papers in this journal have undergone editorial screening and peer review.