Applied Measurement in Education — Latest Publications

Predictive Modeling of Rater Behavior: Implications for Quality Assurance in Essay Scoring
I. Bejar, Chen Li, D. McCaffrey
Applied Measurement in Education, 33(1), 234-247. IF 1.5, Q4 (Education). Published 2020-07-02. DOI: 10.1080/08957347.2020.1750406
Abstract: We evaluate the feasibility of developing predictive models of rater behavior, that is, rater-specific models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays used by the e-rater® engine. Specifically, for each rater, the linear regression of rater scores on the linguistic attributes is obtained from data for two consecutive time periods, and the regression from each period is cross-validated against data from the other period. Raters were characterized in terms of their level of predictability and the importance of the predictors. Results suggest that rater models capture stable individual differences among raters. To evaluate the feasibility of using rater models as a quality-control mechanism, we examined the relationship between rater predictability, inter-rater agreement, and performance on pre-scored essays. Finally, we conducted a simulation in which raters score exclusively as a function of essay length at different points during the scoring day. We concluded that predictive rater models merit further investigation as a means of quality control for human scoring.
Citations: 1
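A minimal sketch of the cross-period, per-rater modeling idea described in the abstract: fit a linear regression of each rater's scores on essay features from one period and check how well it predicts that rater's scores in the next period. The feature names and column layout below are invented placeholders, not the actual e-rater attribute set or the authors' pipeline.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Placeholder feature names; the real study uses e-rater linguistic attributes.
FEATURES = ["grammar", "usage", "mechanics", "organization", "essay_length"]

def rater_predictability(df: pd.DataFrame) -> pd.DataFrame:
    """df columns (assumed): rater_id, period (1 or 2), score, plus FEATURES."""
    rows = []
    for rater_id, grp in df.groupby("rater_id"):
        train = grp[grp["period"] == 1]
        test = grp[grp["period"] == 2]
        if len(train) < 30 or len(test) < 30:
            continue  # skip raters with too little data in either period
        model = LinearRegression().fit(train[FEATURES], train["score"])
        pred = model.predict(test[FEATURES])
        rows.append({
            "rater_id": rater_id,
            "cross_period_r2": r2_score(test["score"], pred),  # predictability
            **dict(zip(FEATURES, model.coef_)),  # which attributes the rater leans on
        })
    return pd.DataFrame(rows)
```

A low cross-period R² flags a rater whose scoring is hard to predict from essay features, which is the kind of signal the article proposes to investigate for quality control.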
Applying Cognitive Theory to the Human Essay Rating Process
B. Finn, Burcu Arslan, M. Walsh
Applied Measurement in Education, 33(1), 223-233. IF 1.5, Q4 (Education). Published 2020-07-02. DOI: 10.1080/08957347.2020.1750405
Abstract: To score an essay response, raters draw on previously trained skills and knowledge about the underlying rubric and scoring criteria. Cognitive processes such as remembering, forgetting, and skill decay likely influence rater performance. To investigate how forgetting influences scoring, we evaluated raters' scoring accuracy on TOEFL and GRE essays. We used binomial linear mixed-effects models to evaluate how predictors such as the time spent scoring each response and the number of days between scoring sessions relate to scoring accuracy. Results suggest that for both nonoperational scoring (i.e., calibration samples completed prior to a scoring session) and operational scoring (i.e., validity samples interspersed among actual student responses), the number of days in a scoring gap negatively affects performance. These findings, along with other results from the models, are discussed in the context of cognitive influences on knowledge and skill retention.
Citations: 2
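A simplified sketch of the kind of model described above: scoring accuracy (1 = agrees with the pre-assigned score, 0 = does not) regressed on time spent and the gap in days between scoring sessions. The column names are hypothetical, and the rater-level random effects of the authors' binomial mixed model are omitted here; this is a plain logistic regression standing in for the idea.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_accuracy_model(df: pd.DataFrame):
    """df columns (assumed): accurate (0/1), seconds_on_response, days_since_last_session."""
    # Binomial (logit) model of scoring accuracy; a negative coefficient on
    # days_since_last_session would mirror the forgetting effect reported above.
    return smf.logit(
        "accurate ~ seconds_on_response + days_since_last_session", data=df
    ).fit()
```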
Gauging Uncertainty in Test-to-Curriculum Alignment Indices
A. Traynor, Tingxuan Li, Shuqi Zhou
Applied Measurement in Education, 33(1), 141-158. IF 1.5, Q4 (Education). Published 2020-03-03. DOI: 10.1080/08957347.2020.1732387
Abstract: During the development of large-scale school achievement tests, panels of independent subject-matter experts use systematic judgmental methods to rate the correspondence between a given test's items and performance-objective statements. The individual experts' ratings may then be used to compute summary indices that quantify the match between a given test and its target item domain. However, the magnitude of alignment-index variability across experts within a panel, and across randomly sampled panels, is largely unknown. Using rater-by-item data from alignment reviews of 14 US states' achievement tests, we examine observed distributions and estimate standard errors for three alignment indices developed by Webb. Our results suggest that alignment decisions based on the recommended criterion for the balance-of-representation index may often be uncertain, and that the criterion for the depth-of-knowledge consistency index should perhaps be reconsidered. We also examine current recommendations about the number of expert panelists required to compute these alignment indices.
Citations: 3
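For orientation, a sketch of Webb's balance-of-representation index as it is commonly written, with a bootstrap standard error over panelists; the data layout (one row per rater-by-item hit on an objective) is an assumption, and this is not the authors' exact estimation procedure.

```python
import numpy as np
import pandas as pd

def balance_of_representation(hits_per_objective: pd.Series) -> float:
    """Commonly cited form: BR = 1 - sum(|1/O - I_k/H|)/2, where O is the number of
    objectives hit at least once, I_k the hits on objective k, and H the total hits."""
    H = hits_per_objective.sum()
    O = (hits_per_objective > 0).sum()
    return float(1 - np.abs(1 / O - hits_per_objective / H).sum() / 2)

def bootstrap_se(ratings: pd.DataFrame, n_boot: int = 1000, seed: int = 0) -> float:
    """ratings columns (assumed): rater_id, objective. Resample raters with replacement."""
    rng = np.random.default_rng(seed)
    raters = ratings["rater_id"].unique()
    stats = []
    for _ in range(n_boot):
        sample = rng.choice(raters, size=len(raters), replace=True)
        pooled = pd.concat([ratings[ratings["rater_id"] == r] for r in sample])
        stats.append(balance_of_representation(pooled["objective"].value_counts()))
    return float(np.std(stats, ddof=1))
```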
The Impact of Test-Taking Disengagement on Item Content Representation
S. Wise
Applied Measurement in Education, 33(1), 83-94. IF 1.5, Q4 (Education). Published 2020-03-03. DOI: 10.1080/08957347.2020.1732386
Abstract: In achievement testing there is typically a practical requirement that the set of items administered be representative of some target content domain. This is accomplished by establishing test blueprints that specify the content constraints to be followed when selecting the items for a test. Sometimes, however, students give disengaged responses to some of their test items, which raises the question of the degree to which the set of engaged responses maintains the intended content representation. The current investigation reports the results of two studies focused on rapid-guessing behavior. The first study showed evidence that differential rapid guessing often resulted in test events with meaningfully distorted content representation. The second study found that differences in test-taking engagement across content categories were primarily due to differences in the reading load of items. Implications for test-score validity are discussed, along with suggestions for addressing the problem.
Citations: 12
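A sketch of the two-step idea behind the abstract: flag rapid guesses with a response-time threshold, then compare the content mix of the remaining engaged responses to the blueprint. The 10%-of-median-time threshold is one common convention used purely for illustration, and the column names are assumptions rather than the study's variables.

```python
import pandas as pd

def engaged_content_mix(responses: pd.DataFrame, blueprint: pd.Series) -> pd.DataFrame:
    """responses columns (assumed): item_id, content_area, response_time_sec.
    blueprint: intended proportion of items per content area."""
    # Per-item rapid-guessing threshold: 10% of the item's median response time.
    thresholds = responses.groupby("item_id")["response_time_sec"].median() * 0.10
    engaged = responses[
        responses["response_time_sec"] > responses["item_id"].map(thresholds)
    ]
    observed = engaged["content_area"].value_counts(normalize=True)
    # Side-by-side view of intended vs. observed content proportions.
    return pd.DataFrame({"intended": blueprint, "observed": observed}).fillna(0.0)
```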
The Trade-Off between Model Fit, Invariance, and Validity: The Case of PISA Science Assessments
Yasmine H. El Masri, D. Andrich
Applied Measurement in Education, 33(1), 174-188. IF 1.5, Q4 (Education). Published 2020-03-03. DOI: 10.1080/08957347.2020.1732384
Abstract: In large-scale educational assessments, it is generally required that tests be composed of items that function invariantly across the groups to be compared. Despite efforts to ensure invariance in the item-construction phase, for a range of reasons (including the security of items) it is often necessary to account for differential item functioning (DIF) post hoc. This typically requires a choice among retaining an item despite its DIF, deleting the item, or resolving (splitting) the item by creating a distinct item for each group. These options involve a trade-off between model fit and the invariance of item parameters, and each option can be valid depending on whether the source of DIF is relevant or irrelevant to the variable being assessed. We argue that making a choice requires a careful analysis of statistical DIF and its substantive source. We illustrate this argument by analyzing PISA 2006 science data from three countries (the UK, France, and Jordan) using the Rasch model, which was the model used for the analyses of all PISA 2006 data. We identify items with real DIF across countries and examine the implications for model fit, invariance, and the validity of cross-country comparisons when these items are eliminated, resolved, or retained.
Citations: 15
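For reference, the dichotomous Rasch model underlying the analysis, and one standard way to write a "resolved" item, in which a single difficulty is replaced by group-specific difficulties (the notation here is ours, not taken from the article):

```latex
% Dichotomous Rasch model: probability that person n answers item i correctly.
P(X_{ni}=1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}

% Resolving item i for DIF across countries g gives each country its own difficulty:
P(X_{nig}=1 \mid \theta_n, \delta_{ig}) = \frac{\exp(\theta_n - \delta_{ig})}{1 + \exp(\theta_n - \delta_{ig})}
```

Resolving improves fit by construction, but the country-specific parameters mean the item no longer contributes invariantly to cross-country comparisons, which is the trade-off the article examines.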
Comparing Cut Scores from the Angoff Method and Two Variations of the Hofstee and Beuk Methods
Adam E. Wyse
Applied Measurement in Education, 33(1), 159-173. IF 1.5, Q4 (Education). Published 2020-03-03. DOI: 10.1080/08957347.2020.1732385
Abstract: This article compares cut scores from two variations of the Hofstee and Beuk methods, which determine cut scores by resolving inconsistencies in panelists' judgments about cut scores and pass rates, with cut scores from the Angoff method. The first variation uses responses to the Hofstee and Beuk percentage-correct and pass-rate questions to calculate cut scores. The second variation uses Angoff ratings to determine the percentage-correct data in combination with responses to the pass-rate questions. Analysis of data from 15 standard settings suggested that the Hofstee and Beuk methods yielded similar cut scores, and that cut scores were about 2% lower when Angoff ratings were used. The two approaches also differed in the weight assigned to cut-score judgments in the Beuk method and in the occurrence of undefined cut scores in the Hofstee method. Findings also indicated that the Hofstee and Beuk methods often produced higher cut scores and lower pass rates than the Angoff method. It is suggested that attention be paid to the strategy used to estimate Hofstee and Beuk cut scores.
Citations: 4
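A generic sketch of the Hofstee compromise for readers unfamiliar with it, using the usual four panel judgments (minimum/maximum acceptable cut score and fail rate) and an observed score distribution; this illustrates the standard method, not the specific variations compared in the article.

```python
import numpy as np

def hofstee_cut(scores: np.ndarray, k_min: float, k_max: float,
                f_min: float, f_max: float) -> float:
    """Cut score where the judgment line meets the observed fail-rate curve.
    k_min/k_max: lowest/highest acceptable cut; f_min/f_max: min/max acceptable fail rate."""
    candidates = np.linspace(k_min, k_max, 501)
    fail_rates = np.array([(scores < c).mean() for c in candidates])
    # Line through (f_min, k_max) and (f_max, k_min) in (fail rate, cut score) space.
    line_cut = k_max + (k_min - k_max) * (fail_rates - f_min) / (f_max - f_min)
    # Closest approach to the intersection of curve and line.
    return float(candidates[np.argmin(np.abs(candidates - line_cut))])
```

If the observed fail-rate curve never crosses the judgment line inside the box defined by the four judgments, the Hofstee cut is undefined; a production implementation would need to detect that case rather than return the closest approach, which relates to the undefined-cut-score issue noted in the abstract.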
Rasch Model Extensions for Enhanced Formative Assessments in MOOCs
D. Abbakumov, P. Desmet, W. Van den Noortgate
Applied Measurement in Education, 33(1), 113-123. IF 1.5, Q4 (Education). Published 2020-03-03. DOI: 10.1080/08957347.2020.1732382
Abstract: Formative assessments are an important component of massive open online courses (MOOCs), online courses with open access and unlimited student participation. Drawing accurate conclusions about students' proficiency from formative assessments, however, faces several challenges: (a) students are typically allowed to make several attempts, and (b) student performance may be affected by other variables, such as interest. Neglecting the effects of attempts and interest in proficiency evaluation may therefore result in biased conclusions. In this study, we address this limitation and propose two extensions of a common psychometric model, the Rasch model, that include the effects of attempts and interest. We illustrate these extensions using real MOOC data and evaluate them using cross-validation. We found that (a) the effects of attempts and interest on performance are positive on average, but both vary among students; (b) part of the variance in proficiency parameters is due to variation between students in the effect of interest; and (c) the overall accuracy of predicting students' item responses using the extensions is 4.3% higher than using the Rasch model.
Citations: 4
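One plausible way to write such an extension, shown only to make the abstract concrete; this is our guessed parameterization under the stated assumptions (attempt and interest enter as person-varying effects), not necessarily the authors' exact models:

```latex
% Baseline Rasch model for person p and item i, in logit form:
\operatorname{logit} P(X_{pi}=1) = \theta_p - \beta_i

% Hypothetical extension: a_{pi} is the attempt number and t_{pi} an interest measure;
% \gamma, \lambda are average effects and \gamma_p, \lambda_p person-specific deviations.
\operatorname{logit} P(X_{pi}=1) = \theta_p - \beta_i + (\gamma + \gamma_p)\,a_{pi} + (\lambda + \lambda_p)\,t_{pi}
```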
Subscore Equating and Profile Reporting
Euijin Lim, Won-Chan Lee
Applied Measurement in Education, 33(1), 95-112. IF 1.5, Q4 (Education). Published 2020-03-03. DOI: 10.1080/08957347.2020.1732381
Abstract: The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real and simulated data under four study factors: test dimensionality, subtest length, difference in form difficulty, and sample size. The results indicated that reporting subscores without equating provides misleading information about score profiles, and that reporting subscores without a pre-specified test specification raises practical issues, such as constructing alternate subtest forms of comparable difficulty, conducting equating between forms of different lengths, and deciding on an appropriate score scale to report.
Citations: 4
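A minimal sketch of two classical equating functions under the random groups design (new form X placed on the scale of old form Y, number-correct scoring); these are generic textbook forms included for orientation, not the full set of subscore equating methods evaluated in the study.

```python
import numpy as np

def mean_equate(x: np.ndarray, scores_x: np.ndarray, scores_y: np.ndarray) -> np.ndarray:
    """Mean equating: shift new-form scores by the difference in form means."""
    return x + (scores_y.mean() - scores_x.mean())

def linear_equate(x: np.ndarray, scores_x: np.ndarray, scores_y: np.ndarray) -> np.ndarray:
    """Linear equating: match both the mean and standard deviation of the old form."""
    slope = scores_y.std(ddof=1) / scores_x.std(ddof=1)
    return scores_y.mean() + slope * (x - scores_x.mean())
```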
The Effectiveness and Features of Formative Assessment in US K-12 Education: A Systematic Review
Hansol Lee, Huy Q. Chung, Yu Zhang, J. Abedi, M. Warschauer
Applied Measurement in Education, 33(1), 124-140. IF 1.5, Q4 (Education). Published 2020-03-02. DOI: 10.1080/08957347.2020.1732383
Abstract: In this article, we present a systematic review of previous empirical studies that conducted formative assessment interventions to improve student learning. Previous meta-analytic research on the overall effects of formative assessment on student learning has been conclusive, but little is known about which features of formative assessment interventions matter, and how their impacts on student learning differ, in the United States' K-12 education system. Analysis of the 126 effect sizes identified from 33 studies, representing 25 research projects that met the inclusion criteria (e.g., included a control condition), revealed an overall small positive effect of formative assessment on student learning (d = .29), with benefits for mathematics (d = .34), literacy (d = .33), and arts (d = .29). Further investigation with meta-regression analyses indicated that supporting student-initiated self-assessment (d = .61) and providing formal formative assessment evidence (e.g., written feedback on quizzes; d = .40) via a medium-cycle length (within or between instructional units; d = .52) enhanced the effectiveness of formative assessment.
Citations: 44
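For readers who want the effect-size arithmetic behind numbers like d = .29, a sketch of Cohen's d from group summaries and a fixed-effect (inverse-variance weighted) average across studies; these are standard formulas shown for illustration, not the meta-regression models used in the review.

```python
import numpy as np

def cohens_d(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Standardized mean difference using the pooled standard deviation."""
    sd_pooled = np.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    return (m_t - m_c) / sd_pooled

def d_variance(d, n_t, n_c):
    """Approximate sampling variance of d (large-sample formula)."""
    return (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))

def fixed_effect_mean(ds, variances):
    """Inverse-variance weighted average effect size across studies."""
    w = 1 / np.asarray(variances)
    return float((w * np.asarray(ds)).sum() / w.sum())
```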
Some Methods and Evaluation for Linking and Equating with Small Samples
Michael R. Peabody
Applied Measurement in Education, 33(1), 3-9. IF 1.5, Q4 (Education). Published 2020-01-02. DOI: 10.1080/08957347.2019.1674304
Abstract: The purpose of the current article is to introduce the equating and evaluation methods used in this special issue. Although a comprehensive review of all existing models and methodologies would be impractical in this format, a brief introduction to some of the more popular models is provided. A brief discussion of the conditions required for equating precedes the discussion of the equating methods themselves. The procedures reviewed include the Tucker method, mean equating, nominal weights mean equating, simplified circle-arc equating, identity equating, and IRT/Rasch model equating. The measures presented for evaluating the success of the equating process are the standard error of equating, bias, and root-mean-square error. This should provide readers with a basic framework and enough background to follow the studies in this issue.
Citations: 7
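A short sketch of the three evaluation statistics named above, computed at each score point by comparing an estimated equating function to a criterion equating across replications; the array layout (replications by score points) is an assumption for illustration.

```python
import numpy as np

def equating_bias(estimates: np.ndarray, criterion: np.ndarray) -> np.ndarray:
    """estimates: replications x score points; criterion: criterion equating per score point."""
    return estimates.mean(axis=0) - criterion

def equating_see(estimates: np.ndarray) -> np.ndarray:
    """Standard error of equating: SD of the estimated equivalents over replications."""
    return estimates.std(axis=0, ddof=1)

def equating_rmse(estimates: np.ndarray, criterion: np.ndarray) -> np.ndarray:
    """Root-mean-square error, combining squared bias and variance at each score point."""
    return np.sqrt(((estimates - criterion) ** 2).mean(axis=0))
```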