Applied Measurement in Education: Latest Articles

Evaluating Random and Systematic Error in Student Growth Percentiles
IF 1.5 · CAS Tier 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-15 · DOI: 10.1080/08957347.2020.1789139
C. Wells, S. Sireci
Abstract: Student growth percentiles (SGPs) are currently used by several states and school districts to provide information about individual students as well as to evaluate teachers, schools, and school districts. For SGPs to be defensible for these purposes, they should be reliable. In this study, we examine the amount of systematic and random error in SGPs by simulating test scores for four grades and estimating SGPs using one, two, or three conditioning years. The results indicated that, although the amount of systematic error was small to moderate, the amount of random error was substantial regardless of the number of conditioning years. For example, with three conditioning years the standard error associated with an SGP estimate of 56 was 22.2, yielding a 68% confidence interval ranging from 33.8 to 78.2. The results are consistent with previous research and suggest that SGP estimates are too imprecise to be reported for the purpose of understanding students’ progress over time.
Citations: 2
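The interval quoted in the abstract is simply the SGP estimate plus or minus one standard error. A minimal sketch of that arithmetic follows; the values 56 and 22.2 come from the abstract, while clipping the interval to the valid 1–99 percentile range is an added assumption for illustration.

```python
# 68% confidence interval for an SGP estimate: estimate +/- 1 standard error.
# SGP = 56 and SE = 22.2 are taken from the abstract; clipping to the valid
# percentile range [1, 99] is an illustrative assumption.
sgp_estimate = 56.0
standard_error = 22.2

lower = max(1.0, sgp_estimate - standard_error)   # 33.8
upper = min(99.0, sgp_estimate + standard_error)  # 78.2

print(f"68% CI for SGP {sgp_estimate:.0f}: [{lower:.1f}, {upper:.1f}]")
```

An interval spanning roughly 44 percentile points is what motivates the authors' conclusion that individual SGPs are too imprecise to report for tracking student progress.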
The Impact of Setting Scoring Expectations on Rater Scoring Rates and Accuracy
IF 1.5 · CAS Tier 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750401
Cathy L. W. Wendler, Nancy Glazer, B. Bridgeman
Abstract: Efficient constructed-response (CR) scoring requires both accuracy and speed from human raters. This study was designed to determine whether setting scoring-rate expectations would encourage raters to score at a faster pace and, if so, whether there would be differential effects on scoring accuracy for raters who score at different rates. Three rater groups (slow, medium, and fast) and two conditions (informed and uninformed) were used. In both conditions, raters were given identical scoring directions, but only the informed groups were given an expected scoring rate. Results indicated no significant differences across the two conditions. However, there were significant increases in scoring rates for medium and slow raters compared to their previous operational rates, regardless of whether they were in the informed or uninformed condition. Results also showed no significant effects on rater accuracy for either condition or for any of the rater groups.
Citations: 0
Understanding and Interpreting Human Scoring
IF 1.5 · CAS Tier 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750402
Nancy Glazer, E. Wolfe
Abstract: This introductory article describes how constructed-response scoring is carried out, particularly the rater-monitoring process, and illustrates three potential designs for conducting rater monitoring in an operational scoring project. The introduction also presents a framework for interpreting research conducted by those who study the constructed-response scoring process. That framework identifies three classes of inputs (rater characteristics, response content, and rating context), which typically serve as independent variables in constructed-response scoring research, as well as three primary outcomes (rating quality, rating speed, and rater attitude), which serve as the dependent variables in those studies. Finally, we explain how each of the articles in this issue can be classified according to that framework.
Citations: 2
Why Should We Care about Human Raters?
IF 1.5 · CAS Tier 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750407
E. Wolfe, Cathy L. W. Wendler
Abstract: For more than a decade, measurement practitioners and researchers have emphasized evaluating, improving, and implementing automated scoring of constructed response (CR) items and tasks. There is go...
Citations: 2
Commentary on “Using Human Raters in Constructed Response Scoring: Understanding, Predicting, and Modifying Performance”
IF 1.5 · CAS Tier 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750408
Walter D. Way
Abstract: This special issue of AME provides a rich set of articles related to monitoring human scoring of constructed-response items. As a starting point for this commentary, it is worth mentioning that the...
Citations: 0
Evaluating Human Scoring Using Generalizability Theory
IF 1.5 · CAS Tier 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750403
Y. Bimpeh, W. Pointer, Ben A. Smith, Liz Harrison
Abstract: Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate inter-rater reliability for constructed-response items that are scored by humans. While the psychometric literature offers a variety of methods for evaluating rater consistency across ratings, we apply generalizability theory (G theory) to data from routine monitoring of ratings to derive an estimate of inter-rater reliability. UK examinations use a combination of double or multiple rating for routine monitoring, creating a more complex design that involves cross-pairing of raters and overlapping of raters across different groups of candidates or items. This sampling design is neither fully crossed nor nested. Each double- or multiple-scored item takes a different set of candidates, and the number of sampled candidates per item varies. Therefore, the standard G-theory method, and its various forms for estimating inter-rater reliability, cannot be applied directly to the operational data. We propose a method that takes double or multiple rating data as given and analyzes the datasets at the item level in order to obtain more accurate and stable variance component estimates. We adapt the variance components of observed scores for an unbalanced one-facet crossed design with some missing observations. These estimates can be used to make inferences about the reliability of the entire scoring process. We illustrate the proposed method by applying it to real scoring data.
Citations: 2
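For readers unfamiliar with how variance components translate into an inter-rater reliability coefficient, the sketch below runs a one-facet person-by-rater G study for the simple balanced, fully crossed case. The article's contribution is adapting these estimates to unbalanced monitoring data with missing observations, which this sketch does not attempt; the score matrix is invented for illustration.

```python
import numpy as np

# One-facet crossed G study (persons x raters), balanced case, one score per cell.
# scores[p, r] = score that rater r gave person p's response (illustrative data).
scores = np.array([
    [3, 4, 3],
    [2, 2, 3],
    [5, 4, 4],
    [1, 2, 1],
    [4, 4, 5],
], dtype=float)

n_p, n_r = scores.shape
grand = scores.mean()
person_means = scores.mean(axis=1)
rater_means = scores.mean(axis=0)

# Mean squares from the two-way layout without replication.
ms_p = n_r * np.sum((person_means - grand) ** 2) / (n_p - 1)
ms_r = n_p * np.sum((rater_means - grand) ** 2) / (n_r - 1)
resid = scores - person_means[:, None] - rater_means[None, :] + grand
ms_pr = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))

# Expected-mean-square solutions for the variance components.
var_pr = ms_pr                          # person-by-rater interaction (plus error)
var_p = max(0.0, (ms_p - ms_pr) / n_r)  # person (true-score) variance
var_r = max(0.0, (ms_r - ms_pr) / n_p)  # rater severity variance

# Generalizability (relative) coefficient for the mean of k raters.
k = 2
g_coefficient = var_p / (var_p + var_pr / k)
print(f"var_p={var_p:.3f}, var_r={var_r:.3f}, var_pr={var_pr:.3f}, "
      f"E(rho^2) with {k} raters = {g_coefficient:.3f}")
```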
The Impact of Operational Scoring Experience and Additional Mentored Training on Raters’ Essay Scoring Accuracy
IF 1.5 · CAS Tier 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750404
Ikkyu Choi, E. Wolfe
Abstract: Rater training is essential in ensuring the quality of constructed-response scoring. Most current knowledge about rater training comes from experimental contexts with an emphasis on short-term effects. Few sources provide empirical evidence on whether and how raters become more accurate as they gain scoring experience, or on what long-term effects training can have. In this study, we addressed this research gap by tracking how the accuracy of new raters changes with experience and by examining the impact of an additional training session on their accuracy in scoring calibration and monitoring essays. We found that, on average, raters’ accuracy improved with scoring experience and that individual raters differed in their accuracy trajectories. The estimated average effect of the training was an approximately six percent increase in calibration-essay accuracy. On the other hand, we observed a smaller impact on monitoring-essay accuracy. Our follow-up analysis showed that this differential impact of the additional training on calibration and monitoring essay accuracy could be accounted for by successful gatekeeping through calibration.
Citations: 2
Predictive Modeling of Rater Behavior: Implications for Quality Assurance in Essay Scoring
IF 1.5 · CAS Tier 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750406
I. Bejar, Chen Li, D. McCaffrey
Abstract: We evaluate the feasibility of developing predictive models of rater behavior, that is, rater-specific models for predicting the scores a rater produces under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays used by the e-rater® engine. Specifically, for each rater, the linear regression of rater scores on the linguistic attributes is estimated from data for two consecutive time periods, and the regression from each period is cross-validated against data from the other period. Raters were characterized in terms of their level of predictability and the importance of the predictors. Results suggest that rater models capture stable individual differences among raters. To evaluate the feasibility of using rater models as a quality-control mechanism, we examined the relationship between rater predictability and both inter-rater agreement and performance on pre-scored essays. Finally, we conducted a simulation in which raters are simulated to score exclusively as a function of essay length at different points during the scoring day. We conclude that predictive rater models merit further investigation as a means of quality-controlling human scoring.
Citations: 1
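The core idea, fitting each rater's scores on essay features in one period and checking how well that model predicts the same rater's scores in the adjacent period, can be sketched as below. The feature matrix and scores are random placeholders standing in for the proprietary e-rater linguistic attributes, and cross-period correlation is used as the predictability measure, which is one reasonable choice rather than necessarily the article's exact criterion.

```python
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)

def fit_linear(features, scores):
    """Ordinary least squares with an intercept column."""
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = lstsq(X, scores, rcond=None)
    return coef

def predict(coef, features):
    X = np.column_stack([np.ones(len(features)), features])
    return X @ coef

# Placeholder data: two scoring periods for one rater, 200 essays each,
# 5 linguistic features per essay (standing in for e-rater attributes).
X1, X2 = rng.normal(size=(200, 5)), rng.normal(size=(200, 5))
true_w = np.array([0.8, 0.3, 0.0, 0.5, -0.2])
y1 = X1 @ true_w + rng.normal(scale=0.5, size=200)
y2 = X2 @ true_w + rng.normal(scale=0.5, size=200)

# Fit on each period, then cross-validate against the other period.
coef1, coef2 = fit_linear(X1, y1), fit_linear(X2, y2)
r_1to2 = np.corrcoef(predict(coef1, X2), y2)[0, 1]
r_2to1 = np.corrcoef(predict(coef2, X1), y1)[0, 1]
print(f"cross-period predictability: {r_1to2:.2f}, {r_2to1:.2f}")
```

A rater whose cross-period correlations stay high is behaving consistently over time, which is what makes such models candidates for quality-control monitoring.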
Applying Cognitive Theory to the Human Essay Rating Process
IF 1.5 · CAS Tier 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750405
B. Finn, Burcu Arslan, M. Walsh
Abstract: To score an essay response, raters draw on previously trained skills and knowledge about the underlying rubric and score criteria. Cognitive processes such as remembering, forgetting, and skill decay likely influence rater performance. To investigate how forgetting influences scoring, we evaluated raters’ scoring accuracy on TOEFL and GRE essays. We used binomial linear mixed-effects models to evaluate how predictors such as the time spent scoring each response and the number of days between scoring sessions relate to scoring accuracy. Results suggest that, for both nonoperational scoring (i.e., calibration samples completed prior to a scoring session) and operational scoring (i.e., validity samples interspersed among actual student responses), the number of days in a scoring gap negatively affects performance. These findings, along with other results from the models, are discussed in the context of cognitive influences on knowledge and skill retention.
Citations: 2
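The modeling step described in the abstract can be illustrated, in simplified form, with an ordinary logistic regression of scoring accuracy on the predictors the authors mention. The article itself fits binomial linear mixed-effects models with rater-level random effects, which this fixed-effects sketch omits; the data and variable names below are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1000

# Invented example data: one row per scored essay.
#   accurate         - 1 if the rater's score matched the reference score
#   days_since_last  - days since the rater's previous scoring session
#   seconds_on_essay - time spent scoring the response
df = pd.DataFrame({
    "days_since_last": rng.integers(0, 30, size=n),
    "seconds_on_essay": rng.normal(90, 20, size=n),
})
logit_p = 1.5 - 0.05 * df["days_since_last"] + 0.002 * df["seconds_on_essay"]
df["accurate"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_p.to_numpy())))

# Fixed-effects logistic regression; the article's binomial mixed model would
# additionally include random effects (e.g., for raters).
model = smf.logit("accurate ~ days_since_last + seconds_on_essay", data=df).fit(disp=False)
print(model.summary())
```

A negative coefficient on days_since_last would correspond to the scoring-gap effect the abstract reports.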
Gauging Uncertainty in Test-to-Curriculum Alignment Indices
IF 1.5 · CAS Tier 4 · Education
Applied Measurement in Education · Pub Date: 2020-03-03 · DOI: 10.1080/08957347.2020.1732387
A. Traynor, Tingxuan Li, Shuqi Zhou
Abstract: During the development of large-scale school achievement tests, panels of independent subject-matter experts use systematic judgmental methods to rate the correspondence between a given test’s items and performance-objective statements. The individual experts’ ratings may then be used to compute summary indices that quantify the match between a given test and its target item domain. The magnitude of alignment-index variability across experts within a panel, and across randomly sampled panels, is largely unknown, however. Using rater-by-item data from alignment reviews of 14 US states’ achievement tests, we examine observed distributions and estimate standard errors for three alignment indices developed by Webb. Our results suggest that alignment decisions based on the recommended criterion for the balance-of-representation index may often be uncertain, and that the criterion for the depth-of-knowledge consistency index should perhaps be reconsidered. We also examine current recommendations about the number of expert panelists required to compute these alignment indices.
Citations: 3
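As context for one of the indices studied here, the balance-of-representation index for a single standard is often described as one minus half the summed absolute deviations between a uniform share of items per hit objective and the observed share. The sketch below uses that commonly cited formulation, which should be checked against Webb's original definition before being relied on; the item-to-objective assignments are invented.

```python
from collections import Counter

def balance_of_representation(item_objectives):
    """Webb-style balance-of-representation index for one standard.

    item_objectives: one entry per item hitting the standard, naming the
    objective the item was matched to. Uses the commonly cited formulation
    BOR = 1 - (sum_k |1/O - I_k/H|) / 2, where O is the number of objectives
    hit, I_k the items hitting objective k, and H the total items hitting
    the standard.
    """
    hits_per_objective = Counter(item_objectives)
    total_hits = sum(hits_per_objective.values())   # H
    n_objectives_hit = len(hits_per_objective)      # O
    deviation = sum(
        abs(1.0 / n_objectives_hit - count / total_hits)
        for count in hits_per_objective.values()
    )
    return 1.0 - deviation / 2.0

# Ten items matched to objectives of one standard, heavily concentrated on "obj1".
print(balance_of_representation(
    ["obj1"] * 6 + ["obj2"] * 2 + ["obj3", "obj4"]
))  # lower values indicate less balanced coverage across objectives
```

Because the index depends on which objective each panelist assigns to each item, different panels can produce noticeably different values, which is the source of the uncertainty the article quantifies.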