Applied Measurement in Education: Latest Articles

Detecting Local Dependence: A Threshold-Autoregressive Item Response Theory (TAR-IRT) Approach for Polytomous Items
IF 1.5 · CAS Zone 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-20 · DOI: 10.1080/08957347.2020.1789136
Authors: Xiaodan Tang, G. Karabatsos, Haiqin Chen
Abstract: In applications of item response theory (IRT) models, it is known that empirical violations of the local independence (LI) assumption can significantly bias parameter estimates. To address this issue, we propose a threshold-autoregressive item response theory (TAR-IRT) model that additionally accounts for order dependence among the item responses of each examinee. The TAR-IRT approach also defines a new family of IRT models for polytomous item responses under both unidimensional and multidimensional frameworks, with order-dependent effects between item responses and the relevant dimensions. The feasibility of the proposed model was demonstrated in an empirical study using a polytomous response dataset. A simulation study of polytomous item responses with order effects of different magnitudes in an education context shows that the TAR modeling framework can provide more accurate ability estimation than the partial credit model when order effects exist.
Applied Measurement in Education, 33(1), 280-292.
Citations: 1
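Note (illustrative sketch, not the authors' exact specification): the abstract does not give the TAR-IRT model's functional form, but one way an order-dependence term could enter a polytomous model such as the partial credit model is

P(X_{ij} = k \mid \theta_j) \propto \exp\Big( \sum_{v=1}^{k} \big[ \theta_j - \delta_{iv} + \gamma_i \, \mathbb{1}(X_{(i-1)j} > \tau) \big] \Big), \quad k = 0, \dots, m_i,

where \delta_{iv} are step parameters, X_{(i-1)j} is the same examinee's response to the preceding item, \tau is an assumed threshold, and \gamma_i is the autoregressive (order-effect) parameter; setting \gamma_i = 0 recovers the ordinary partial credit model. The indicator term and threshold here are assumptions made purely for illustration.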
Validating Rubric Scoring Processes: An Application of an Item Response Tree Model
IF 1.5 · CAS Zone 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-20 · DOI: 10.1080/08957347.2020.1789143
Authors: Aaron J. Myers, Allison J. Ames, B. Leventhal, Madison A. Holzman
Abstract: When rating performance assessments, raters may ascribe different scores to the same performance when rubric application does not align with the intended application of the scoring criteria. Because performance assessment score interpretation assumes raters apply rubrics as rubric developers intended, misalignment between raters' scoring processes and the intended scoring processes may lead to invalid inferences from these scores. In an effort to standardize raters' scoring processes, an alternative scoring method was used. With this method, rubric developers' intended scoring processes are made explicit by requiring raters to respond to a series of selected-response statements resembling a decision tree. To determine whether raters scored essays as intended using a traditional rubric and the alternative scoring method, an IRT model with a tree-like structure (IRTree) was specified to depict the intended scoring processes and fit to data from each scoring method. Results suggest raters using the alternative method may be better able to rate as intended, and thus that it may be a viable alternative to traditional rubric scoring. Implications of the IRTree model are discussed.
Applied Measurement in Education, 33(1), 293-308.
Citations: 6
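Note (rough illustration under assumed structure, not the authors' exact model): an IRTree represents each rubric score as the outcome of sequential binary decisions. With three score levels and two nodes, each node n for response j could be modeled as

P(Y^{*}_{jn} = 1 \mid \theta_j) = \frac{\exp(\theta_j - \beta_n)}{1 + \exp(\theta_j - \beta_n)}, \quad n = 1, 2,

with the observed score read off the branches: score 0 if node 1 is "no," score 1 if node 1 is "yes" and node 2 is "no," and score 2 if both are "yes." Mapping the selected-response statements onto such nodes is what allows a test of whether raters followed the intended decision path.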
An IRT Mixture Model for Rating Scale Confusion Associated with Negatively Worded Items in Measures of Social-Emotional Learning
IF 1.5 · CAS Zone 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-16 · DOI: 10.1080/08957347.2020.1789140
Authors: D. Bolt, Y. Wang, R. Meyer, L. Pier
Abstract: We illustrate the application of mixture IRT models to evaluate respondent confusion due to the negative wording of certain items on a social-emotional learning (SEL) assessment. Using actual student self-report ratings on four social-emotional learning scales collected from students in grades 3-12 in the CORE Districts in the state of California, we also evaluate the consequences of the potential confusion in biasing student- and school-level scores as well as the estimated correlational relationships between SEL constructs and student-level variables. Models of both full and partial confusion are examined. Our results suggest that (1) rating scale confusion due to negatively worded items does appear to be present; (2) the confusion is most prevalent at lower grade levels (third-fifth); and (3) the occurrence of confusion is related to both reading proficiency and ELL status, as anticipated, and consequently biases estimates of SEL correlations with these student-level variables. For these reasons, we suggest future iterations of the SEL measures use only positively oriented items.
Applied Measurement in Education, 33(1), 331-348.
Citations: 5
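Note (hedged sketch; the class structure and reversal mechanism below are assumptions, not the authors' exact specification): a mixture of this kind can be written, for a negatively worded item i with categories 0, ..., m, as

P(X_{ij} = k) = \pi \, P_{\mathrm{IRT}}(X_{ij} = k \mid \theta_j) + (1 - \pi) \, P_{\mathrm{IRT}}(X_{ij} = m - k \mid \theta_j),

where \pi is the proportion of respondents who use the rating scale as intended and the second component reverses the category order to represent confusion; a "partial confusion" variant would apply the reversal to only a subset of the negatively worded items.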
Evaluating Random and Systematic Error in Student Growth Percentiles
IF 1.5 · CAS Zone 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-15 · DOI: 10.1080/08957347.2020.1789139
Authors: C. Wells, S. Sireci
Abstract: Student growth percentiles (SGPs) are currently used by several states and school districts to provide information about individual students as well as to evaluate teachers, schools, and school districts. For SGPs to be defensible for these purposes, they should be reliable. In this study, we examine the amount of systematic and random error in SGPs by simulating test scores for four grades and estimating SGPs using one, two, or three conditioning years. The results indicated that, although the amount of systematic error was small to moderate, the amount of random error was substantial, regardless of the number of conditioning years. For example, the standard error associated with an SGP estimate of 56 was 22.2, resulting in a 68% confidence interval ranging from 33.8 to 78.2 when using three conditioning years. The results are consistent with previous research and suggest SGP estimates are too imprecise to be reported for the purpose of understanding students' progress over time.
Applied Measurement in Education, 33(1), 349-361.
Citations: 2
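The reported interval is a normal-approximation band of plus or minus one standard error around the estimate; a quick check of the figures quoted in the abstract (Python, for illustration only):

sgp_estimate = 56.0
standard_error = 22.2
lower = sgp_estimate - standard_error
upper = sgp_estimate + standard_error
print(f"68% CI: [{lower:.1f}, {upper:.1f}]")  # 68% CI: [33.8, 78.2], matching the abstract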
The Impact of Setting Scoring Expectations on Rater Scoring Rates and Accuracy
IF 1.5 · CAS Zone 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750401
Authors: Cathy L. W. Wendler, Nancy Glazer, B. Bridgeman
Abstract: Efficient constructed response (CR) scoring requires both accuracy and speed from human raters. This study was designed to determine whether setting scoring rate expectations would encourage raters to score at a faster pace, and if so, whether there would be differential effects on scoring accuracy for raters who score at different rates. Three rater groups (slow, medium, and fast) and two conditions (informed and uninformed) were used. In both conditions, raters were given identical scoring directions, but only the informed groups were given an expected scoring rate. Results indicated no significant differences across the two conditions. However, there were significant increases in scoring rates for medium and slow raters compared to their previous operational rates, regardless of whether they were in the informed or uninformed condition. Results also showed no significant effects on rater accuracy for either of the two conditions or for any of the rater groups.
Applied Measurement in Education, 33(1), 248-254.
Citations: 0
Understanding and Interpreting Human Scoring
IF 1.5 · CAS Zone 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750402
Authors: Nancy Glazer, E. Wolfe
Abstract: This introductory article describes how constructed response scoring is carried out, particularly the rater monitoring processes, and illustrates three potential designs for conducting rater monitoring in an operational scoring project. The introduction also presents a framework for interpreting research conducted by those who study the constructed response scoring process. That framework identifies three classes of inputs (rater characteristics, response content, and rating context), which typically serve as independent variables in constructed response scoring research, as well as three primary outcomes (rating quality, rating speed, and rater attitude), which serve as the dependent variables in those studies. Finally, we explain how each of the articles in this issue can be classified according to that framework.
Applied Measurement in Education, 33(1), 191-197.
Citations: 2
Why Should We Care about Human Raters?
IF 1.5 · CAS Zone 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750407
Authors: E. Wolfe, Cathy L. W. Wendler
Abstract (excerpt): For more than a decade, measurement practitioners and researchers have emphasized evaluating, improving, and implementing automated scoring of constructed response (CR) items and tasks. There is go...
Applied Measurement in Education, 33(1), 189-190.
Citations: 2
Commentary on "Using Human Raters in Constructed Response Scoring: Understanding, Predicting, and Modifying Performance"
IF 1.5 · CAS Zone 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750408
Author: Walter D. Way
Abstract (excerpt): This special issue of AME provides a rich set of articles related to monitoring human scoring of constructed response items. As a starting point for this commentary, it is worth mentioning that the...
Applied Measurement in Education, 33(1), 255-261.
Citations: 0
Evaluating Human Scoring Using Generalizability Theory
IF 1.5 · CAS Zone 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750403
Authors: Y. Bimpeh, W. Pointer, Ben A. Smith, Liz Harrison
Abstract: Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability of the constructed-response items that are scored by humans. While the psychometric literature offers a variety of methods for evaluating rater consistency across ratings, we apply generalizability theory (G theory) to data from routine monitoring of ratings to derive an estimate of inter-rater reliability. UK examinations use a combination of double or multiple rating for routine monitoring, creating a more complex design that consists of cross-pairing of raters and overlapping of raters across different groups of candidates or items. This sampling design is neither fully crossed nor nested. Each double- or multiple-scored item takes a different set of candidates, and the number of sampled candidates per item varies. Therefore, the standard G theory method, and its various forms for estimating inter-rater reliability, cannot be applied directly to the operational data. We propose a method that takes double or multiple rating data as given and analyzes the datasets at the item level in order to obtain more accurate and stable variance component estimates. We adapt the estimation of variance components in observed scores for an unbalanced one-facet crossed design with some missing observations. These estimates can be used to make inferences about the reliability of the entire scoring process. We illustrate the proposed method by applying it to real scoring data.
Applied Measurement in Education, 33(1), 198-209.
Citations: 2
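For orientation, the quantities involved can be seen in the balanced one-facet persons-by-raters design, where ANOVA (expected mean square) estimators give the person, rater, and residual variance components and a single-rater G coefficient of sigma^2_p / (sigma^2_p + sigma^2_{pr,e}). The sketch below (Python, illustration only) implements that balanced case; it is not the unbalanced, missing-data adaptation the paper proposes, and the function name and toy data are invented for this example.

import numpy as np

def g_theory_one_facet(scores):
    """Variance components and single-rater G coefficient for a balanced
    persons x raters design (rows = persons, columns = raters, no missing cells).
    Standard ANOVA estimators; balanced-case illustration only."""
    n_p, n_r = scores.shape
    grand = scores.mean()
    ss_p = n_r * np.sum((scores.mean(axis=1) - grand) ** 2)   # persons
    ss_r = n_p * np.sum((scores.mean(axis=0) - grand) ** 2)   # raters
    ss_pr = np.sum((scores - grand) ** 2) - ss_p - ss_r       # interaction + error
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))
    var_p = max((ms_p - ms_pr) / n_r, 0.0)   # true-score (person) variance
    var_r = max((ms_r - ms_pr) / n_p, 0.0)   # rater severity variance
    var_pr = ms_pr                           # interaction + residual error
    g_single_rater = var_p / (var_p + var_pr)
    return var_p, var_r, var_pr, g_single_rater

ratings = np.array([[3, 4], [2, 2], [4, 5], [1, 2]], dtype=float)  # toy double-rated data
print(g_theory_one_facet(ratings))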
The Impact of Operational Scoring Experience and Additional Mentored Training on Raters' Essay Scoring Accuracy
IF 1.5 · CAS Zone 4 · Education
Applied Measurement in Education · Pub Date: 2020-07-02 · DOI: 10.1080/08957347.2020.1750404
Authors: Ikkyu Choi, E. Wolfe
Abstract: Rater training is essential in ensuring the quality of constructed response scoring. Most of the current knowledge about rater training comes from experimental contexts with an emphasis on short-term effects. Few sources offer empirical evidence on whether and how raters become more accurate as they gain scoring experience, or on what long-term effects training can have. In this study, we addressed this research gap by tracking how the accuracy of new raters changes with experience and by examining the impact of an additional training session on their accuracy in scoring calibration and monitoring essays. We found that, on average, raters' accuracy improved with scoring experience and that individual raters differed in their accuracy trajectories. The estimated average effect of the training was an approximately six percent increase in calibration essay accuracy. On the other hand, we observed a smaller impact on monitoring essay accuracy. Our follow-up analysis showed that this differential impact of the additional training on calibration and monitoring essay accuracy could be accounted for by successful gatekeeping through calibration.
Applied Measurement in Education, 33(1), 210-222.
Citations: 2
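Note (an assumption for illustration; the abstract does not state the model used): rater-specific accuracy trajectories of the kind described are commonly formalized as a logistic growth model with random intercepts and slopes,

\mathrm{logit}\, P(\text{accurate}_{rt}) = (\beta_0 + u_{0r}) + (\beta_1 + u_{1r})\, t + \beta_2 \,\mathrm{Training}_{rt},

where t indexes scoring occasion for rater r, the random effects (u_{0r}, u_{1r}) allow each rater a different starting accuracy and growth rate, and \beta_2 would capture the shift associated with the additional training session.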