Predicting Short Response Ratings with Non-Content Related Features: A Hierarchical Modeling Approach

Aubrey Condor
{"title":"利用与内容无关的特征预测简短回复评分:分层建模方法","authors":"Aubrey Condor","doi":"arxiv-2405.08574","DOIUrl":null,"url":null,"abstract":"We explore whether the human ratings of open ended responses can be explained\nwith non-content related features, and if such effects vary across different\nmathematics-related items. When scoring is rigorously defined and rooted in a\nmeasurement framework, educators intend that the features of a response which\nare indicative of the respondent's level of ability are contributing to scores.\nHowever, we find that features such as response length, a grammar score of the\nresponse, and a metric relating to key phrase frequency are significant\npredictors for response ratings. Although our findings are not causally\nconclusive, they may propel us to be more critical of he way in which we assess\nopen ended responses, especially in high stakes scenarios. Educators take great\ncare to provide unbiased, consistent ratings, but it may be that extraneous\nfeatures unrelated to those which were intended to be rated are being\nevaluated.","PeriodicalId":501323,"journal":{"name":"arXiv - STAT - Other Statistics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Predicting Short Response Ratings with Non-Content Related Features: A Hierarchical Modeling Approach\",\"authors\":\"Aubrey Condor\",\"doi\":\"arxiv-2405.08574\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We explore whether the human ratings of open ended responses can be explained\\nwith non-content related features, and if such effects vary across different\\nmathematics-related items. When scoring is rigorously defined and rooted in a\\nmeasurement framework, educators intend that the features of a response which\\nare indicative of the respondent's level of ability are contributing to scores.\\nHowever, we find that features such as response length, a grammar score of the\\nresponse, and a metric relating to key phrase frequency are significant\\npredictors for response ratings. Although our findings are not causally\\nconclusive, they may propel us to be more critical of he way in which we assess\\nopen ended responses, especially in high stakes scenarios. 
Educators take great\\ncare to provide unbiased, consistent ratings, but it may be that extraneous\\nfeatures unrelated to those which were intended to be rated are being\\nevaluated.\",\"PeriodicalId\":501323,\"journal\":{\"name\":\"arXiv - STAT - Other Statistics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - STAT - Other Statistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2405.08574\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Other Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2405.08574","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

We explore whether human ratings of open-ended responses can be explained by non-content-related features, and whether such effects vary across different mathematics-related items. When scoring is rigorously defined and rooted in a measurement framework, educators intend that the features of a response indicative of the respondent's level of ability are what contribute to scores. However, we find that features such as response length, a grammar score of the response, and a metric relating to key-phrase frequency are significant predictors of response ratings. Although our findings are not causally conclusive, they may propel us to be more critical of the way in which we assess open-ended responses, especially in high-stakes scenarios. Educators take great care to provide unbiased, consistent ratings, but it may be that extraneous features unrelated to those which were intended to be rated are being evaluated.
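The hierarchical approach named in the title suggests a mixed-effects setup: the non-content features (response length, grammar score, key-phrase frequency) predict ratings, with their effects allowed to vary by item. The sketch below is a minimal illustration of that idea, not the paper's actual code. The column names, the file responses.csv, and the choice of a linear mixed model via statsmodels are all assumptions; the paper's ratings may well be ordinal and fit with a different (e.g. Bayesian) model.

```python
# Minimal sketch of a hierarchical (mixed-effects) model of the kind the
# abstract describes: rating predicted from non-content features, with
# per-item variation. All column and file names here are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# Assumed long format: one row per rated response.
# Columns: rating, length, grammar, keyphrase, item_id
df = pd.read_csv("responses.csv")

# Fixed effects for the three non-content features, plus random
# intercepts and random slopes by item, so each mathematics item can
# weight these features differently.
model = smf.mixedlm(
    "rating ~ length + grammar + keyphrase",
    data=df,
    groups=df["item_id"],
    re_formula="~length + grammar + keyphrase",
)
result = model.fit()

# Significant fixed-effect coefficients would correspond to the paper's
# finding that non-content features predict ratings; large random-slope
# variances would correspond to those effects varying across items.
print(result.summary())
```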