Açık Uçlu Maddelerin Puanlanmasında ChatGPT ve Gerçek Puanlayıcıların Puanlayıcılar Arası Güvenirlik Bakımından İncelenmesi
(Examination of ChatGPT and Human Raters in Terms of Inter-Rater Reliability in Scoring Open-Ended Items)

Seda DEMİR
{"title":"Açık Uçlu Maddelerin Puanlanmasında ChatGPT ve Gerçek Puanlayıcıların Puanlayıcılar Arası Güvenirlik Bakımından İncelenmesi","authors":"Seda DEMİR","doi":"10.46778/goputeb.1345752","DOIUrl":null,"url":null,"abstract":"The aim of this study is to examine the inter-rater reliability of the responses to open-ended items scored by ChatGPT, an artificial intelligence-based tool, and two real raters according to the scoring keys. The study group consists of 30 students, aged between 13 and 15, studying in Eskişehir province in the 2022-2023 academic year. The data of the study were collected face-to-face with the help of 16 open-ended items selected from the sample questions published in the International Student Assessment Program-PISA Reading Skills. Correlation, percentage of agreement and the Generalizability theory were used to determine inter-rater reliability. SPSS 25 was used for correlation analysis, Excel for percentage of agreement analysis, and EduG 6.1 for the Generalizability theory analysis. The results of the study showed that there was a positive and high level of correlation between the raters, the raters showed a high level of agreement, and the reliability (G) coefficients calculated using the Generalizability theory were lower than the correlation values and percentage of agreement. In addition, it was determined that all raters showed excellent positive correlation and full agreement with each other in the scoring of the answers given to the short-answer items whose answers were directly in the text. In addition, according to the results of the Generalizability theory, it was found out that the items (i) explained the total variance the most among the main effects and the student-item interaction (sxi) explained the most among the interaction effects. As a result, it can be suggested to educators to get support from artificial intelligence-based tools such as ChatGPT when scoring open-ended items that take a long time to score, especially in crowded classes or when time is limited.","PeriodicalId":312663,"journal":{"name":"Uluslararası Türk Eğitim Bilimleri Dergisi","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Uluslararası Türk Eğitim Bilimleri Dergisi","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.46778/goputeb.1345752","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The aim of this study is to examine the inter-rater reliability of scores assigned to open-ended items by ChatGPT, an artificial intelligence-based tool, and by two human raters using the same scoring keys. The study group consists of 30 students, aged 13 to 15, attending school in Eskişehir province in the 2022-2023 academic year. The data were collected face-to-face using 16 open-ended items selected from the sample questions published for the Programme for International Student Assessment (PISA) reading skills domain. Correlation, percentage of agreement, and Generalizability (G) theory were used to determine inter-rater reliability: SPSS 25 for the correlation analysis, Excel for the percentage-of-agreement analysis, and EduG 6.1 for the G-theory analysis. The results showed a high positive correlation between the raters and a high level of agreement among them, while the reliability (G) coefficients calculated with G theory were lower than the correlation values and the percentages of agreement. In addition, all raters showed excellent positive correlation and full agreement with one another when scoring short-answer items whose answers appeared directly in the text. According to the G-theory results, items (i) explained the largest share of the total variance among the main effects, and the student-item interaction (s×i) explained the largest share among the interaction effects. Consequently, educators may be advised to seek support from artificial intelligence-based tools such as ChatGPT when scoring open-ended items that take a long time to score, especially in crowded classes or when time is limited.
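The abstract names three inter-rater statistics: Pearson correlation, percentage of exact agreement, and a G coefficient from Generalizability theory. As a rough illustration only, the sketch below computes all three in Python for two raters on a simplified one-facet (students × raters) design; the scores are invented placeholders and the library choices are assumptions, since the study itself used SPSS 25, Excel, and EduG 6.1 on a fully crossed student × item × rater design.

```python
# Hypothetical sketch of the three inter-rater statistics named in the
# abstract. Scores are invented; the one-facet design is a simplification.
import numpy as np
from scipy.stats import pearsonr

# Scores for 10 students on one open-ended item (0-2 rubric), one rater each.
chatgpt = np.array([2, 1, 0, 2, 1, 1, 2, 0, 1, 2])
human   = np.array([2, 1, 0, 2, 2, 1, 2, 0, 1, 1])

# Pearson correlation between the two raters (done in SPSS in the study).
r, p = pearsonr(chatgpt, human)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")

# Percentage of exact agreement (done in Excel in the study).
agreement = np.mean(chatgpt == human) * 100
print(f"Exact agreement = {agreement:.1f}%")

# One-facet (students x raters) G coefficient from ANOVA variance
# components, a simplified stand-in for the EduG analysis:
#   G = var_students / (var_students + var_residual / n_raters)
scores = np.stack([chatgpt, human], axis=1)   # shape: (students, raters)
n_s, n_r = scores.shape
grand = scores.mean()
ss_s = n_r * np.sum((scores.mean(axis=1) - grand) ** 2)   # students
ss_r = n_s * np.sum((scores.mean(axis=0) - grand) ** 2)   # raters
ss_res = np.sum((scores - grand) ** 2) - ss_s - ss_r      # s x r residual
ms_s = ss_s / (n_s - 1)
ms_res = ss_res / ((n_s - 1) * (n_r - 1))
var_s = max((ms_s - ms_res) / n_r, 0.0)   # student variance component
var_res = ms_res                          # residual (sxr,e) component
g_coef = var_s / (var_s + var_res / n_r)
print(f"G coefficient = {g_coef:.2f}")
```

Because the G coefficient charges the residual (interaction-plus-error) variance against the raters, it is typically lower than the raw correlation or agreement percentage for the same data, which is consistent with the pattern the abstract reports.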