Açık Uçlu Maddelerin Puanlanmasında ChatGPT ve Gerçek Puanlayıcıların Puanlayıcılar Arası Güvenirlik Bakımından İncelenmesi
(Examination of ChatGPT and Human Raters in Terms of Inter-Rater Reliability in Scoring Open-Ended Items)

Seda DEMİR
{"title":"Açık Uçlu Maddelerin Puanlanmasında ChatGPT ve Gerçek Puanlayıcıların Puanlayıcılar Arası Güvenirlik Bakımından İncelenmesi","authors":"Seda DEMİR","doi":"10.46778/goputeb.1345752","DOIUrl":null,"url":null,"abstract":"The aim of this study is to examine the inter-rater reliability of the responses to open-ended items scored by ChatGPT, an artificial intelligence-based tool, and two real raters according to the scoring keys. The study group consists of 30 students, aged between 13 and 15, studying in Eskişehir province in the 2022-2023 academic year. The data of the study were collected face-to-face with the help of 16 open-ended items selected from the sample questions published in the International Student Assessment Program-PISA Reading Skills. Correlation, percentage of agreement and the Generalizability theory were used to determine inter-rater reliability. SPSS 25 was used for correlation analysis, Excel for percentage of agreement analysis, and EduG 6.1 for the Generalizability theory analysis. The results of the study showed that there was a positive and high level of correlation between the raters, the raters showed a high level of agreement, and the reliability (G) coefficients calculated using the Generalizability theory were lower than the correlation values and percentage of agreement. In addition, it was determined that all raters showed excellent positive correlation and full agreement with each other in the scoring of the answers given to the short-answer items whose answers were directly in the text. In addition, according to the results of the Generalizability theory, it was found out that the items (i) explained the total variance the most among the main effects and the student-item interaction (sxi) explained the most among the interaction effects. As a result, it can be suggested to educators to get support from artificial intelligence-based tools such as ChatGPT when scoring open-ended items that take a long time to score, especially in crowded classes or when time is limited.","PeriodicalId":312663,"journal":{"name":"Uluslararası Türk Eğitim Bilimleri Dergisi","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Uluslararası Türk Eğitim Bilimleri Dergisi","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.46778/goputeb.1345752","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The aim of this study is to examine the inter-rater reliability of scores assigned to open-ended items by ChatGPT, an artificial intelligence-based tool, and by two human raters using the same scoring keys. The study group consists of 30 students, aged 13 to 15, attending school in Eskişehir province in the 2022-2023 academic year. The data were collected face-to-face using 16 open-ended items selected from the sample questions published for the Programme for International Student Assessment (PISA) reading skills domain. Correlation, percentage of agreement, and Generalizability (G) theory were used to determine inter-rater reliability: SPSS 25 for the correlation analysis, Excel for the percentage-of-agreement analysis, and EduG 6.1 for the G-theory analysis. The results showed a high positive correlation between the raters and a high level of agreement among them, while the reliability (G) coefficients calculated with G theory were lower than the correlation values and the percentages of agreement. In addition, all raters showed excellent positive correlation and full agreement with one another when scoring short-answer items whose answers appeared directly in the text. According to the G-theory results, items (i) explained the largest share of the total variance among the main effects, and the student-item interaction (s×i) explained the largest share among the interaction effects. Consequently, educators may be advised to seek support from artificial intelligence-based tools such as ChatGPT when scoring open-ended items that take a long time to score, especially in crowded classes or when time is limited.
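The abstract names three inter-rater statistics: Pearson correlation, percentage of exact agreement, and a G coefficient from Generalizability theory. As a rough illustration only, the sketch below computes all three in Python for two raters on a simplified one-facet (students × raters) design; the scores are invented placeholders and the library choices are assumptions, since the study itself used SPSS 25, Excel, and EduG 6.1 on a fully crossed student × item × rater design.

```python
# Hypothetical sketch of the three inter-rater statistics named in the
# abstract. Scores are invented; the one-facet design is a simplification.
import numpy as np
from scipy.stats import pearsonr

# Scores for 10 students on one open-ended item (0-2 rubric), one rater each.
chatgpt = np.array([2, 1, 0, 2, 1, 1, 2, 0, 1, 2])
human   = np.array([2, 1, 0, 2, 2, 1, 2, 0, 1, 1])

# Pearson correlation between the two raters (done in SPSS in the study).
r, p = pearsonr(chatgpt, human)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")

# Percentage of exact agreement (done in Excel in the study).
agreement = np.mean(chatgpt == human) * 100
print(f"Exact agreement = {agreement:.1f}%")

# One-facet (students x raters) G coefficient from ANOVA variance
# components, a simplified stand-in for the EduG analysis:
#   G = var_students / (var_students + var_residual / n_raters)
scores = np.stack([chatgpt, human], axis=1)   # shape: (students, raters)
n_s, n_r = scores.shape
grand = scores.mean()
ss_s = n_r * np.sum((scores.mean(axis=1) - grand) ** 2)   # students
ss_r = n_s * np.sum((scores.mean(axis=0) - grand) ** 2)   # raters
ss_res = np.sum((scores - grand) ** 2) - ss_s - ss_r      # s x r residual
ms_s = ss_s / (n_s - 1)
ms_res = ss_res / ((n_s - 1) * (n_r - 1))
var_s = max((ms_s - ms_res) / n_r, 0.0)   # student variance component
var_res = ms_res                          # residual (sxr,e) component
g_coef = var_s / (var_s + var_res / n_r)
print(f"G coefficient = {g_coef:.2f}")
```

Because the G coefficient charges the residual (interaction-plus-error) variance against the raters, it is typically lower than the raw correlation or agreement percentage for the same data, which is consistent with the pattern the abstract reports.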