Comparison of Human and Artificial Intelligence (AI) in Writing and Rating Restorative Dentistry Essays

IF 1.9 · CAS Tier 4 (Education) · JCR Q3 (DENTISTRY, ORAL SURGERY & MEDICINE)
Afnan O Al-Zain, Abdulrahman A Alghamdi, Bashair Alansari, Alanoud Alamoudi, Heba El-Deeb, Eman H Isamil
{"title":"人类和人工智能(AI)在牙科修复论文写作和评级方面的比较。","authors":"Afnan O Al-Zain, Abdulrahman A Alghamdi, Bashair Alansari, Alanoud Alamoudi, Heba El-Deeb, Eman H Isamil","doi":"10.1111/eje.70051","DOIUrl":null,"url":null,"abstract":"<p><strong>Aim: </strong>This study compared the writing quality of AI- and student-authored essays, evaluated faculty members' reliability to differentiate between essays authored by AI and students, and assessed the scoring accuracy between human and AI raters using a standardised rubric.</p><p><strong>Methods: </strong>Four topics were selected from a preclinical operative and aesthetic dentistry course. Each topic was presented as four essays authored by two students and two AI tools (ChatGPT4 and Gemini) (N = 48). Then, the 16 essays were evaluated either by three blinded experts and two AI tool raters (ChatGPT4 and Gemini) using a modified Universal Science Writing Rubric. The Shapiro-Wilk W test assessed data normality. Kruskal-Wallis, Dunn's Pairwise, Wilcoxon Signed-rank, and Friedman tests analysed the writing performances and inter-rater reliabilities with a significance level of (α = 0.05).</p><p><strong>Results: </strong>Significant differences were found in evaluating scientific content (Z = 9.28, p = 0.005) and interpreting scientific content (Z = 6.74, p = 0.021) between AI- and student-authored essays. ChatGPT4-authored essays differed significantly in scientific content from both Gemini- and student-authored essays, with further differences in interpretation between ChatGPT4- and student-authored essays (p = 0.011). Faculty members correctly identified 75% of essay authors. No significant differences were found between raters using ChatGPT4 or Gemini, while a marginally significant difference was observed between human raters in the overall score, though not in specific parameters.</p><p><strong>Conclusions: </strong>Gemini's scores closely matched those of human-authored essays, aligning more with human raters than ChatGPT-4. AI's capacity emulated human writing, though differences are noticeable to trained faculty members. There are evident disparities in content quality and organisation between AI- and human-authored work.</p>","PeriodicalId":50488,"journal":{"name":"European Journal of Dental Education","volume":" ","pages":""},"PeriodicalIF":1.9000,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparison of Human and Artificial Intelligence (AI) in Writing and Rating Restorative Dentistry Essays.\",\"authors\":\"Afnan O Al-Zain, Abdulrahman A Alghamdi, Bashair Alansari, Alanoud Alamoudi, Heba El-Deeb, Eman H Isamil\",\"doi\":\"10.1111/eje.70051\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Aim: </strong>This study compared the writing quality of AI- and student-authored essays, evaluated faculty members' reliability to differentiate between essays authored by AI and students, and assessed the scoring accuracy between human and AI raters using a standardised rubric.</p><p><strong>Methods: </strong>Four topics were selected from a preclinical operative and aesthetic dentistry course. Each topic was presented as four essays authored by two students and two AI tools (ChatGPT4 and Gemini) (N = 48). Then, the 16 essays were evaluated either by three blinded experts and two AI tool raters (ChatGPT4 and Gemini) using a modified Universal Science Writing Rubric. The Shapiro-Wilk W test assessed data normality. 
Kruskal-Wallis, Dunn's Pairwise, Wilcoxon Signed-rank, and Friedman tests analysed the writing performances and inter-rater reliabilities with a significance level of (α = 0.05).</p><p><strong>Results: </strong>Significant differences were found in evaluating scientific content (Z = 9.28, p = 0.005) and interpreting scientific content (Z = 6.74, p = 0.021) between AI- and student-authored essays. ChatGPT4-authored essays differed significantly in scientific content from both Gemini- and student-authored essays, with further differences in interpretation between ChatGPT4- and student-authored essays (p = 0.011). Faculty members correctly identified 75% of essay authors. No significant differences were found between raters using ChatGPT4 or Gemini, while a marginally significant difference was observed between human raters in the overall score, though not in specific parameters.</p><p><strong>Conclusions: </strong>Gemini's scores closely matched those of human-authored essays, aligning more with human raters than ChatGPT-4. AI's capacity emulated human writing, though differences are noticeable to trained faculty members. There are evident disparities in content quality and organisation between AI- and human-authored work.</p>\",\"PeriodicalId\":50488,\"journal\":{\"name\":\"European Journal of Dental Education\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2025-09-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"European Journal of Dental Education\",\"FirstCategoryId\":\"95\",\"ListUrlMain\":\"https://doi.org/10.1111/eje.70051\",\"RegionNum\":4,\"RegionCategory\":\"教育学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"DENTISTRY, ORAL SURGERY & MEDICINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Dental Education","FirstCategoryId":"95","ListUrlMain":"https://doi.org/10.1111/eje.70051","RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Citations: 0

Abstract

Aim: This study compared the writing quality of AI- and student-authored essays, evaluated faculty members' reliability in differentiating between AI- and student-authored essays, and assessed scoring accuracy between human and AI raters using a standardised rubric.

Methods: Four topics were selected from a preclinical operative and aesthetic dentistry course. Each topic was presented as four essays authored by two students and two AI tools (ChatGPT-4 and Gemini) (N = 48). The 16 essays were then evaluated by three blinded expert raters and two AI tool raters (ChatGPT-4 and Gemini) using a modified Universal Science Writing Rubric. The Shapiro-Wilk W test assessed data normality. Kruskal-Wallis, Dunn's pairwise, Wilcoxon signed-rank, and Friedman tests analysed writing performance and inter-rater reliability at a significance level of α = 0.05.
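For readers who want a concrete point of reference, the statistical pipeline described above can be sketched with SciPy. The snippet below is illustrative only: the score arrays are hypothetical placeholders, not the study's data, and Dunn's pairwise post-hoc test (not part of SciPy) is noted in a comment as available from the third-party scikit-posthocs package.

```python
# Minimal sketch of the abstract's statistical pipeline, assuming rubric
# scores per essay. All score arrays are hypothetical, not the study's data.
from scipy import stats

# Rubric scores per author group (one value per essay; illustrative only).
chatgpt4 = [3.5, 4.0, 3.8, 3.6]
gemini = [3.0, 3.2, 2.9, 3.1]
students = [2.8, 3.1, 3.0, 2.7]

# Shapiro-Wilk W test: assess normality of each group.
for name, scores in [("ChatGPT-4", chatgpt4), ("Gemini", gemini), ("students", students)]:
    w, p = stats.shapiro(scores)
    print(f"Shapiro-Wilk ({name}): W = {w:.3f}, p = {p:.3f}")

# Kruskal-Wallis test: compare the three independent author groups.
h, p = stats.kruskal(chatgpt4, gemini, students)
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.3f}")  # significant if p < 0.05
# A significant result would be followed by Dunn's pairwise post-hoc test,
# e.g. scikit_posthocs.posthoc_dunn([chatgpt4, gemini, students]).

# Wilcoxon signed-rank test: paired comparison of two raters on the same essays.
rater_a = [3.5, 3.0, 2.8, 4.0, 3.3, 3.7]
rater_b = [3.6, 2.9, 3.0, 3.8, 3.1, 3.9]
w, p = stats.wilcoxon(rater_a, rater_b)
print(f"Wilcoxon signed-rank: W = {w:.2f}, p = {p:.3f}")

# Friedman test: three or more raters scoring the same essays.
rater_c = [3.4, 3.1, 2.9, 3.9, 3.2, 3.8]
chi2, p = stats.friedmanchisquare(rater_a, rater_b, rater_c)
print(f"Friedman: chi2 = {chi2:.2f}, p = {p:.3f}")
```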

Results: Significant differences were found between AI- and student-authored essays in evaluating scientific content (Z = 9.28, p = 0.005) and interpreting scientific content (Z = 6.74, p = 0.021). ChatGPT-4-authored essays differed significantly in scientific content from both Gemini- and student-authored essays, with a further difference in interpretation between ChatGPT-4- and student-authored essays (p = 0.011). Faculty members correctly identified 75% of essay authors. No significant differences were found between the ChatGPT-4 and Gemini raters, while a marginally significant difference was observed among the human raters in the overall score, though not in individual rubric parameters.

Conclusions: Gemini's scores closely matched those of human-authored essays, and as a rater it aligned more closely with the human raters than ChatGPT-4 did. AI can emulate human writing, though the differences are noticeable to trained faculty members. Evident disparities in content quality and organisation remain between AI- and human-authored work.

Source journal: European Journal of Dental Education
CiteScore: 4.10
Self-citation rate: 16.70%
Annual publications: 127
Review time: 6-12 weeks
Journal description: The aim of the European Journal of Dental Education is to publish original, topical and review articles of the highest quality in the field of dental education. The Journal seeks to disseminate widely the latest information on curriculum development, teaching methodologies, assessment techniques and quality assurance in the fields of dental undergraduate and postgraduate education and dental auxiliary personnel training. The scope includes the dental educational aspects of the basic medical sciences, the behavioural sciences, the interface with medical education, information technology and distance learning, and educational audit. Papers embodying the results of high-quality educational research of relevance to dentistry are particularly encouraged, as are evidence-based reports of novel and established educational programmes and their outcomes.