Comparison of Human and Artificial Intelligence (AI) in Writing and Rating Restorative Dentistry Essays

IF 1.9 · CAS Tier 4 (Education) · JCR Q3 (DENTISTRY, ORAL SURGERY & MEDICINE)
Afnan O Al-Zain, Abdulrahman A Alghamdi, Bashair Alansari, Alanoud Alamoudi, Heba El-Deeb, Eman H Isamil
{"title":"人类和人工智能(AI)在牙科修复论文写作和评级方面的比较。","authors":"Afnan O Al-Zain, Abdulrahman A Alghamdi, Bashair Alansari, Alanoud Alamoudi, Heba El-Deeb, Eman H Isamil","doi":"10.1111/eje.70051","DOIUrl":null,"url":null,"abstract":"<p><strong>Aim: </strong>This study compared the writing quality of AI- and student-authored essays, evaluated faculty members' reliability to differentiate between essays authored by AI and students, and assessed the scoring accuracy between human and AI raters using a standardised rubric.</p><p><strong>Methods: </strong>Four topics were selected from a preclinical operative and aesthetic dentistry course. Each topic was presented as four essays authored by two students and two AI tools (ChatGPT4 and Gemini) (N = 48). Then, the 16 essays were evaluated either by three blinded experts and two AI tool raters (ChatGPT4 and Gemini) using a modified Universal Science Writing Rubric. The Shapiro-Wilk W test assessed data normality. Kruskal-Wallis, Dunn's Pairwise, Wilcoxon Signed-rank, and Friedman tests analysed the writing performances and inter-rater reliabilities with a significance level of (α = 0.05).</p><p><strong>Results: </strong>Significant differences were found in evaluating scientific content (Z = 9.28, p = 0.005) and interpreting scientific content (Z = 6.74, p = 0.021) between AI- and student-authored essays. ChatGPT4-authored essays differed significantly in scientific content from both Gemini- and student-authored essays, with further differences in interpretation between ChatGPT4- and student-authored essays (p = 0.011). Faculty members correctly identified 75% of essay authors. No significant differences were found between raters using ChatGPT4 or Gemini, while a marginally significant difference was observed between human raters in the overall score, though not in specific parameters.</p><p><strong>Conclusions: </strong>Gemini's scores closely matched those of human-authored essays, aligning more with human raters than ChatGPT-4. AI's capacity emulated human writing, though differences are noticeable to trained faculty members. There are evident disparities in content quality and organisation between AI- and human-authored work.</p>","PeriodicalId":50488,"journal":{"name":"European Journal of Dental Education","volume":" ","pages":""},"PeriodicalIF":1.9000,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparison of Human and Artificial Intelligence (AI) in Writing and Rating Restorative Dentistry Essays.\",\"authors\":\"Afnan O Al-Zain, Abdulrahman A Alghamdi, Bashair Alansari, Alanoud Alamoudi, Heba El-Deeb, Eman H Isamil\",\"doi\":\"10.1111/eje.70051\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Aim: </strong>This study compared the writing quality of AI- and student-authored essays, evaluated faculty members' reliability to differentiate between essays authored by AI and students, and assessed the scoring accuracy between human and AI raters using a standardised rubric.</p><p><strong>Methods: </strong>Four topics were selected from a preclinical operative and aesthetic dentistry course. Each topic was presented as four essays authored by two students and two AI tools (ChatGPT4 and Gemini) (N = 48). Then, the 16 essays were evaluated either by three blinded experts and two AI tool raters (ChatGPT4 and Gemini) using a modified Universal Science Writing Rubric. The Shapiro-Wilk W test assessed data normality. 
Kruskal-Wallis, Dunn's Pairwise, Wilcoxon Signed-rank, and Friedman tests analysed the writing performances and inter-rater reliabilities with a significance level of (α = 0.05).</p><p><strong>Results: </strong>Significant differences were found in evaluating scientific content (Z = 9.28, p = 0.005) and interpreting scientific content (Z = 6.74, p = 0.021) between AI- and student-authored essays. ChatGPT4-authored essays differed significantly in scientific content from both Gemini- and student-authored essays, with further differences in interpretation between ChatGPT4- and student-authored essays (p = 0.011). Faculty members correctly identified 75% of essay authors. No significant differences were found between raters using ChatGPT4 or Gemini, while a marginally significant difference was observed between human raters in the overall score, though not in specific parameters.</p><p><strong>Conclusions: </strong>Gemini's scores closely matched those of human-authored essays, aligning more with human raters than ChatGPT-4. AI's capacity emulated human writing, though differences are noticeable to trained faculty members. There are evident disparities in content quality and organisation between AI- and human-authored work.</p>\",\"PeriodicalId\":50488,\"journal\":{\"name\":\"European Journal of Dental Education\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2025-09-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"European Journal of Dental Education\",\"FirstCategoryId\":\"95\",\"ListUrlMain\":\"https://doi.org/10.1111/eje.70051\",\"RegionNum\":4,\"RegionCategory\":\"教育学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"DENTISTRY, ORAL SURGERY & MEDICINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Dental Education","FirstCategoryId":"95","ListUrlMain":"https://doi.org/10.1111/eje.70051","RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Citations: 0

Abstract

Aim: This study compared the writing quality of AI- and student-authored essays, evaluated faculty members' reliability in differentiating between AI- and student-authored essays, and assessed scoring accuracy between human and AI raters using a standardised rubric.

Methods: Four topics were selected from a preclinical operative and aesthetic dentistry course. Each topic was presented as four essays authored by two students and two AI tools (ChatGPT-4 and Gemini) (N = 48). The 16 essays were then evaluated by three blinded expert raters and two AI tool raters (ChatGPT-4 and Gemini) using a modified Universal Science Writing Rubric. The Shapiro-Wilk W test assessed data normality. Kruskal-Wallis, Dunn's pairwise, Wilcoxon signed-rank, and Friedman tests analysed writing performance and inter-rater reliability at a significance level of α = 0.05.
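For readers who want a concrete point of reference, the statistical pipeline described above can be sketched with SciPy. The snippet below is illustrative only: the score arrays are hypothetical placeholders, not the study's data, and Dunn's pairwise post-hoc test (not part of SciPy) is noted in a comment as available from the third-party scikit-posthocs package.

```python
# Minimal sketch of the abstract's statistical pipeline, assuming rubric
# scores per essay. All score arrays are hypothetical, not the study's data.
from scipy import stats

# Rubric scores per author group (one value per essay; illustrative only).
chatgpt4 = [3.5, 4.0, 3.8, 3.6]
gemini = [3.0, 3.2, 2.9, 3.1]
students = [2.8, 3.1, 3.0, 2.7]

# Shapiro-Wilk W test: assess normality of each group.
for name, scores in [("ChatGPT-4", chatgpt4), ("Gemini", gemini), ("students", students)]:
    w, p = stats.shapiro(scores)
    print(f"Shapiro-Wilk ({name}): W = {w:.3f}, p = {p:.3f}")

# Kruskal-Wallis test: compare the three independent author groups.
h, p = stats.kruskal(chatgpt4, gemini, students)
print(f"Kruskal-Wallis: H = {h:.2f}, p = {p:.3f}")  # significant if p < 0.05
# A significant result would be followed by Dunn's pairwise post-hoc test,
# e.g. scikit_posthocs.posthoc_dunn([chatgpt4, gemini, students]).

# Wilcoxon signed-rank test: paired comparison of two raters on the same essays.
rater_a = [3.5, 3.0, 2.8, 4.0, 3.3, 3.7]
rater_b = [3.6, 2.9, 3.0, 3.8, 3.1, 3.9]
w, p = stats.wilcoxon(rater_a, rater_b)
print(f"Wilcoxon signed-rank: W = {w:.2f}, p = {p:.3f}")

# Friedman test: three or more raters scoring the same essays.
rater_c = [3.4, 3.1, 2.9, 3.9, 3.2, 3.8]
chi2, p = stats.friedmanchisquare(rater_a, rater_b, rater_c)
print(f"Friedman: chi2 = {chi2:.2f}, p = {p:.3f}")
```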

Results: Significant differences were found between AI- and student-authored essays in evaluating scientific content (Z = 9.28, p = 0.005) and interpreting scientific content (Z = 6.74, p = 0.021). ChatGPT-4-authored essays differed significantly in scientific content from both Gemini- and student-authored essays, with a further difference in interpretation between ChatGPT-4- and student-authored essays (p = 0.011). Faculty members correctly identified 75% of essay authors. No significant differences were found between the ChatGPT-4 and Gemini raters, while a marginally significant difference was observed among the human raters in the overall score, though not in individual rubric parameters.

Conclusions: Gemini's scores closely matched those of human-authored essays, and as a rater it aligned more closely with the human raters than ChatGPT-4 did. AI can emulate human writing, though the differences are noticeable to trained faculty members. Evident disparities in content quality and organisation remain between AI- and human-authored work.

Source journal: European Journal of Dental Education
CiteScore: 4.10
Self-citation rate: 16.70%
Annual publications: 127
Review time: 6-12 weeks
Journal description: The aim of the European Journal of Dental Education is to publish original, topical and review articles of the highest quality in the field of dental education. The Journal seeks to disseminate widely the latest information on curriculum development, teaching methodologies, assessment techniques and quality assurance in the fields of dental undergraduate and postgraduate education and dental auxiliary personnel training. The scope includes the dental educational aspects of the basic medical sciences, the behavioural sciences, the interface with medical education, information technology and distance learning, and educational audit. Papers embodying the results of high-quality educational research of relevance to dentistry are particularly encouraged, as are evidence-based reports of novel and established educational programmes and their outcomes.