Comparison of Human and Artificial Intelligence (AI) in Writing and Rating Restorative Dentistry Essays
Afnan O Al-Zain, Abdulrahman A Alghamdi, Bashair Alansari, Alanoud Alamoudi, Heba El-Deeb, Eman H Isamil
European Journal of Dental Education, published 2025-09-25. DOI: 10.1111/eje.70051
Abstract
Aim: This study compared the writing quality of AI- and student-authored essays, evaluated faculty members' reliability in differentiating between essays authored by AI and by students, and assessed the scoring accuracy between human and AI raters using a standardised rubric.
Methods: Four topics were selected from a preclinical operative and aesthetic dentistry course. Each topic was presented as four essays authored by two students and two AI tools (ChatGPT4 and Gemini) (N = 48). The 16 essays were then evaluated by three blinded experts and two AI tool raters (ChatGPT4 and Gemini) using a modified Universal Science Writing Rubric. The Shapiro-Wilk W test assessed data normality. Kruskal-Wallis, Dunn's pairwise, Wilcoxon signed-rank, and Friedman tests were used to analyse writing performance and inter-rater reliability at a significance level of α = 0.05.
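A minimal sketch of how this kind of analysis pipeline could be reproduced with SciPy is shown below. The score arrays, group sizes, and variable names are illustrative placeholders, not the study's data; Dunn's pairwise post hoc test is only referenced in a comment, since it lives in a separate package.

```python
# Illustrative sketch of the statistical workflow described above (SciPy).
# All score values below are hypothetical placeholders.
from scipy import stats

# Rubric scores grouped by essay author type (hypothetical values).
chatgpt4_scores = [18, 17, 19, 16]
gemini_scores = [15, 16, 14, 15]
student_scores = [14, 13, 15, 14]

# 1. Shapiro-Wilk W test for normality of each group.
for name, scores in [("ChatGPT4", chatgpt4_scores),
                     ("Gemini", gemini_scores),
                     ("Students", student_scores)]:
    w, p = stats.shapiro(scores)
    print(f"{name}: W = {w:.3f}, p = {p:.3f}")

# 2. Kruskal-Wallis test across the three author groups.
h_stat, p_kw = stats.kruskal(chatgpt4_scores, gemini_scores, student_scores)

# 3. Dunn's pairwise post hoc comparisons (e.g., posthoc_dunn from the
#    scikit-posthocs package) would follow a significant Kruskal-Wallis result.

# 4. Friedman test for agreement among three raters scoring the same essays
#    (each position across the three lists refers to the same essay).
rater1 = [14, 15, 13, 16]
rater2 = [13, 15, 14, 15]
rater3 = [14, 14, 13, 16]
chi2, p_fr = stats.friedmanchisquare(rater1, rater2, rater3)

# 5. Wilcoxon signed-rank test for a paired comparison between two raters.
w_stat, p_w = stats.wilcoxon(rater1, rater2)

alpha = 0.05  # significance level used in the study
print(f"Kruskal-Wallis p = {p_kw:.3f}, Friedman p = {p_fr:.3f}, Wilcoxon p = {p_w:.3f}")
```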
Results: Significant differences were found in evaluating scientific content (Z = 9.28, p = 0.005) and interpreting scientific content (Z = 6.74, p = 0.021) between AI- and student-authored essays. ChatGPT4-authored essays differed significantly in scientific content from both Gemini- and student-authored essays, with further differences in interpretation between ChatGPT4- and student-authored essays (p = 0.011). Faculty members correctly identified 75% of essay authors. No significant differences were found between the ChatGPT4 and Gemini raters, while a marginally significant difference was observed between human raters in the overall score, though not in specific parameters.
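To make the identification-accuracy figure concrete, a hedged sketch follows; the authorship labels and faculty guesses are fabricated placeholders chosen only to reproduce a 75% rate, not the study's actual classifications.

```python
# Hypothetical author-identification sketch: compare faculty guesses against
# the true essay authorship and report the percentage identified correctly.
true_authors = ["student", "student", "ChatGPT4", "Gemini"] * 4      # 16 essays
faculty_guesses = ["student", "ChatGPT4", "ChatGPT4", "Gemini"] * 4  # placeholder guesses

correct = sum(guess == truth for guess, truth in zip(faculty_guesses, true_authors))
accuracy = correct / len(true_authors) * 100
print(f"Correctly identified: {accuracy:.0f}%")  # 75% with these placeholder labels
```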
Conclusions: Gemini's scores closely matched those of human-authored essays, aligning more closely with human raters than ChatGPT-4. AI demonstrated a capacity to emulate human writing, though the differences remain noticeable to trained faculty members. There are evident disparities in content quality and organisation between AI- and human-authored work.
Journal Introduction:
The aim of the European Journal of Dental Education is to publish original topical and review articles of the highest quality in the field of Dental Education. The Journal seeks to disseminate widely the latest information on curriculum development, teaching methodologies, assessment techniques and quality assurance in the fields of dental undergraduate and postgraduate education and dental auxiliary personnel training. The scope includes the dental educational aspects of the basic medical sciences, the behavioural sciences, the interface with medical education, information technology and distance learning, and educational audit. Papers embodying the results of high-quality educational research of relevance to dentistry are particularly encouraged, as are evidence-based reports of novel and established educational programmes and their outcomes.