{"title":"Artificial intelligence in maxillofacial trauma: expert ally or unreliable assistant?","authors":"N Agbulut, M Unlu","doi":"10.4317/medoral.27229","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs), such as ChatGPT, have demonstrated potential in synthesizing complex clinical information, yet concerns persist regarding their accuracy and reliability in specialized domains. The rationale of this study is to address a gap in the literature by evaluating ChatGPT-4o's capabilities and limitations in terms of accuracy and reliability on oral and maxillofacial traumatology.</p><p><strong>Material and methods: </strong>A total of 188 oral and maxillofacial trauma-related questions were selected from a comprehensive resource. Thirty questions were randomly chosen and submitted to ChatGPT-4o resetting to \"new chat\" mode every repetition to eliminate potential memory bias. Accuracy was scored using a 3-point Likert scale. Reliability was assessed with weighted kappa (κ) and Intraclass Correlation Coefficient (ICC), and internal consistency was evaluated using both Cronbach's alpha (α) and McDonald's omega (ω).</p><p><strong>Results: </strong>The accuracy rates for comprehensive and adequate responses were calculated as 38% (95% CI: 32.5% - 43.5%) and 58% (95% CI: 52.1% - 63.3%), respectively. Weighted kappa (κ = 0.469) and ICC (0.503) indicated moderate reliability. Internal consistency metrics revealed excellent and good reliability, respectively (α = 0.904, ω = 0.860).</p><p><strong>Conclusions: </strong>ChatGPT-4o demonstrated promising results as an adjunct tool in providing supplementary educational content, verifying critical information, and supporting the decision-making processes in oral and maxillofacial traumatology. Current limitations warrant further research. Future enhancements in LLMs and prompt engineering may assist in the optimization of their clinical applicability and alignment with evidence-based standards.</p>","PeriodicalId":49016,"journal":{"name":"Medicina Oral Patologia Oral Y Cirugia Bucal","volume":" ","pages":"e751-e757"},"PeriodicalIF":2.1000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395565/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medicina Oral Patologia Oral Y Cirugia Bucal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.4317/medoral.27229","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Abstract
Background: Large language models (LLMs), such as ChatGPT, have demonstrated potential in synthesizing complex clinical information, yet concerns persist regarding their accuracy and reliability in specialized domains. This study addresses a gap in the literature by evaluating the capabilities and limitations of ChatGPT-4o, in terms of accuracy and reliability, in oral and maxillofacial traumatology.
Material and methods: A total of 188 oral and maxillofacial trauma-related questions were selected from a comprehensive resource. Thirty questions were randomly chosen and submitted to ChatGPT-4o, with a new chat session started for each repetition to eliminate potential memory bias. Accuracy was scored on a 3-point Likert scale. Reliability was assessed with weighted kappa (κ) and the Intraclass Correlation Coefficient (ICC), and internal consistency was evaluated using both Cronbach's alpha (α) and McDonald's omega (ω).
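As an illustrative aside (not part of the original abstract), the minimal Python sketch below shows how reliability statistics of this kind can be computed for repeated 3-point Likert scores. The score arrays, the number of repetitions, and the pairwise use of weighted kappa are hypothetical assumptions for illustration, not the authors' actual data or analysis code.

```python
# Minimal sketch (hypothetical data): reliability metrics for repeated
# 3-point Likert scores of LLM responses.
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical scores (1 = inadequate, 2 = adequate, 3 = comprehensive)
# for the same 10 questions across three repeated ChatGPT-4o runs.
run1 = np.array([3, 2, 2, 1, 3, 2, 3, 1, 2, 3])
run2 = np.array([3, 2, 1, 1, 3, 2, 2, 1, 2, 3])
run3 = np.array([2, 2, 2, 1, 3, 3, 3, 1, 2, 3])

# Linearly weighted kappa between two repetitions (pairwise agreement).
kappa = cohen_kappa_score(run1, run2, weights="linear")

# Cronbach's alpha across repetitions, treating each run as an "item".
scores = np.vstack([run1, run2, run3]).T        # rows = questions, cols = runs
k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1).sum()    # sum of per-run variances
total_var = scores.sum(axis=1).var(ddof=1)      # variance of per-question totals
alpha = (k / (k - 1)) * (1 - item_vars / total_var)

print(f"weighted kappa = {kappa:.3f}, Cronbach's alpha = {alpha:.3f}")
# ICC and McDonald's omega are typically computed with dedicated packages
# (e.g., pingouin.intraclass_corr in Python, or the psych package in R).
```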
Results: The accuracy rates for comprehensive and adequate responses were 38% (95% CI: 32.5% - 43.5%) and 58% (95% CI: 52.1% - 63.3%), respectively. Weighted kappa (κ = 0.469) and ICC (0.503) indicated moderate reliability. Internal consistency was excellent by Cronbach's alpha (α = 0.904) and good by McDonald's omega (ω = 0.860).
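For context, 95% confidence intervals like those reported are commonly obtained with a normal-approximation (Wald) interval for a proportion. The sketch below reproduces that calculation under hypothetical counts; the abstract does not state the exact interval method or denominator used.

```python
# Minimal sketch: 95% Wald confidence interval for an accuracy proportion.
# The counts are hypothetical, chosen only to illustrate the arithmetic.
import math

successes = 114          # e.g., responses rated "comprehensive" (hypothetical)
n = 300                  # total rated responses (hypothetical)
p = successes / n
z = 1.96                 # ~97.5th percentile of the standard normal
half_width = z * math.sqrt(p * (1 - p) / n)
print(f"{p:.3f} (95% CI: {p - half_width:.3f} - {p + half_width:.3f})")
```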
Conclusions: ChatGPT-4o showed promise as an adjunct tool for providing supplementary educational content, verifying critical information, and supporting decision-making in oral and maxillofacial traumatology. Its current limitations warrant further research. Future enhancements in LLMs and prompt engineering may improve their clinical applicability and alignment with evidence-based standards.
Journal description:
1. Oral Medicine and Pathology:
Clinicopathological and medical or surgical management aspects of diseases affecting the oral mucosa, salivary glands, and maxillary bones, as well as orofacial neurological disorders and systemic conditions with an impact on the oral cavity.
2. Oral Surgery:
Surgical management of diseases affecting the oral mucosa, salivary glands, maxillary bones, teeth, and implants; oral surgical procedures; and surgical management of diseases affecting the head and neck.
3. Medically compromised patients in Dentistry:
Articles discussing medical problems in Odontology will also be included, with
a special focus on the clinico-odontological management of medically compromised patients, and considerations regarding high-risk or disabled patients.
4. Implantology
5. Periodontology