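The Performance of Artificial Intelligence in Providing Real-Time Aid in Emergency Dental Trauma: A Clinical Validation Study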

IF 3.1 | Zone 3 (Medicine) | JCR Q2 | DENTISTRY, ORAL SURGERY & MEDICINE

Dental Traumatology | Pub Date: 2025-09-30 | DOI: 10.1111/edt.70022

Nadav Grinberg, Shimrit Arbel, Yana Yarden Boyadjiev, Clariel Ianculovici, Shlomi Kleinman, Oren Peleg


Citations: 0

Abstract


The Performance of Artificial Intelligence in Providing Real-Time Aid in Emergency Dental Trauma: A Clinical Validation Study.

Background: Non-experts who search online for dental emergency treatment may receive unreliable guidance. We tested the first publicly available multimodal large language model, ChatGPT-4o, prospectively on real emergency-department avulsion cases to determine whether it would deliver guideline-correct, time-critical directions within seconds.

Methods: Seventy-eight anonymized avulsion charts (42 permanent, 36 primary teeth; 39 dry, 39 moist; 40 immature roots) were rewritten as lay prompts. ChatGPT-4o created two single responses to each vignette, 14 days apart (156 responses). Three oral and maxillofacial surgeons (OMFS) scored diagnostic accuracy, immediate action, contraindication identification, and completeness. Three lay assessors scored clarity (0-15 composite rating). An additional time-critical safety flag required simultaneous accuracy in immediate action and contraindication advice. Statistical analysis was performed at a 95% confidence level.
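The time-critical safety flag described in the Methods can be read as a logical AND over two rated domains. The following is a minimal sketch of that reading, not the authors' scoring procedure: the abstract does not give the rating scale or say how the three surgeons' scores were combined, so the boolean fields, the names `ResponseRating`, `time_critical_safety_flag`, and `flag_rate`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ResponseRating:
    """Hypothetical rating of one chatbot response (field names are assumptions)."""
    immediate_action_correct: bool
    contraindication_correct: bool


def time_critical_safety_flag(rating: ResponseRating) -> bool:
    # The flag is present only when BOTH domains are accurate at the same time.
    return rating.immediate_action_correct and rating.contraindication_correct


def flag_rate(ratings: Sequence[ResponseRating]) -> float:
    # Share of responses for which the safety flag is present.
    if not ratings:
        raise ValueError("no ratings supplied")
    return sum(time_critical_safety_flag(r) for r in ratings) / len(ratings)


# Made-up ratings for illustration only; these are not study data.
sample = [
    ResponseRating(True, True),
    ResponseRating(True, False),
    ResponseRating(False, True),
    ResponseRating(True, True),
]
print(f"flag present in {flag_rate(sample):.0%} of sample responses")
```

Because the flag requires both domains to be correct, its rate can never exceed the accuracy rate of either domain alone, which is what makes it a stricter benchmark than the individual scores.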

Results: ChatGPT-4o demonstrated significant rates of accurate guidance. Inter-rater reproducibility was near perfect (ICC = 0.94; κ = 0.88-0.998). The median composite score was 13 (IQR 12-14); permanent dentition elevated the probability for perfect diagnostic, contraindication, and immediate-action scores (p ≤ 0.046), but extra-oral dry time lowered immediate-action (p = 0.003) and reduced completeness (p = 0.023). Root maturity had no effect. Clarity was rated at more than 93% in both sessions. The safety flag was present in 81% and 89% of cases (χ² = 6.73, p = 0.009), with one in eight potentially unsafe situations.

Conclusions: This first clinical validation of ChatGPT-4o demonstrates expert-level, reproducible triage for tooth avulsion and introduces the "time-critical safety" composite as a strict benchmark for emergency chatbots. There is still a need for guideline-linked retrieval before unsupervised deployment. Clinically, these findings show that while ChatGPT can offer quick and largely accurate advice, the remaining deficiencies highlight the risk of incomplete or unsafe guidance during emergencies.


Source journal

Dental Traumatology (Medicine: Dentistry and Oral Surgery)

CiteScore: 6.40

Self-citation rate: 32.00%

Articles published: (not shown)

Review time: 6-12 weeks

About the journal: Dental Traumatology is an international journal that aims to convey scientific and clinical progress in all areas related to adult and pediatric dental traumatology. This includes the following topics:
- Epidemiology, Social Aspects, Education, Diagnostics
- Esthetics / Prosthetics / Restorative
- Evidence Based Traumatology & Study Design
- Oral & Maxillofacial Surgery / Transplant / Implant
- Pediatrics and Orthodontics
- Prevention and Sports Dentistry
- Endodontics and Periodontal Aspects

The journal's aim is to promote communication among clinicians, educators, researchers, and others interested in the field of dental traumatology.