{"title":"人工智能在牙外伤急诊实时救治中的应用:临床验证研究","authors":"Nadav Grinberg, Shimrit Arbel, Yana Yarden Boyadjiev, Clariel Ianculovici, Shlomi Kleinman, Oren Peleg","doi":"10.1111/edt.70022","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Searching online for dental emergency treatment as a non-expert can lead to unreliable guidance. We tested the publicly available first multimodal large-language model, ChatGPT-4o, prospectively with real emergency-department avulsion cases to determine if it would deliver guideline-correct, time-critical directions within seconds.</p><p><strong>Methods: </strong>Seventy-eight anonymized avulsion charts (42 permanent, 36 primary teeth; 39 dry, 39 moist; 40 immature roots) were rewritten as lay prompts. ChatGPT-4o created two single responses to each vignette, 14 days apart (156 responses). Three oral and maxillofacial surgeons (OMFS) scored diagnostic accuracy, immediate action, contraindication identification, and completeness. Three lay assessors scored clarity (0-15 composite rating). An additional time-critical safety flag required simultaneous accuracy in immediate action and contraindication advice. Statistical analysis was performed at a 95% confidence level.</p><p><strong>Results: </strong>ChatGPT-4o demonstrated significant rates of accurate guidance. Inter-rater reproducibility was near perfect (ICC = 0.94; κ = 0.88-0.998). The median composite score was 13 (IQR 12-14); permanent dentition elevated the probability for perfect diagnostic, contraindication, and immediate-action scores (p ≤ 0.046), but extra-oral dry time lowered immediate-action (p = 0.003) and reduced completeness (p = 0.023). Root maturity had no effect. Clarity was rated at more than 93% in both sessions. The safety flag was present in 81% and 89% of cases (χ<sup>2</sup> = 6.73, p = 0.009), with one in eight potentially unsafe situations.</p><p><strong>Conclusions: </strong>This first clinical validation of ChatGPT-4o demonstrates expert-level, reproducible triage for tooth avulsion and introduces the \"time-critical safety\" composite as a strict benchmark for emergency chatbots. There is still a need for guideline-linked retrieval before unsupervised deployment. Clinically, these findings show that while ChatGPT can offer quick and largely accurate advice, the remaining deficiencies highlight the risk of incomplete or unsafe guidance during emergencies.</p>","PeriodicalId":55180,"journal":{"name":"Dental Traumatology","volume":" ","pages":""},"PeriodicalIF":3.1000,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Performance of Artificial Intelligence in Providing Real-Time Aid in Emergency Dental Trauma: A Clinical Validation Study.\",\"authors\":\"Nadav Grinberg, Shimrit Arbel, Yana Yarden Boyadjiev, Clariel Ianculovici, Shlomi Kleinman, Oren Peleg\",\"doi\":\"10.1111/edt.70022\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Searching online for dental emergency treatment as a non-expert can lead to unreliable guidance. 
We tested the publicly available first multimodal large-language model, ChatGPT-4o, prospectively with real emergency-department avulsion cases to determine if it would deliver guideline-correct, time-critical directions within seconds.</p><p><strong>Methods: </strong>Seventy-eight anonymized avulsion charts (42 permanent, 36 primary teeth; 39 dry, 39 moist; 40 immature roots) were rewritten as lay prompts. ChatGPT-4o created two single responses to each vignette, 14 days apart (156 responses). Three oral and maxillofacial surgeons (OMFS) scored diagnostic accuracy, immediate action, contraindication identification, and completeness. Three lay assessors scored clarity (0-15 composite rating). An additional time-critical safety flag required simultaneous accuracy in immediate action and contraindication advice. Statistical analysis was performed at a 95% confidence level.</p><p><strong>Results: </strong>ChatGPT-4o demonstrated significant rates of accurate guidance. Inter-rater reproducibility was near perfect (ICC = 0.94; κ = 0.88-0.998). The median composite score was 13 (IQR 12-14); permanent dentition elevated the probability for perfect diagnostic, contraindication, and immediate-action scores (p ≤ 0.046), but extra-oral dry time lowered immediate-action (p = 0.003) and reduced completeness (p = 0.023). Root maturity had no effect. Clarity was rated at more than 93% in both sessions. The safety flag was present in 81% and 89% of cases (χ<sup>2</sup> = 6.73, p = 0.009), with one in eight potentially unsafe situations.</p><p><strong>Conclusions: </strong>This first clinical validation of ChatGPT-4o demonstrates expert-level, reproducible triage for tooth avulsion and introduces the \\\"time-critical safety\\\" composite as a strict benchmark for emergency chatbots. There is still a need for guideline-linked retrieval before unsupervised deployment. Clinically, these findings show that while ChatGPT can offer quick and largely accurate advice, the remaining deficiencies highlight the risk of incomplete or unsafe guidance during emergencies.</p>\",\"PeriodicalId\":55180,\"journal\":{\"name\":\"Dental Traumatology\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2025-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Dental Traumatology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1111/edt.70022\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"DENTISTRY, ORAL SURGERY & MEDICINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Dental Traumatology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/edt.70022","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
The Performance of Artificial Intelligence in Providing Real-Time Aid in Emergency Dental Trauma: A Clinical Validation Study.
Background: Searching online for dental emergency treatment as a non-expert can lead to unreliable guidance. We prospectively tested ChatGPT-4o, the first publicly available multimodal large language model, with real emergency-department avulsion cases to determine whether it would deliver guideline-correct, time-critical directions within seconds.
Methods: Seventy-eight anonymized avulsion charts (42 permanent and 36 primary teeth; 39 dry and 39 moist; 40 immature roots) were rewritten as lay prompts. ChatGPT-4o generated two independent responses to each vignette, 14 days apart (156 responses in total). Three oral and maxillofacial surgeons (OMFS) scored diagnostic accuracy, immediate action, contraindication identification, and completeness; three lay assessors scored clarity (0-15 composite rating). An additional time-critical safety flag required simultaneously accurate immediate-action and contraindication advice. Statistical analysis was performed at a 95% confidence level.
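To make the scoring scheme concrete, here is a minimal Python sketch of one plausible reading: each of the four expert-rated domains plus lay-rated clarity contributes 0-3 points to the 0-15 composite, and the time-critical safety flag is set only when immediate action and contraindication advice are both perfect. The per-item point allocation and the field names are assumptions for illustration; the abstract does not specify the rubric.

```python
from dataclasses import dataclass

# Hypothetical rubric: five criteria, each rated 0-3, summing to the 0-15
# composite reported in the abstract. The true point allocation is not
# specified there, so this split is an illustrative assumption.
MAX_ITEM = 3

@dataclass
class VignetteScores:
    diagnosis: int         # diagnostic accuracy (expert-rated)
    immediate_action: int  # correctness of first-aid steps (expert-rated)
    contraindication: int  # contraindication identification (expert-rated)
    completeness: int      # completeness of guidance (expert-rated)
    clarity: int           # clarity (lay-assessor-rated)

def composite(s: VignetteScores) -> int:
    """Sum the five criteria into the 0-15 composite rating."""
    return (s.diagnosis + s.immediate_action + s.contraindication
            + s.completeness + s.clarity)

def time_critical_safety_flag(s: VignetteScores) -> bool:
    """True only when immediate action AND contraindication advice are
    simultaneously perfect, following the abstract's flag definition."""
    return s.immediate_action == MAX_ITEM and s.contraindication == MAX_ITEM

# A response with a minor completeness deficit can still earn the flag:
example = VignetteScores(3, 3, 3, 2, 3)
print(composite(example))                  # 14
print(time_critical_safety_flag(example))  # True
```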
Results: ChatGPT-4o demonstrated consistently high rates of accurate guidance, and inter-rater reproducibility was near perfect (ICC = 0.94; κ = 0.88-0.998). The median composite score was 13 (IQR 12-14). Permanent dentition raised the probability of perfect diagnostic, contraindication, and immediate-action scores (p ≤ 0.046), whereas extra-oral dry time lowered immediate-action scores (p = 0.003) and completeness (p = 0.023); root maturity had no effect. Clarity was rated above 93% in both sessions. The safety flag was present in 81% and 89% of cases across the two sessions (χ² = 6.73, p = 0.009), leaving roughly one in eight responses potentially unsafe.
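The session comparison behind the safety-flag figures can be illustrated with a standard 2x2 chi-squared test, sketched below. The abstract reports only percentages (81% vs. 89% of 78 cases per session) and the test statistic, not the raw counts, so the counts here are rounded placeholders: the sketch shows the shape of the analysis rather than reproducing the paper's exact χ² = 6.73.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = rating sessions, columns = (flag present,
# flag absent). 78 cases per session, ~81% and ~89% flagged as reported;
# the paper's raw counts are not given, so these rounded values will not
# reproduce chi-squared = 6.73 exactly.
table = [[63, 15],   # session 1: 63/78 ≈ 81% flagged
         [69,  9]]   # session 2: 69/78 ≈ 89% flagged

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}, dof = {dof}")
```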
Conclusions: This first clinical validation of ChatGPT-4o demonstrates expert-level, reproducible triage for tooth avulsion and introduces the "time-critical safety" composite as a strict benchmark for emergency chatbots. There is still a need for guideline-linked retrieval before unsupervised deployment. Clinically, these findings show that while ChatGPT can offer quick and largely accurate advice, the remaining deficiencies highlight the risk of incomplete or unsafe guidance during emergencies.
Journal Introduction:
Dental Traumatology is an international journal that aims to convey scientific and clinical progress in all areas related to adult and pediatric dental traumatology. This includes the following topics:
- Epidemiology, Social Aspects, Education, Diagnostics
- Esthetics / Prosthetics / Restorative
- Evidence-Based Traumatology & Study Design
- Oral & Maxillofacial Surgery / Transplant / Implant
- Pediatrics and Orthodontics
- Prevention and Sports Dentistry
- Endodontics and Periodontal Aspects
The journal"s aim is to promote communication among clinicians, educators, researchers, and others interested in the field of dental traumatology.