Diagnostic Accuracy of a Commercial AI-based Platform in Evaluating Endodontic Treatment Outcomes on Periapical Radiographs Using CBCT as the Reference Standard.
Marwa Allihaibi, Garrit Koller, Francesco Mannocci
Journal of Endodontics. Published 2025-04-01. DOI: 10.1016/j.joen.2025.03.007 (https://doi.org/10.1016/j.joen.2025.03.007)
Abstract
Introduction: Artificial intelligence (AI) has shown promise in dental diagnostics; however, its accuracy in assessing endodontic treatment outcomes compared to experienced clinicians remains unclear. This study evaluated the performance of an AI-driven platform (Diagnocat) against experienced clinicians in assessing endodontic treatment outcomes on periapical radiographs, using cone-beam computed tomography as the reference standard.
Methods: This retrospective diagnostic accuracy study analyzed 376 teeth (860 roots) from 4 prospective clinical trials. Treatment outcomes were assessed using periapical radiographs, independently evaluated by 2 calibrated endodontists and the AI-driven platform. Cone-beam computed tomography scans served as the reference standard. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve were calculated.
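For readers less familiar with these measures, the following is a minimal sketch of how sensitivity, specificity, and accuracy follow from a 2x2 table of index-test results (periapical radiograph reading) against the reference standard (CBCT). The class name, function name, and counts are hypothetical and purely illustrative; they are not the study's data or the authors' analysis code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of index-test calls versus the reference standard."""
    tp: int  # lesion on CBCT, lesion called on the radiograph
    fp: int  # no lesion on CBCT, lesion called on the radiograph
    tn: int  # no lesion on CBCT, no lesion called on the radiograph
    fn: int  # lesion on CBCT, no lesion called on the radiograph


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, and accuracy as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of diseased units detected
        "specificity": c.tn / (c.tn + c.fp),  # share of healthy units cleared
        "accuracy": (c.tp + c.tn) / total,    # share of all units classified correctly
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
example = ConfusionCounts(tp=40, fp=10, tn=90, fn=20)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy weights sensitivity and specificity by how common disease is in the sample, two raters with quite different sensitivity and specificity can still reach similar overall accuracy.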
Results: The AI-driven platform demonstrated higher sensitivity but lower specificity than clinicians at both tooth (sensitivity: 67.3% vs 49.3%, P < .001; specificity: 82.3% vs 92.5%, P < .001) and root levels (sensitivity: 54.3% vs 43.8%, P = .003; specificity: 86.7% vs 94.5%, P < .001). Overall accuracy was comparable at the tooth level (AI: 76.3%, clinicians: 75.3%, P = .716) but slightly lower for the AI-driven platform at the root level (78.5% vs 81.6%, P = .021). Receiver operating characteristic curve analysis showed comparable area under the curve values between the AI-driven platform and clinicians at both tooth (0.75 vs 0.71) and root levels (0.71 vs 0.69).
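As a rough consistency check on the figures above: when a rater gives a single yes/no call per tooth or root, the ROC curve has one operating point, and the area under it reduces to the mean of sensitivity and specificity. The abstract does not state how the area under the curve was computed, so the sketch below rests on that assumption; the function name is hypothetical and the inputs are the percentages reported in the Results.

```python
def single_point_auc(sensitivity: float, specificity: float) -> float:
    """Area under a ROC curve drawn through (0, 0), one operating point, and (1, 1).

    Assumption: the rater outputs a binary call, so the trapezoidal area
    simplifies to (sensitivity + specificity) / 2.
    """
    return (sensitivity + specificity) / 2


# (sensitivity, specificity) as reported in the abstract, expressed as fractions.
reported = {
    ("tooth", "AI"): (0.673, 0.823),
    ("tooth", "clinicians"): (0.493, 0.925),
    ("root", "AI"): (0.543, 0.867),
    ("root", "clinicians"): (0.438, 0.945),
}

for (level, rater), (sens, spec) in reported.items():
    print(f"{level:5s} {rater:10s} AUC ~ {single_point_auc(sens, spec):.4f}")
```

Under this assumption the values come to 0.748, 0.709, 0.705, and 0.6915, which agree with the reported 0.75, 0.71, 0.71, and 0.69 to within rounding of the published percentages.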
Conclusions: While the AI-driven platform demonstrated potential as an adjunctive tool for assessing endodontic treatment outcomes, particularly in detecting lesions that might be missed by human assessment, its lower specificity highlights the need for clinical oversight to prevent overdiagnosis.
About the journal:
The Journal of Endodontics, the official journal of the American Association of Endodontists, publishes scientific articles, case reports and comparison studies evaluating materials and methods of pulp conservation and endodontic treatment. Endodontists and general dentists can learn about new concepts in root canal treatment and the latest advances in techniques and instrumentation in the one journal that helps them keep pace with rapid changes in this field.