Diagnostic Performance of ChatGPT-4o and DeepSeek-3 Differential Diagnosis of Complex Oral Lesions: A Multimodal Imaging and Case Difficulty Analysis.

IF 2.9 | Zone 3, Medicine | Q1 Dentistry, Oral Surgery & Medicine
Oral Diseases | Pub Date: 2025-07-01 | DOI: 10.1111/odi.70007
Fatma E A Hassanein, Ahmed El Barbary, Radwa R Hussein, Yousra Ahmed, Jylan El-Guindy, Susan Sarhan, Asmaa Abou-Bakr

Abstract

Background: AI models like ChatGPT-4o and DeepSeek-3 show diagnostic promise, but their reliability in complex, image-based oral lesions remains unclear. This study aimed to evaluate the diagnostic accuracy of ChatGPT-4o and DeepSeek-3, which differ in input modality, and to compare both with oral medicine (OM) experts across varied lesion types and case difficulty levels.

Methods: Eighty standardized clinical vignettes derived from real-world oral disease cases, including clinical images/radiographs, were evaluated. Differential diagnoses were generated by ChatGPT-4o, DeepSeek-3, and four board-certified OM specialists, with accuracy assessed at Top-1, Top-3, and Top-5 levels.
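
As an illustration of the Top-1, Top-3, and Top-5 levels, the following is a minimal Python sketch of Top-k accuracy: a case counts as correct if the reference diagnosis appears among the first k entries of the ranked differential list. The function name `top_k_accuracy`, the exact string matching, and the toy cases are assumptions made for illustration. They are not taken from the study, which may have judged correctness differently (for example, by expert review of synonymous diagnoses).

```python
from typing import Sequence


def top_k_accuracy(
    ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int
) -> float:
    """Fraction of cases whose reference diagnosis is among the first k candidates."""
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same number of cases")
    if not truth:
        raise ValueError("at least one case is required")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = 0
    for candidates, reference in zip(ranked, truth):
        # Assumption: a case-insensitive exact match stands in for expert adjudication.
        top_k = {c.strip().lower() for c in candidates[:k]}
        if reference.strip().lower() in top_k:
            hits += 1
    return hits / len(truth)


# Hypothetical toy data, NOT from the study: one ranked differential per case.
differentials = [
    ["oral lichen planus", "lichenoid reaction", "leukoplakia"],
    ["traumatic ulcer", "squamous cell carcinoma", "aphthous ulcer"],
    ["pyogenic granuloma", "peripheral giant cell granuloma", "fibroma"],
    ["leukoplakia", "candidiasis", "frictional keratosis"],
]
reference = [
    "oral lichen planus",
    "squamous cell carcinoma",
    "fibroma",
    "erythroplakia",
]

for k in (1, 3, 5):
    print(f"Top-{k}: {top_k_accuracy(differentials, reference, k):.2f}")
```

Top-k accuracy cannot decrease as k grows, so a rater can trail at Top-1 and still match or lead at Top-3 when the correct diagnosis tends to sit second or third in the list.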

Results: OM specialists consistently achieved the highest diagnostic accuracy. However, DeepSeek-3 significantly outperformed ChatGPT-4o at the Top-3 level (p = 0.0153) and showed greater robustness in high-difficulty and inflammatory cases despite its text-only modality. Multimodal imaging enhanced diagnostic accuracy. Regression analysis indicated lesion type and imaging modality as positive predictors, while diagnostic difficulty negatively impacted Top-1 performance.

Conclusions: Remarkably, the text-only DeepSeek-3 model exceeded the diagnostic performance of the multimodal ChatGPT-4o model for complex oral lesions, highlighting its structured reasoning capabilities and reduced hallucination rate. These findings underscore the potential of non-vision LLMs in diagnostic support, while emphasizing the critical need for expert oversight in complex scenarios.

Source journal
Oral Diseases (Medicine: Dentistry and Oral Surgery)
CiteScore: 7.60
Self-citation rate: 5.30%
Articles published: 325
Review time: 4-8 weeks
About the journal: Oral Diseases is a multidisciplinary and international journal with a focus on head and neck disorders, edited by leaders in the field, Professor Giovanni Lodi (Editor-in-Chief, Milan, Italy), Professor Stefano Petti (Deputy Editor, Rome, Italy) and Associate Professor Gulshan Sunavala-Dossabhoy (Deputy Editor, Shreveport, LA, USA). The journal is pre-eminent in oral medicine. Oral Diseases specifically strives to link often-isolated areas of dentistry and medicine through broad-based scholarship that includes well-designed and controlled clinical research, analytical epidemiology, and the translation of basic science in pre-clinical studies. The journal typically publishes articles relevant to many related medical specialties including especially dermatology, gastroenterology, hematology, immunology, infectious diseases, neuropsychiatry, oncology and otolaryngology. The essential requirement is that all submitted research is hypothesis-driven, with significant positive and negative results both welcomed. Equal publication emphasis is placed on etiology, pathogenesis, diagnosis, prevention and treatment.