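Performance of a Vision-Language Model in Detecting Common Dental Conditions on Panoramic Radiographs Using Different Tooth Numbering Systems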

IF 3.3 | Zone 3, Medicine | Q1 MEDICINE, GENERAL & INTERNAL

Diagnostics Pub Date : 2025-09-12 DOI:10.3390/diagnostics15182315

Zekai Liu, Qi Yong H Ai, Andy Wai Kan Yeung, Ray Tanaka, Andrew Nalley, Kuo Feng Hung


Citations: 0

Abstract


Performance of a Vision-Language Model in Detecting Common Dental Conditions on Panoramic Radiographs Using Different Tooth Numbering Systems.

Objectives: The aim of this study was to evaluate the performance of GPT-4o in identifying nine common dental conditions on panoramic radiographs, both overall and at specific tooth sites, and to assess whether the use of different tooth numbering systems (FDI and Universal) in prompts would affect its diagnostic accuracy.

Methods: Fifty panoramic radiographs exhibiting various common dental conditions, including missing teeth, impacted teeth, caries, endodontically treated teeth, teeth with restorations, periapical lesions, periodontal bone loss, tooth fractures, cracks, retained roots, dental implants, osteolytic lesions, and osteosclerosis, were included. Each image was evaluated twice by GPT-4o in May 2025, using structured prompts based on either the FDI or the Universal tooth numbering system, to identify the presence of these conditions at specific tooth sites or regions. GPT-4o responses were compared to a consensus reference standard established by an oral-maxillofacial radiology team. GPT-4o's performance was evaluated using balanced accuracy, sensitivity, specificity, and F1 score at both the patient and tooth levels.

Results: A total of 100 GPT-4o responses were generated. At the patient level, balanced accuracy ranged from 46.25% to 98.83% (FDI) and 49.75% to 92.86% (Universal), with the highest accuracies for dental implants (92.86-98.83%). F1 scores and sensitivities were highest for implants, missing teeth, and impacted teeth, but zero for caries, periapical lesions, and fractures. Specificity was generally high across conditions. Notable discrepancies were observed between patient- and tooth-level performance, especially for implants and restorations. GPT-4o's performance was similar with the two numbering systems.

Conclusions: GPT-4o demonstrated superior performance in detecting dental implants and treated or restored teeth but inferior performance for caries, periapical lesions, and fractures. Diagnostic accuracy was higher at the patient level than at the tooth level, with similar performance for both numbering systems. Future studies with larger, more diverse datasets and multiple models are needed.
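The four metrics named in the Methods can all be derived from one confusion matrix per condition. The sketch below is illustrative only: the abstract does not describe the authors' computation, the names `Confusion` and `metrics` are hypothetical, and the counts in the usage note are invented to show the arithmetic, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for one condition, compared against the reference standard."""
    tp: int  # condition present, model reported it
    fp: int  # condition absent, model reported it
    tn: int  # condition absent, model did not report it
    fn: int  # condition present, model missed it


def _ratio(numerator: int, denominator: int) -> float:
    # Convention assumed here: an undefined ratio (0/0) is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def metrics(c: Confusion) -> dict:
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # true positive rate
    specificity = _ratio(c.tn, c.tn + c.fp)  # true negative rate
    f1 = _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Balanced accuracy is the mean of sensitivity and specificity.
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts: 10 positive cases all missed, 1 false alarm in 40 negatives.
    print(metrics(Confusion(tp=0, fp=1, tn=39, fn=10)))
```

With these made-up counts, sensitivity and F1 are 0, specificity is 0.975, and balanced accuracy is 0.4875. This is the general pattern by which a condition that is never detected can combine high specificity with a balanced accuracy near 50% and a zero F1 score.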


Source journal

Diagnostics (Biochemistry, Genetics and Molecular Biology - Clinical Biochemistry)

CiteScore: 4.70

Self-citation rate: 8.30%

Articles published: 2699

Review time: 19.64 days

About the journal: Diagnostics (ISSN 2075-4418) is an international scholarly open access journal on medical diagnostics. It publishes original research articles, reviews, communications and short notes on the research and development of medical diagnostics. There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental and/or methodological details must be provided for research articles.