Challenges and Limitations of Multimodal Large Language Models in Interpreting Pediatric Panoramic Radiographs

IF 1.9, Zone 3 (Medicine), JCR Q2 in DENTISTRY, ORAL SURGERY & MEDICINE
Yuichi Mine, Yuko Iwamoto, Shota Okazaki, Taku Nishimura, Eimi Tabata, Saori Takeda, Tzu-Yu Peng, Ryota Nomura, Naoya Kakimoto, Takeshi Murayama
International Journal of Paediatric Dentistry, published 2025-09-16 (Journal Article). DOI: 10.1111/ipd.70029, https://doi.org/10.1111/ipd.70029
Citations: 0

Abstract

Background: Multimodal large language models (LLMs) have potential for medical image analysis, yet their reliability for pediatric panoramic radiographs remains uncertain.

Aim: This study evaluated two multimodal LLMs (OpenAI o1, Claude 3.5 Sonnet) for detecting and counting teeth (including tooth germs) on pediatric panoramic radiographs.

Design: Eighty-seven pediatric panoramic radiographs from an open-source data set were analyzed. Two pediatric dentists annotated the presence or absence of each potential tooth position. Each image was processed five times by the LLMs using identical prompts, and the results were compared with the expert annotations. Standard performance metrics and Fleiss' kappa were calculated.
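
The abstract reports standard performance metrics against the expert annotations but does not say how they were aggregated. The following is a minimal sketch, assuming each image is scored as a list of per-position labels (True = tooth or tooth germ present) and that the reported mean ± SD is taken across images. The helpers `confusion_counts`, `sensitivity_specificity` and `mean_sd` are hypothetical illustrations, not the study's code.

```python
from statistics import mean, stdev
from typing import Sequence


def confusion_counts(truth: Sequence[bool], pred: Sequence[bool]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) over tooth positions; True means tooth/germ present."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must cover the same tooth positions")
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, fp, tn, fn


def sensitivity_specificity(truth: Sequence[bool], pred: Sequence[bool]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP); NaN when undefined."""
    tp, fp, tn, fn = confusion_counts(truth, pred)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def mean_sd(values: Sequence[float]) -> tuple[float, float]:
    """Mean and sample SD across images, skipping undefined (NaN) values."""
    defined = [v for v in values if v == v]  # NaN is the only value not equal to itself
    return mean(defined), stdev(defined)


if __name__ == "__main__":
    # Toy labels for two hypothetical images, six tooth positions each
    expert = [[True, True, True, False, False, False],
              [True, True, False, False, True, True]]
    model = [[True, True, True, True, True, False],
             [True, False, True, False, True, True]]
    specs = [sensitivity_specificity(t, p)[1] for t, p in zip(expert, model)]
    m, sd = mean_sd(specs)
    print(f"specificity: {m:.1%} ± {sd:.1%}")
```

Because specificity is TN / (TN + FP), a model that marks teeth at positions the experts annotated as empty raises FP and pulls specificity down, even while its sensitivity stays high.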

Results: Detailed examination revealed that subtle developmental stages and minor tooth loss were consistently misidentified. Claude 3.5 Sonnet had higher sensitivity but significantly lower specificity (29.8% ± 21.5%), resulting in many false positives. OpenAI o1 demonstrated superior specificity compared to Claude 3.5 Sonnet, but still failed to correctly detect subtle defects in certain mixed dentition cases. Both models showed large variability in repeated runs.
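
The abstract states that Fleiss' kappa was calculated but not which ratings it compared. One plausible reading, assumed here and not confirmed by the source, is agreement among the five repeated runs of a single model. Under that reading, each (image, tooth position) pair is an item, each run is a rater, and the label is present or absent. The following is a minimal sketch of the standard Fleiss' kappa under that assumption. `fleiss_kappa` is a hypothetical helper, not the authors' implementation.

```python
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[bool]]) -> float:
    """Fleiss' kappa for binary labels.

    ratings[i][r] is the label that run r gave item i (True = tooth present).
    Every item must receive the same number n >= 2 of ratings.
    """
    if not ratings:
        raise ValueError("need at least one item")
    n = len(ratings[0])
    if n < 2 or any(len(row) != n for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # n_ij: number of runs placing item i in category j (absent, present)
    counts = [(n - sum(row), sum(row)) for row in ratings]

    # Observed agreement per item, P_i, averaged into P_bar
    p_items = [(a * a + b * b - n) / (n * (n - 1)) for a, b in counts]
    p_bar = sum(p_items) / len(ratings)

    # Chance agreement P_e from the pooled category proportions p_j
    total = n * len(ratings)
    p_absent = sum(a for a, _ in counts) / total
    p_present = sum(b for _, b in counts) / total
    p_e = p_absent ** 2 + p_present ** 2

    if p_e == 1.0:
        return float("nan")  # all ratings in one category: kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Three hypothetical tooth positions, each labelled by five repeated runs
    runs = [[True] * 5, [False] * 5, [True, True, True, False, True]]
    print(f"Fleiss' kappa across runs: {fleiss_kappa(runs):.3f}")
```

A value of 1 means every run gave the same label at every position. Values near 0 mean agreement is no better than chance given how often each label is used. Under this reading, that is how large run-to-run variability would appear.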

Conclusion: Both LLMs failed to achieve clinically acceptable performance and cannot reliably identify nuanced discrepancies critical for pediatric dentistry. Further refinements and consistency improvements are essential before routine clinical use.

Source journal: International Journal of Paediatric Dentistry
CiteScore: 5.50
Self-citation rate: 2.60%
Articles published: 82
Review time: 6-12 weeks
Journal description: The International Journal of Paediatric Dentistry was formed in 1991 by the merger of the journals of the International Association of Paediatric Dentistry and the British Society of Paediatric Dentistry and is published bi-monthly. It is truly international in scope and aims to promote the highest standard of education, practice and research in paediatric dentistry worldwide. International Journal of Paediatric Dentistry publishes papers on all aspects of paediatric dentistry, including growth and development, behaviour management, diagnosis, prevention, restorative treatment and issues relating to medically compromised children or those with disabilities. This peer-reviewed journal features scientific articles, reviews, case reports, clinical techniques, short communications and abstracts of current paediatric dental research. Analytical studies with scientific novelty value are preferred to descriptive studies. Case reports illustrating unusual conditions and clinically relevant observations are acceptable but must be of sufficiently high quality to be considered for publication; in particular, the illustrative material must be of the highest quality.