Challenges and Limitations of Multimodal Large Language Models in Interpreting Pediatric Panoramic Radiographs
Authors: Yuichi Mine, Yuko Iwamoto, Shota Okazaki, Taku Nishimura, Eimi Tabata, Saori Takeda, Tzu-Yu Peng, Ryota Nomura, Naoya Kakimoto, Takeshi Murayama
Journal: International Journal of Paediatric Dentistry (impact factor 1.9; JCR Q2, Dentistry, Oral Surgery & Medicine)
Publication date: 2025-09-16
Publication type: Journal Article
DOI: 10.1111/ipd.70029 (https://doi.org/10.1111/ipd.70029)
Citation count: 0
Abstract
Background: Multimodal large language models (LLMs) have potential for medical image analysis, yet their reliability for pediatric panoramic radiographs remains uncertain.
Aim: This study evaluated two multimodal LLMs (OpenAI o1, Claude 3.5 Sonnet) for detecting and counting teeth (including tooth germs) on pediatric panoramic radiographs.
Design: Eighty-seven pediatric panoramic radiographs from an open-source data set were analyzed. Two pediatric dentists annotated the presence or absence of each potential tooth position. Each image was processed five times by the LLMs using identical prompts, and the results were compared with the expert annotations. Standard performance metrics and Fleiss' kappa were calculated.
Results: Detailed examination revealed that subtle developmental stages and minor tooth loss were consistently misidentified. Claude 3.5 Sonnet had higher sensitivity but significantly lower specificity (29.8% ± 21.5%), resulting in many false positives. OpenAI o1 demonstrated superior specificity compared to Claude 3.5 Sonnet, but still failed to correctly detect subtle defects in certain mixed dentition cases. Both models showed large variability in repeated runs.
Conclusion: Both LLMs failed to achieve clinically acceptable performance and cannot reliably identify nuanced discrepancies critical for pediatric dentistry. Further refinements and consistency improvements are essential before routine clinical use.
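The Design paragraph mentions standard performance metrics and Fleiss' kappa but does not say how they were computed. The following is a minimal sketch of one plausible setup, not the authors' implementation. It assumes each tooth position is coded 1 (present) or 0 (absent), and that the five repeated runs of a model are treated as the "raters" for Fleiss' kappa. The function names `sensitivity_specificity` and `fleiss_kappa` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    """Compare model output with expert annotation, one entry per tooth position.

    Coding (an assumption): 1 = tooth or tooth germ present, 0 = absent.
    """
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # present, detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # present, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # absent, correctly not reported
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # absent, falsely reported
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa; ratings[i] holds the labels given to item i by each rater.

    Here an item is a tooth position and a "rater" is one repeated run
    (an assumption). Every item must have the same number of ratings.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_items = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each category.
    p_cat = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_e = sum(p ** 2 for p in p_cat)

    if p_e == 1.0:  # only one category ever used; kappa is undefined
        return float("nan")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    truth = [1, 1, 1, 0, 0, 0]
    pred = [1, 1, 0, 1, 0, 0]
    print(sensitivity_specificity(truth, pred))

    runs = [[1, 1, 1, 1, 0], [0, 0, 0, 0, 0]]  # two tooth positions, five runs each
    print(fleiss_kappa(runs))
```

Low specificity in this coding means that many absent positions were reported as present, which is the false-positive pattern described in the Results. A kappa well below 1 across the repeated runs would correspond to the reported variability between runs.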
About the journal:
The International Journal of Paediatric Dentistry was formed in 1991 by the merger of the Journals of the International Association of Paediatric Dentistry and the British Society of Paediatric Dentistry and is published bi-monthly. It has true international scope and aims to promote the highest standard of education, practice and research in paediatric dentistry world-wide.
International Journal of Paediatric Dentistry publishes papers on all aspects of paediatric dentistry, including growth and development, behaviour management, diagnosis, prevention, restorative treatment, and issues relating to medically compromised children or those with disabilities. This peer-reviewed journal features scientific articles, reviews, case reports, clinical techniques, short communications and abstracts of current paediatric dental research. Analytical studies with a scientific novelty value are preferred to descriptive studies. Case reports illustrating unusual conditions and clinically relevant observations are acceptable but must be of sufficiently high quality to be considered for publication; in particular, the illustrative material must be of the highest quality.