Dental Age Estimation from Panoramic Radiographs: A Comparison of Orthodontist and ChatGPT-4 Evaluations Using the London Atlas, Nolla, and Haavikko Methods.
Authors: Derya Dursun, Rumeysa Bilici Geçer
Journal: Diagnostics, volume 15, issue 18
Publication date: 2025-09-19
Publication type: Journal Article
DOI: 10.3390/diagnostics15182389 (https://doi.org/10.3390/diagnostics15182389)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468368/pdf/
Impact factor: 3.3
JCR: Q1 (Medicine, General & Internal)
Citations: 0
Abstract
Background: Dental age (DA) estimation, which is widely used in orthodontics, pediatric dentistry, and forensic dentistry, predicts chronological age (CA) by assessing tooth development and maturation. Most methods rely on radiographic evaluation of tooth mineralization and eruption stages to assess DA. With the increasing adoption of large language models (LLMs) in medical sciences, use of ChatGPT has extended to processing visual data. The aim of this study, therefore, was to evaluate the performance of ChatGPT-4 in estimating DA from panoramic radiographs using three conventional methods (Nolla, Haavikko, and London Atlas) and to compare its accuracy against both orthodontist assessments and CA.

Methods: In this retrospective study, panoramic radiographs of 511 Turkish children aged 6-17 years were assessed. DA was estimated using the Nolla, Haavikko, and London Atlas methods by both orthodontists and ChatGPT-4. The DA-CA difference and mean absolute error (MAE) were calculated, and statistical comparisons were performed to assess accuracy, sex differences, and agreement between the evaluators, with significance set at p < 0.05.

Results: The mean CA of the study population was 12.37 ± 2.95 years (boys: 12.39 ± 2.94; girls: 12.35 ± 2.96). Using the London Atlas method, the orthodontists overestimated CA with a DA-CA difference of 0.78 ± 1.26 years (p < 0.001), whereas ChatGPT-4 showed no significant DA-CA difference (0.03 ± 0.93; p = 0.399). Using the Nolla method, the orthodontist showed no significant DA-CA difference (0.03 ± 1.14; p = 0.606), but ChatGPT-4 underestimated CA with a DA-CA difference of -0.40 ± 1.96 years (p < 0.001). Using the Haavikko method, both evaluators underestimated CA (orthodontist: -0.88; ChatGPT-4: -1.18; p < 0.001). The lowest MAE for ChatGPT-4 was obtained when using the London Atlas method (0.59 ± 0.72), followed by Nolla (1.33 ± 1.28) and Haavikko (1.51 ± 1.41). For the orthodontists, the lowest MAE was achieved when using the Nolla method (0.86 ± 0.75). Agreement between the orthodontists and ChatGPT-4 was highest when using the London Atlas method (ICC = 0.944, r = 0.905).

Conclusions: ChatGPT-4 showed the highest accuracy with the London Atlas method, with no significant difference from CA for either sex and the lowest prediction error. When using the Nolla and Haavikko methods, both ChatGPT-4 and the orthodontist tended to underestimate age, with higher errors. Overall, ChatGPT-4 performed best when using visually guided methods and was less accurate when using multi-stage scoring methods.
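The two accuracy measures named in the abstract, the signed DA-CA difference and the mean absolute error, can be illustrated with a minimal Python sketch. The function name, dictionary keys, and the four sample ages are hypothetical and chosen only for illustration; they are not data from the study, and the abstract does not describe the authors' actual software or statistical pipeline.

```python
from statistics import mean, stdev


def da_ca_summary(dental_ages, chronological_ages):
    """Summarize estimation error for paired DA and CA values (in years).

    Hypothetical helper, not the authors' code. A positive mean difference
    indicates overestimation of CA; a negative one indicates underestimation.
    """
    if len(dental_ages) != len(chronological_ages):
        raise ValueError("DA and CA lists must have the same length")
    if len(dental_ages) < 2:
        raise ValueError("at least two subjects are needed for a standard deviation")

    # Signed error per subject: DA minus CA.
    diffs = [da - ca for da, ca in zip(dental_ages, chronological_ages)]
    # Absolute error per subject, used for the MAE.
    abs_diffs = [abs(d) for d in diffs]

    return {
        "mean_diff": mean(diffs),      # mean DA-CA difference
        "sd_diff": stdev(diffs),       # sample SD of the signed differences
        "mae": mean(abs_diffs),        # mean absolute error
        "sd_abs": stdev(abs_diffs),    # sample SD of the absolute errors
    }


# Made-up ages for four subjects (illustrative assumption, not study data).
example_da = [8.5, 10.0, 12.5, 14.0]
example_ca = [8.0, 10.5, 12.0, 15.0]

summary = da_ca_summary(example_da, example_ca)
print(summary["mean_diff"], summary["mae"])
```

The signed mean and the MAE answer different questions: over- and underestimates cancel in the signed mean, so a method can show a DA-CA difference near zero while still having a sizeable MAE, which is why the abstract reports both.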
Diagnostics
Subject area: Biochemistry, Genetics and Molecular Biology - Clinical Biochemistry
CiteScore: 4.70
Self-citation rate: 8.30%
Articles published: 2699
Review time: 19.64 days
About the journal:
Diagnostics (ISSN 2075-4418) is an international scholarly open access journal on medical diagnostics. It publishes original research articles, reviews, communications and short notes on the research and development of medical diagnostics. There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental and/or methodological details must be provided for research articles.