A critical comparative study of the performance of three AI-assisted programs for bone age determination
Johanna Pape, Maciej Rosolowski, Roland Pfäffle, Anne B Beeskow, Daniel Gräfe
European Radiology, published 2024-11-05. DOI: 10.1007/s00330-024-11169-6 (https://doi.org/10.1007/s00330-024-11169-6)
Abstract
Objectives: To date, AI-supported programs for bone age (BA) determination according to Greulich and Pyle (G&P) that are available for medical use in Europe have almost exclusively been validated separately from one another. The current study therefore aimed to compare the performance of three such programs, BoneXpert, PANDA, and BoneView, on a single Central European population.
Materials and methods: This retrospective study included hand radiographs of 306 children aged 1-18 years, stratified by gender and age. A subgroup was formed from the age range that accounts for 90% of examinations in clinical practice. The G&P BA was estimated by three human experts (serving as ground truth) and by three AI-supported programs. The mean absolute deviation, the root mean squared error (RMSE), and the dropouts by the AI were calculated.
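The error measures named above can be illustrated with a minimal sketch. It assumes that "mean absolute deviation" means the mean absolute difference between an AI estimate and the ground truth, that a dropout is a radiograph for which a program returns no estimate, and that dropouts are excluded from the error terms. The abstract does not spell out these details, and the function name `agreement_metrics` and the toy values are hypothetical, not taken from the study.

```python
import math
from typing import Optional, Sequence


def agreement_metrics(
    truth: Sequence[float],
    predicted: Sequence[Optional[float]],
) -> dict:
    """Compare AI bone age estimates (years) with the expert ground truth.

    A `None` in `predicted` marks a dropout, i.e. a radiograph the program
    did not analyse. Dropouts are counted but excluded from the error terms
    (an assumption of this sketch).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    # Signed errors for the radiographs that were actually analysed.
    errors = [p - t for t, p in zip(truth, predicted) if p is not None]
    if not errors:
        raise ValueError("no analysable radiographs")

    n = len(errors)
    return {
        "mad": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "dropout_rate": (len(predicted) - n) / len(predicted),
    }


# Made-up toy values, not data from the study.
truth = [5.0, 8.0, 10.0, 13.0]
predicted = [5.5, 7.5, None, 14.0]
metrics = agreement_metrics(truth, predicted)
```

Because the RMSE squares each error before averaging, it weights large deviations more heavily than the mean absolute deviation does, which is one reason to report both.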
Results: The correlation between each program and the ground truth was strong (R² ≥ 0.98). In the total group, BoneXpert had a lower RMSE than BoneView and PANDA (0.62 vs. 0.65 and 0.75 years), with dropout rates of 2.3%, 20.3%, and 0%, respectively. In the subgroup, the RMSE differed less (0.66 vs. 0.68 and 0.65 years, max. 4% dropouts). The standard deviation between the AI readers was lower than that between the human readers (0.54 vs. 0.62 years, p < 0.01).
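One plausible reading of the "standard deviation between readers" is the sample standard deviation of the readers' estimates for each radiograph, averaged over all radiographs. The sketch below follows that reading. The study may define the quantity differently, and the function name `between_reader_sd` and the toy values are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def between_reader_sd(readings: Sequence[Sequence[float]]) -> float:
    """Average, over radiographs, of the sample SD across readers.

    `readings[i]` holds the bone age estimates (years) that all readers
    gave for radiograph i. This definition is an assumption of the sketch.
    """
    return mean(stdev(per_image) for per_image in readings)


# Made-up toy values, not data from the study.
human = [[5.0, 6.0, 7.0], [10.0, 10.0, 10.0]]
ai = [[5.5, 6.0, 6.5], [10.0, 10.0, 10.0]]
human_sd = between_reader_sd(human)
ai_sd = between_reader_sd(ai)
```

A lower value for one group of readers means that its members agree more closely with one another. It says nothing by itself about how close they are to the ground truth.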
Conclusion: All three AI programs predict BA according to G&P with similarly high reliability in the main age range. Differences arise at the boundaries of childhood.
Key points:
Question: There is a lack of comparative, independent validation for artificial intelligence-based bone age estimation in children.
Findings: Three commercially available programs estimate bone age according to Greulich and Pyle with similarly high reliability in a Central European cohort.
Clinical relevance: The comparative study will help readers choose software for bone age estimation that is approved for the European market, depending on the targeted age group and economic considerations.
About the journal:
European Radiology (ER) continuously updates scientific knowledge in radiology by publishing strong original articles and state-of-the-art reviews written by leading radiologists. A well-balanced combination of review articles, original papers, short communications from European radiological congresses, and information on society matters makes ER an indispensable source of current information in this field.
This is the Journal of the European Society of Radiology, and the official journal of a number of societies.
From 2004 to 2008, supplements to European Radiology were published under its companion, European Radiology Supplements, ISSN 1613-3749.