Proof-of-concept comparison of an artificial intelligence-based bone age assessment tool with Greulich-Pyle and Tanner-Whitehouse version 2 methods in a pediatric cohort.
Luca Marinelli, Antonio Lo Mastro, Francesca Grassi, Daniela Berritto, Anna Russo, Vittorio Patanè, Anna Festa, Enrico Grassi, Anna Grandone, Luigi Aurelio Nasto, Enrico Pola, Alfonso Reginelli
Pediatric Radiology. Published 2025-09-25. DOI: 10.1007/s00247-025-06405-0 (https://doi.org/10.1007/s00247-025-06405-0)
Abstract
Background: Bone age assessment is essential in evaluating pediatric growth disorders. Artificial intelligence (AI) systems offer potential improvements in accuracy and reproducibility compared to traditional methods.
Objective: To compare the performance of a commercially available artificial intelligence-based software tool (BoneView BoneAge, Gleamer, Paris, France) against two human-assessed methods, the Greulich-Pyle (GP) atlas and Tanner-Whitehouse version 2 (TW2), in a pediatric population.
Materials and methods: This proof-of-concept study included 203 pediatric patients (mean age, 9.0 years; range, 2.0-17.0 years) who underwent hand and wrist radiographs for suspected endocrine or growth-related conditions. After exclusion of technically inadequate images, 157 cases were analyzed using the AI and GP-assessed methods. A subset of 35 patients was also evaluated with the TW2 method by a pediatric endocrinologist. Performance was measured by mean absolute error (MAE), root mean square error (RMSE), bias, and Pearson's correlation coefficient, with chronological age as the reference.
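The four agreement metrics named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the function name `agreement_metrics` is hypothetical, the ages in the usage lines are made-up toy values, and bias is assumed to be the mean signed difference (estimate minus chronological age), so that a negative value indicates underestimation, which matches how the Results describe TW2.

```python
import math


def agreement_metrics(estimated, chronological):
    """Return MAE, RMSE, bias and Pearson's r for paired ages in years.

    Hypothetical helper for illustration; chronological age is the reference.
    """
    if len(estimated) != len(chronological) or len(estimated) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")

    n = len(estimated)
    # Signed error per patient: estimate minus reference (assumed convention).
    errors = [e - c for e, c in zip(estimated, chronological)]

    mae = sum(abs(d) for d in errors) / n            # mean absolute error
    rmse = math.sqrt(sum(d * d for d in errors) / n)  # root mean square error
    bias = sum(errors) / n                            # mean signed error

    # Pearson's correlation coefficient between estimates and reference.
    mean_e = sum(estimated) / n
    mean_c = sum(chronological) / n
    cov = sum((e - mean_e) * (c - mean_c)
              for e, c in zip(estimated, chronological))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_c = sum((c - mean_c) ** 2 for c in chronological)
    if var_e == 0 or var_c == 0:
        r = float("nan")  # correlation is undefined for a constant series
    else:
        r = cov / math.sqrt(var_e * var_c)

    return {"mae": mae, "rmse": rmse, "bias": bias, "r": r}


# Toy values only, not data from the study.
toy_estimates = [5.5, 8.0, 11.5, 13.0]
toy_chronological = [6.0, 8.5, 11.0, 14.5]
print(agreement_metrics(toy_estimates, toy_chronological))
```

MAE and RMSE both ignore the direction of the error, with RMSE weighting large misses more heavily. Bias keeps the sign, so a method can show a small bias while still having a sizeable MAE if its over- and underestimates cancel out.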
Results: The AI model achieved an MAE of 1.38 years, comparable to the radiologist's GP-based estimate (MAE, 1.30 years) and superior to TW2 (MAE, 2.86 years). RMSE values were 1.75 years (AI), 1.80 years (GP), and 3.88 years (TW2). AI showed minimal bias (-0.05 years), while TW2-based assessments systematically underestimated bone age (bias, -2.63 years). Strong correlations with chronological age were observed for AI (r=0.857) and GP (r=0.894), but not for TW2 (r=0.490).
Conclusion: BoneView demonstrated accuracy comparable to the radiologist-assessed GP method and outperformed TW2 assessments in this cohort. AI-based systems may enhance consistency in pediatric bone age estimation but require careful validation, especially in ethnically diverse populations.
About the journal:
Official Journal of the European Society of Pediatric Radiology, the Society for Pediatric Radiology and the Asian and Oceanic Society for Pediatric Radiology
Pediatric Radiology informs its readers of new findings and progress in all areas of pediatric imaging and in related fields. This is achieved by a blend of original papers, complemented by reviews that set out the present state of knowledge in a particular area of the specialty or summarize specific topics in which discussion has led to clear conclusions. Advances in technology, methodology, apparatus and auxiliary equipment are presented, and modifications of standard techniques are described.
Manuscripts submitted for publication must contain a statement to the effect that all human studies have been reviewed by the appropriate ethics committee and have therefore been performed in accordance with the ethical standards laid down in an appropriate version of the 1964 Declaration of Helsinki. It should also be stated clearly in the text that all persons gave their informed consent prior to their inclusion in the study. Details that might disclose the identity of the subjects under study should be omitted.