Interpretable XGBoost model identifies idiopathic central precocious puberty in girls using four clinical and imaging features.
Authors: Lu Tian, Yan Zeng, Helin Zheng, Jinhua Cai
Journal: BMC Endocrine Disorders, 25(1):159. Published 2025-07-01.
DOI: https://doi.org/10.1186/s12902-025-01983-4
Background: The study aimed to develop interpretable machine learning models for the identification of idiopathic central precocious puberty (ICPP) in girls, without the need for the expensive and time-consuming gonadotropin-releasing hormone (GnRH) stimulation test, which is currently the gold standard for diagnosing ICPP.
Methods: A total of 246 female paediatric patients who developed secondary sexual characteristics before 8 years of age and had undergone a GnRH stimulation test were randomly divided into a training set (172 patients, 70%) and a validation set (74 patients, 30%). Characteristic parameters were extracted from readily available clinical data and analysed statistically. The least absolute shrinkage and selection operator (LASSO) method was used to select the essential characteristic parameters associated with ICPP, and these parameters were then used to construct a logistic regression (LR) model and five machine learning (ML) models: support vector machine (SVM), Gaussian naive Bayes (GaussianNB), extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (kNN). The models were evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, false positive and false negative values, Youden's index, accuracy, positive and negative likelihood ratios, calibration plots, and decision curve analysis (DCA). Finally, the Shapley additive explanations (SHAP) package was used to interpret the best-performing model.
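The random 70/30 split and the AUROC computation described above can be illustrated with a minimal sketch. It uses only the Python standard library. The function names `split_ids` and `auroc`, the seed, and the integer patient identifiers are assumptions made for illustration; the abstract does not state which software or procedure the authors used for either step.

```python
import random


def split_ids(ids, train_fraction=0.7, seed=0):
    """Randomly split identifiers into training and validation lists.

    The seed is arbitrary (hypothetical); the abstract does not report one.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # truncates 246 * 0.7 to 172
    return shuffled[:n_train], shuffled[n_train:]


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 246 placeholder identifiers standing in for the study cohort.
train_ids, val_ids = split_ids(range(246))
```

The pairwise form of AUROC is quadratic in the sample size, which is acceptable for a cohort of a few hundred patients; library implementations typically use a sort-based method instead.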
Results: Four essential characteristic parameters, namely uterine volume, bone age/chronological age (BA/CA), basal follicle-stimulating hormone (FSH), and basal luteinizing hormone (LH), were selected using the LASSO method. Based on these parameters, the LR model and the five ML models achieved AUROC values ranging from 0.72 to 0.96 in the training set and from 0.65 to 0.90 in the validation set for diagnosing ICPP. Of the six models, XGBoost performed best, achieving the highest AUROC, accuracy, specificity, and sensitivity in both the training and validation sets. Moreover, calibration plots and DCA confirmed that this model exhibited the best calibration and clinical utility.
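Sensitivity, specificity, accuracy, Youden's index, and the likelihood ratios all derive from the four cells of a confusion matrix. The sketch below shows the standard formulas. The function name `diagnostic_metrics` and the example counts are hypothetical and are not results from the study.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    false_positive_rate = 1.0 - specificity
    false_negative_rate = 1.0 - sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        # Youden's index J = sensitivity + specificity - 1
        "youden_index": sensitivity + specificity - 1.0,
        # LR+ = sensitivity / (1 - specificity); infinite if no false positives
        "lr_positive": (sensitivity / false_positive_rate
                        if false_positive_rate > 0 else float("inf")),
        # LR- = (1 - sensitivity) / specificity; infinite if specificity is 0
        "lr_negative": (false_negative_rate / specificity
                        if specificity > 0 else float("inf")),
    }


# Made-up counts for illustration only; they are NOT taken from the paper.
example = diagnostic_metrics(tp=40, fn=10, fp=5, tn=45)
```

Youden's index is often also used to choose the probability cut-off, by taking the threshold at which sensitivity plus specificity is largest; whether the authors did so is not stated in the abstract.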
Conclusions: An accurate and interpretable ML-based model has been developed to aid clinicians in diagnosing ICPP and to support clinical decision-making.
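The interpretability referred to here rests on Shapley values, which attribute a single prediction to the individual input features. As a rough illustration of the idea only, the sketch below computes exact Shapley values against a single baseline for a toy linear score over the four selected features. It does not use the SHAP package or an XGBoost model, and the weights, feature values, and baseline are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a single baseline.

    Features absent from a coalition take their baseline value. The cost is
    exponential in the number of features, which is fine for four.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Feature order: uterine volume, BA/CA, basal FSH, basal LH.
# Hypothetical weights and inputs, NOT taken from the paper.
WEIGHTS = [0.5, 2.0, 0.3, 1.5]


def toy_score(features):
    """Toy linear score standing in for a fitted model."""
    return sum(w * f for w, f in zip(WEIGHTS, features))


patient = [4.0, 1.2, 3.0, 2.0]
reference = [2.0, 1.0, 2.5, 0.5]
contributions = shapley_values(toy_score, patient, reference)
```

For a linear score each contribution reduces to the weight times the feature's deviation from the baseline, and the contributions sum to the difference between the patient's score and the baseline score. Tree-based explainers in the SHAP package compute the same kind of attribution with algorithms specialised for tree ensembles.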
About the journal:
BMC Endocrine Disorders is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of endocrine disorders, as well as related molecular genetics, pathophysiology, and epidemiology.