Machine learning-based risk prediction models for bronchopulmonary dysplasia in preterm infants: a high-altitude cohort study
Heng Zhang, Fei Wang, Ou Jiang, Yilin Lin, Lianfang Tang, Ziwei Li, Rui Ba, Xiaoyan Xu, Hongying Mi
BMJ Paediatrics Open, volume 9, issue 1. Published 13 July 2025. DOI: 10.1136/bmjpo-2025-003652
Abstract
Background: Bronchopulmonary dysplasia (BPD) is a significant cause of morbidity in preterm infants, yet its development and severity at high altitudes (>1500 m) remain poorly understood. This study aimed to identify altitude-specific risk factors and develop robust, interpretable predictive models for BPD in this unique population.
Methods: In this retrospective matched cohort study, 378 preterm infants (<32 weeks' gestation, <1500 g birth weight) were analysed. All were admitted between 2019 and 2023 to a neonatal intensive care unit (NICU) located at high altitude (1500 m). The cohort comprised 189 infants with BPD (91 mild, 61 moderate, 37 severe) and 189 matched controls. Maternal, perinatal and postnatal data were collected. Three machine learning models (XGBoost, logistic regression and random forest) were developed to predict BPD occurrence and severity. They were evaluated using multiple performance metrics, including the area under the curve (AUC), F1 score, Matthews correlation coefficient (MCC) and balanced accuracy. SHAP (SHapley Additive exPlanations) analysis was used to interpret the best-performing model.
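The Results report F1 score, Matthews correlation coefficient and balanced accuracy alongside the AUC. All three are functions of a binary confusion matrix, whereas the AUC requires the model's continuous risk scores. The sketch below shows the standard definitions, treating BPD as the positive class. The function name `confusion_metrics` and the counts in the usage line are hypothetical and are not the study's data.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """F1, MCC and balanced accuracy for a binary classifier (BPD = positive class).

    Hypothetical helper; requires at least one true case and one true control.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    sensitivity = tp / (tp + fn)  # recall / true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    # F1 = harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    balanced_accuracy = (sensitivity + specificity) / 2
    # MCC uses all four cells; by convention it is 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"f1": f1, "mcc": mcc, "balanced_accuracy": balanced_accuracy}


if __name__ == "__main__":
    # Hypothetical test split: 50 BPD cases and 50 controls (not from the paper).
    print(confusion_metrics(tp=40, fp=8, tn=42, fn=10))
```

MCC and balanced accuracy complement F1 because they also reward correct classification of controls. F1 ignores true negatives entirely.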
Results: Key risk factors for BPD development included maternal hypertension (OR 2.31, 95% CI 1.56 to 3.42), initial oxygen requirement >30% (OR 3.15, 95% CI 2.13 to 4.65) and lack of exclusive breast milk feeding (OR 1.89, 95% CI 1.28 to 2.79). Severe BPD was independently associated with prolonged invasive ventilation (>7 days) (OR 4.12, 95% CI 2.78 to 6.11), elevated C reactive protein (>10 mg/L) (OR 2.87, 95% CI 1.93 to 4.26) and patent ductus arteriosus (OR 2.53, 95% CI 1.71 to 3.74). The machine learning models showed strong predictive performance. The best-performing XGBoost model achieved an area under the curve of 0.89 (95% CI 0.85 to 0.93), an F1 score of 0.82, a Matthews correlation coefficient of 0.73 and a balanced accuracy of 0.85. SHAP analysis identified initial FiO2 >30%, mechanical ventilation and maternal hypertension as the three most influential predictors in the XGBoost model.
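Each odds ratio above is reported with a 95% CI. The abstract does not state how the intervals were computed. Assuming Wald-type intervals on the log-odds scale (the usual output of logistic regression), each point estimate should sit at the geometric midpoint of its bounds. The standard error of log(OR) can then be recovered from the interval width as SE = (ln U − ln L) / (2 × 1.96). The sketch below applies this check to the six reported ORs. The helper name `log_or_summary` is hypothetical.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959963984540054


def log_or_summary(odds_ratio: float, lower: float, upper: float) -> dict[str, float]:
    """Back-calculate log-scale quantities from an OR and its 95% CI.

    Assumes a Wald interval on the log-odds scale: log(OR) +/- Z_95 * SE.
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    se = (log_hi - log_lo) / (2 * Z_95)
    # Under the Wald assumption the OR is the geometric mean of the CI bounds.
    implied_or = math.exp((log_lo + log_hi) / 2)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {"se_log_or": se, "implied_or": implied_or, "z": z, "p_value": p_value}


# Odds ratios and 95% CIs exactly as reported in the abstract.
REPORTED = {
    "maternal hypertension": (2.31, 1.56, 3.42),
    "initial FiO2 >30%": (3.15, 2.13, 4.65),
    "no exclusive breast milk feeding": (1.89, 1.28, 2.79),
    "invasive ventilation >7 days (severe BPD)": (4.12, 2.78, 6.11),
    "CRP >10 mg/L (severe BPD)": (2.87, 1.93, 4.26),
    "patent ductus arteriosus (severe BPD)": (2.53, 1.71, 3.74),
}

if __name__ == "__main__":
    for name, (or_, lo, hi) in REPORTED.items():
        s = log_or_summary(or_, lo, hi)
        print(f"{name}: implied OR {s['implied_or']:.2f}, "
              f"SE(log OR) {s['se_log_or']:.3f}, p {s['p_value']:.1e}")
```

With the published numbers, every implied OR rounds to its reported point estimate. This is consistent with log-scale Wald intervals but does not prove which method the authors used. The p-values are approximate and only as precise as the two-decimal bounds allow.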
Conclusions: This study provides the first comprehensive analysis of BPD risk factors at a defined high altitude (1500 m) and validates effective, interpretable machine learning models for predicting BPD. These findings highlight the importance of altitude-specific risk assessment and the potential of model-guided early intervention to improve outcomes in this vulnerable population.