Early prediction of bronchopulmonary dysplasia: comparison of modelling methods, development and validation studies
Heloise Torchin, Paula Dhiman, Pierre-Yves Ancel, Xavier Durrmeyer, Pierre-Henri Jarreau, Alexandra Nuytten, Patrick Truffert, Jennifer Zeitlin, Gary S Collins
Pediatric Research (published 2025-06-12). DOI: 10.1038/s41390-025-04170-2
Abstract
Background: Machine-learning methods are gaining popularity for predicting medical events, but their added value over other methods remains to be determined. We compared the performance of clinical prediction models for bronchopulmonary dysplasia (BPD) or death in very preterm infants developed with logistic regression and with random forest methods.
Methods: Two population-based cohorts of very preterm infants were used: EPIPAGE-2 (France, 2011) for development and internal validation, and EPICE (Europe, 2011) for external validation. Eligible infants were born before 30 weeks' gestation and admitted to neonatal units. BPD was defined as any respiratory support at 36 weeks postmenstrual age. Candidate predictors were those available shortly after birth or on day 3. The performance of the logistic regression and random forest models was assessed in terms of discrimination (c-statistic) and calibration (calibration plots).
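Discrimination here is summarized by the c-statistic: the probability that a randomly chosen infant with BPD/death received a higher predicted risk than a randomly chosen infant without it. A minimal Python sketch of that definition follows; the function name `c_statistic` and the toy data are hypothetical and are neither the authors' code nor study data. It compares every event/non-event pair, which is quadratic in sample size, so a rank-based formula would be preferable for full cohorts.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Concordance (c) statistic for a binary outcome.

    Proportion of all (event, non-event) pairs in which the event has the
    higher predicted risk; tied risks count as half a concordant pair.
    For a binary outcome this equals the area under the ROC curve.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data: 1 = BPD or death, 0 = survival without BPD.
outcome = [1, 0, 1, 0, 0, 1]
predicted_risk = [0.80, 0.30, 0.55, 0.55, 0.10, 0.40]
print(round(c_statistic(outcome, predicted_risk), 3))  # prints 0.833
```

A value of 0.5 means the model ranks infants no better than chance and 1.0 means perfect ranking; a c-statistic says nothing about whether the predicted risks themselves are correct, which is why calibration is assessed separately.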
Results: The prevalence of BPD/death was 32.1% (668/1923) in EPIPAGE-2 and 41.0% (1368/3335) in EPICE. At both time points (birth and day 3), the logistic regression and random forest models showed similar performance in internal validation. For the models at birth, external validation in EPICE showed good discrimination (logistic regression: c-statistic 0.81, 95% CI 0.80-0.83; random forest: 0.80, 95% CI 0.79-0.81), but both models underestimated the probability of BPD/death. Model performance varied across European regions.
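Calibration compares predicted risks with observed outcome rates; a model "underestimates" BPD/death when the observed rate is higher than the predicted risk. The Python sketch below assumes equal-width risk bins (the abstract does not say how the calibration plots were built; deciles of predicted risk or smoothed curves are common alternatives). The function names and toy data are hypothetical.

```python
def calibration_table(y_true, y_prob, n_bins=5):
    """Group predictions into equal-width risk bins and return, for each
    non-empty bin, (count, mean predicted risk, observed event rate).

    Plotting observed rate against mean predicted risk gives a calibration
    plot; points above the diagonal indicate underestimated risk.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # a risk of exactly 1.0 goes in the top bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if not members:
            continue  # empty bins contribute no point to the plot
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        rows.append((len(members), mean_pred, observed))
    return rows


def calibration_in_the_large(y_true, y_prob):
    """Observed event rate minus mean predicted risk.
    A positive value means risk is underestimated on average."""
    n = len(y_true)
    return sum(y_true) / n - sum(y_prob) / n


# Hypothetical toy data: 1 = BPD or death, 0 = survival without BPD.
observed_outcome = [1, 1, 0, 1, 0, 1, 0, 1]
model_risk = [0.75, 0.25, 0.15, 0.55, 0.05, 0.65, 0.45, 0.35]
for count, mean_pred, obs in calibration_table(observed_outcome, model_risk):
    print(f"n={count}  predicted={mean_pred:.2f}  observed={obs:.2f}")
print(f"calibration-in-the-large: {calibration_in_the_large(observed_outcome, model_risk):+.3f}")
```

In this toy example the observed event rate (5 of 8) exceeds the mean predicted risk (0.40), so the overall difference is positive, the pattern that "underestimated the probability" describes.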
Conclusions: Both modelling methods performed similarly in predicting BPD/death shortly after birth in very preterm children.
Impact: Whether machine-learning methods predict short-term respiratory outcomes in very preterm infants better than logistic regression models is debated. Random forest-based prediction models did not perform better than logistic regression in predicting, shortly after birth, bronchopulmonary dysplasia or death in very preterm infants. Calibration performance varied among European countries. While offering similar performance, regression models are easier to understand, disseminate and apply to different populations.
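Part of why a logistic model is easy to disseminate is that it is a closed-form equation anyone can evaluate. One standard way to adapt such a model to a population with a different event rate is to re-estimate only its intercept while keeping the coefficients fixed ("recalibration-in-the-large"). The abstract does not describe this technique, and the authors may not have used it. The Python sketch below is illustrative only: the predictors and coefficients (`INTERCEPT`, `COEFS`, `gestational_age_weeks`, `male_sex`) are hypothetical and are not the published model.

```python
import math

# Hypothetical coefficients for illustration only; NOT the model
# developed in EPIPAGE-2.
INTERCEPT = 4.0
COEFS = {"gestational_age_weeks": -0.2, "male_sex": 0.3}


def linear_predictor(patient, intercept, coefs):
    return intercept + sum(b * patient[name] for name, b in coefs.items())


def predict_risk(patient, intercept=INTERCEPT, coefs=COEFS):
    """Logistic model: risk = 1 / (1 + exp(-linear predictor))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(patient, intercept, coefs)))


def update_intercept(patients, outcomes, intercept=INTERCEPT, coefs=COEFS):
    """Recalibration-in-the-large: keep the coefficients fixed and shift the
    intercept so that the mean predicted risk equals the observed event rate
    in the new population. The shift is found by bisection."""
    target = sum(outcomes)
    if not 0 < target < len(outcomes):
        raise ValueError("need both events and non-events")
    lps = [linear_predictor(p, intercept, coefs) for p in patients]

    def excess(shift):
        # Expected minus observed number of events; increases with shift.
        return sum(1.0 / (1.0 + math.exp(-(lp + shift))) for lp in lps) - target

    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return intercept + (lo + hi) / 2


# Hypothetical toy population with a higher event rate than the model expects.
new_patients = [
    {"gestational_age_weeks": 24, "male_sex": 1},
    {"gestational_age_weeks": 26, "male_sex": 0},
    {"gestational_age_weeks": 28, "male_sex": 1},
    {"gestational_age_weeks": 29, "male_sex": 0},
]
new_outcomes = [1, 1, 0, 1]
new_intercept = update_intercept(new_patients, new_outcomes)
print(f"original intercept {INTERCEPT:.2f}, updated intercept {new_intercept:.2f}")
```

Because the toy population's observed rate (0.75) exceeds the original model's mean predicted risk, the updated intercept is larger. The relative effects of the predictors are unchanged. A random forest has no single intercept to adjust in this way.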
About the journal:
Pediatric Research publishes original papers, invited reviews, and commentaries on the etiologies of children's diseases and disorders of development, extending from molecular biology to epidemiology. The use of model organisms and in vitro techniques relevant to developmental biology and medicine is acceptable, as are translational human studies.