Machine learning-based early prediction of asthma in preschoolers: The COCOA birth cohort study
Chang Hoon Han, Seok-Jae Heo, Haerin Jang, So-Yeon Lee, Ji Soo Park, Dong In Suh, Youn Ho Shin, Jihyun Kim, Kangmo Ahn, Myung Hyun Sohn, Eom Ji Choi, Sun Hee Choi, Hey-Sung Baek, Soo-Jong Hong, Kyung Won Kim, Inkyung Jung, Soo Yeon Kim
Pediatric allergy and immunology : official publication of the European Society of Pediatric Allergy and Immunology, 36(10), e70223. Published 2025-10-01.
DOI: 10.1111/pai.70223 (https://doi.org/10.1111/pai.70223)
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12533341/pdf/
Abstract
Background: Early prediction of asthma in preschoolers, which is crucial for timely intervention, remains challenging. This study aimed to develop a machine learning (ML)-based model and a questionnaire-based scoring tool for the prediction of asthma at age 3 years.
Methods: We used data from the COhort for Childhood Origin of Asthma and allergic diseases (COCOA), a comprehensive prospective birth cohort in South Korea. Children with complete 3-year follow-up (n = 2007) were divided by birth year into a development cohort (n = 1472) and a validation cohort (n = 535). Asthma at age 3 years was defined by physician diagnosis, recurrent wheezing episodes, asthma treatment, or parental report. Random Forest-based predictive models were developed using data collected up to the age of 2 years, with features first selected by least absolute shrinkage and selection operator (LASSO) regression. A questionnaire-based scoring tool was also developed and compared with multiple ML algorithms.
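Splitting by birth year, as described in the Methods, is a form of temporal validation: the model is evaluated on children born in a different period from those it was developed on. A minimal Python sketch of such a split follows. The `Child` record, the `split_by_birth_year` helper, the cutoff year, and the direction of the split (earlier births used for development) are all illustrative assumptions; the abstract does not state them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Child:
    """Hypothetical record; field names are not taken from the COCOA dataset."""
    child_id: int
    birth_year: int
    asthma_at_3: bool


def split_by_birth_year(
    children: List[Child], cutoff_year: int
) -> Tuple[List[Child], List[Child]]:
    """Return (development, validation) cohorts.

    Assumption: children born in or before cutoff_year go to development,
    later births go to validation.
    """
    development = [c for c in children if c.birth_year <= cutoff_year]
    validation = [c for c in children if c.birth_year > cutoff_year]
    return development, validation


# Toy data with made-up years and outcomes.
toy_children = [
    Child(1, 2008, False),
    Child(2, 2010, True),
    Child(3, 2012, False),
    Child(4, 2013, True),
]
dev_cohort, val_cohort = split_by_birth_year(toy_children, cutoff_year=2011)
```

Because no child appears in both cohorts and the validation births are strictly later, validation performance reflects how the model behaves on a later group of children, not on resampled copies of the development data.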
Results: The ML-based prediction models performed better as more data accumulated. In the validation cohort, the 6-month, 1-year, and 2-year models had area under the receiver operating characteristic curve (AUROC) values of 0.614, 0.726, and 0.774, respectively. The performance of the questionnaire-based scoring tool (AUROC, 0.790) was comparable to that of the ML-based model. Important predictors included paternal total IgE levels, maternal iron supplementation during pregnancy, parental asthma history, nut allergy history, and recent lower respiratory infections.
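AUROC can be read as the probability that a randomly chosen child with asthma receives a higher score than a randomly chosen child without asthma, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison for a made-up integer score. The scores and labels are toy values, not COCOA data, and `auroc` is a hypothetical helper name.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for a case (asthma), 0 for a non-case.
    """
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one non-case")

    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0  # case ranked above non-case
            elif case == control:
                wins += 0.5  # tie counts as half
    return wins / (len(case_scores) * len(control_scores))


# Toy questionnaire-style scores (illustrative only).
toy_scores = [0, 1, 1, 2, 3, 3, 4, 5]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_auc = auroc(toy_scores, toy_labels)
```

On this toy data the function returns 0.90625. A score that carries no information about the outcome would give about 0.5, and a score that ranks every case above every non-case would give 1.0.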
Conclusions: We developed ML-based models that predict asthma at age 3 years, with validation AUROC rising to 0.774 when data collected up to age 2 years were used. The questionnaire-based scoring tool offers particular value because of its clinical applicability. Further validation in diverse populations and investigation of the causative pathways of the identified predictors are necessary to enhance clinical utility.