Bayesian additive regression trees for predicting childhood asthma in the CHILD cohort study.

IF 3.9 · Zone 3 (Medicine) · JCR Q1 HEALTH CARE SCIENCES & SERVICES

BMC Medical Research Methodology · Pub Date: 2024-11-01 · DOI: 10.1186/s12874-024-02376-2

Mojtaba Ahmadiankalati, Himani Boury, Padmaja Subbarao, Wendy Lou, Zihang Lu

Journal Article · BMC Medical Research Methodology 24(1):262 · Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529447/pdf/

Abstract

Background: Asthma is a heterogeneous disease that affects millions of children and adults. No objective gold-standard diagnosis spans all ages; instead, diagnoses are made by clinician assessment based on a cluster of signs, symptoms and objective tests that depend on age. Yet chronic asthma symptoms carry clear morbidity. Machine learning has become a popular tool for improving asthma diagnosis and classification, but there is little literature on the use of Bayesian machine learning algorithms to predict asthma diagnosis in children. This paper develops a prediction model using Bayesian additive regression trees (BART) and compares its performance with that of various machine learning algorithms in predicting the diagnosis of childhood asthma.

Methods: Clinically relevant variables collected at or before 3 years of age from 2794 participants in the CHILD Cohort Study were used to predict physician-diagnosed asthma at age 5. BART and six other commonly used machine learning algorithms (adaptive boosting, logistic regression, decision tree, neural network, random forest, and support vector machine) were trained. Performance measures, including sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve, were calculated. Confidence intervals were calculated from bootstrap samples. Important predictors and interaction effects associated with asthma were also identified using BART.
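The evaluation step described in the Methods can be illustrated with a minimal sketch. The abstract does not state which bootstrap variant, how many resamples, or which classification threshold the authors used, so everything below is an assumption: a percentile bootstrap, a 0.5 threshold, and synthetic scores standing in for model output. The function names (`auc`, `sensitivity_specificity`, `bootstrap_ci`) are hypothetical and not taken from the paper.

```python
import random


def auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def bootstrap_ci(y_true, scores, metric, n_boot=300, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a scalar metric(y, scores)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic stand-in data (assumption): about 20% cases, with scores that are
# higher on average for cases than for non-cases.
rng = random.Random(42)
y = [1 if rng.random() < 0.2 else 0 for _ in range(200)]
scores = [min(1.0, max(0.0, rng.gauss(0.6 if yi else 0.3, 0.15))) for yi in y]

auc_point = auc(y, scores)
auc_lo, auc_hi = bootstrap_ci(y, scores, auc)
sens, spec = sensitivity_specificity(y, scores)
sens_lo, sens_hi = bootstrap_ci(
    y, scores, lambda ys, ss: sensitivity_specificity(ys, ss)[0]
)

print(f"AUC {auc_point:.3f} (95% CI {auc_lo:.3f} to {auc_hi:.3f})")
print(f"Sensitivity {sens:.3f} (95% CI {sens_lo:.3f} to {sens_hi:.3f})")
print(f"Specificity {spec:.3f}")
```

In practice the scores would come from each fitted model's held-out predictions, and the same resampled indices could be reused across models so that their intervals are directly comparable.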

Results: BART, logistic regression and random forest showed the highest area under the ROC curve among the algorithms compared. Based on BART, recurrent wheeze, respiratory infection and food sensitization at 3 years of age were the most important predictors. The three most important interaction effects were between respiratory infection and recurrent wheeze at 3 years, between maternal asthma and paternal asthma, and between maternal wheeze and the child's inhalant sensitization at 3 years.

Conclusions: BART demonstrated promising prediction performance compared with other machine learning algorithms. Future research could validate the BART model in an external cohort to evaluate its reliability and generalizability.

Source journal

BMC Medical Research Methodology (Medicine: Health Care)

CiteScore: 6.50
Self-citation rate: 2.50%
Articles published: 298
Review time: 3-8 weeks

Journal description: BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.