Machine learning for detection of diffusion abnormalities-related respiratory changes among normal, overweight, and obese individuals based on BMI and pulmonary ventilation parameters: an observational study.
Authors: Xin-Yue Song, Xin-Peng Xie, Wen-Jing Xu, Yu-Jia Cao, Bin-Miao Liang
Journal: BMC Medical Informatics and Decision Making, 25(1):240 (impact factor 3.8; JCR Q2, Medical Informatics)
Publication date: 2025-07-01
Publication type: Journal Article
DOI: https://doi.org/10.1186/s12911-025-03064-x
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12220551/pdf/
Trial registration: Not applicable.
Abstract
Background: Machine learning (ML) algorithms can detect respiratory changes related to diffusion abnormalities in normal-weight, overweight, and obese individuals from body mass index (BMI) and pulmonary ventilation parameters. We evaluated the effectiveness of several supervised ML algorithms and identified the optimal configurations for this application.
Methods: We retrospectively analyzed data from 440 individuals who underwent pulmonary function tests between January 1, 2021, and April 1, 2024. The cohort comprised 287 individuals with normal diffusion capacity (DN) and 153 with diffusion abnormalities (DA). We used statistical comparisons (e.g., independent samples t-test and Chi-square test) to analyze demographic characteristics and spirometry results. Piecewise regression was used to evaluate the correlation between BMI and carbon monoxide diffusing capacity (DLCO). Pulmonary ventilation parameters included forced vital capacity (FVC), forced expiratory volume in one second (FEV1), FEV1/FVC, peak expiratory flow (PEF), maximum mid-expiratory flow (MMEF), and vital capacity (VC). To distinguish between DN and DA, we applied five supervised ML algorithms, namely Support Vector Machine (SVM), Random Forest (RF), Adaptive Boosting (AdaBoost), Naive Bayes (BAYES), and K-Nearest Neighbors (KNN), together with three feature selection strategies: SelectKBest, Recursive Feature Elimination with Cross-Validation (RFECV), and SelectFromModel. Additionally, we performed feature importance analysis using SHapley Additive exPlanations (SHAP) and permutation importance to evaluate the contribution of individual parameters to the classification.
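The classifier comparison described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses synthetic data from make_classification in place of the study cohort, and the hyperparameters, the five-fold split, and the choice of k = 3 for SelectKBest are all hypothetical. The abstract names the algorithms but does not state how they were configured or validated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the cohort (NOT the study data): 440 rows,
# 7 columns playing the role of BMI plus the six ventilation parameters,
# with roughly 65% of rows in class 0 ("DN") and 35% in class 1 ("DA").
FEATURES = ["BMI", "FVC", "FEV1", "FEV1/FVC", "PEF", "MMEF", "VC"]
X, y = make_classification(
    n_samples=440,
    n_features=len(FEATURES),
    n_informative=3,
    n_redundant=2,
    weights=[0.65, 0.35],
    random_state=0,
)

# The five classifier families named in the abstract; all settings here
# are illustrative assumptions.
classifiers = {
    "SVM": SVC(random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "BAYES": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def mean_auc(model, k):
    """Cross-validated ROC AUC for scale -> SelectKBest(k) -> model.

    Scaling and feature selection sit inside the pipeline so they are
    refit on each training fold and never see the held-out fold.
    """
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k), model)
    return cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc").mean()


# Compare each classifier on all features versus the top three features.
results = {
    name: {"all": mean_auc(model, "all"), "top3": mean_auc(model, 3)}
    for name, model in classifiers.items()
}

for name, aucs in results.items():
    print(f"{name:8s} AUC(all)={aucs['all']:.3f}  AUC(top3)={aucs['top3']:.3f}")
```

Keeping feature selection inside the pipeline is a design choice that avoids leaking information from the validation folds into the selection step. RFECV and SelectFromModel could be swapped in at the same position.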
Results: Individuals in the DA group had lower PEF and DLCO than their DN counterparts. BMI showed a cubic relationship with DLCO for 18.5 kg/m² < BMI < 40 kg/m² (R² = 0.498, P < 0.01) and a negative linear correlation for BMI ≥ 40 kg/m² (r = -0.253, P < 0.05). RF was the most effective algorithm for distinguishing between DN and DA, achieving an area under the curve (AUC) of 0.983 and outperforming BAYES, SVM, AdaBoost, and KNN (P < 0.01). The feature selection strategies identified BMI, FEV1/FVC, and VC as the optimal parameters for subsequent experiments, consistent with the feature importance analysis and with pulmonary physiology. Feature selection improved KNN's diagnostic accuracy but had minimal impact on BAYES's performance.
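The piecewise BMI-DLCO relationship reported above (cubic below 40 kg/m², linear at or above it) can be illustrated with a short sketch. It assumes NumPy and entirely synthetic data; the generating coefficients, noise level, and the centring constant are hypothetical, and only the 40 kg/m² breakpoint comes from the abstract. The fitted values therefore say nothing about the study's actual R² or r.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic BMI (kg/m^2) and DLCO values; NOT the study data. The shape
# of the generating curve is an arbitrary assumption for illustration.
bmi = rng.uniform(18.5, 50.0, size=400)
dlco = np.where(
    bmi < 40.0,
    20.0 + 0.4 * (bmi - 18.5) - 0.0005 * (bmi - 18.5) ** 3,
    25.0 - 0.3 * (bmi - 40.0),
) + rng.normal(0.0, 1.5, size=bmi.size)

BREAKPOINT = 40.0  # fixed breakpoint taken from the abstract
CENTER = 30.0      # hypothetical centring constant to keep the cubic fit well conditioned


def r_squared(y, y_hat):
    """Coefficient of determination for a fitted curve."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


low = bmi < BREAKPOINT   # segment fitted with a cubic polynomial
high = ~low              # segment fitted with a straight line

# Cubic least-squares fit on the lower segment.
cubic_coefs = np.polyfit(bmi[low] - CENTER, dlco[low], deg=3)
cubic_r2 = r_squared(dlco[low], np.polyval(cubic_coefs, bmi[low] - CENTER))

# Linear fit and Pearson correlation on the upper segment.
slope, intercept = np.polyfit(bmi[high], dlco[high], deg=1)
pearson_r = np.corrcoef(bmi[high], dlco[high])[0, 1]

print(f"cubic segment:  R^2 = {cubic_r2:.3f}")
print(f"linear segment: slope = {slope:.3f}, r = {pearson_r:.3f}")
```

Here the breakpoint is fixed in advance. A fuller piecewise analysis might instead estimate it from the data, for example by scanning candidate breakpoints and keeping the one with the lowest total residual error.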
Conclusion: The results indicate that for individuals with a BMI between 18.5 kg/m² and 40 kg/m², diffusion capacity improves with increasing BMI. Conversely, diffusion capacity decreases for those with a BMI of 40 kg/m² or higher. This study underscores the potential of combining BMI and pulmonary ventilation parameters with ML algorithms as a practical approach to diagnosing diffusion abnormalities across normal-weight, overweight, and obese categories, particularly in contexts utilizing portable spirometers.
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.