Optimizing public health management with predictive analytics: leveraging the power of random forest.

Impact factor: 2.4; JCR Q3 (Computer Science, Information Systems)

Frontiers in Big Data; Pub Date: 2025-07-10; eCollection Date: 2025-01-01; DOI: 10.3389/fdata.2025.1574683

Hongman Wang, Yifan Song, Hua Bi

Frontiers in Big Data, volume 8, article 1574683. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12286995/pdf/

Citations: 0

Abstract

Community health outcomes significantly affect the wellbeing and quality of life of older populations. Traditional analytical methods often struggle to predict health risks accurately at the community level because they cannot capture complex, non-linear relationships among health determinants. This study employs a Random Forest Algorithm (RFA) to address this limitation and improve the predictive modeling of community health outcomes. By combining ensemble learning with multi-factor analysis, it aims to identify and quantify the relative contributions of key health indicators to risk assessment. The study begins with data collection from diverse health sources, followed by a systematic preprocessing stage that resolves missing values, normalizes variables, and encodes categorical features. Using bootstrap sampling, multiple decision trees are trained on random subsets of the health data, which ensures variability across the learned trees. Each tree is grown to full depth, and the trees' predictions are aggregated to improve accuracy. Out-of-bag (OOB) error estimation is applied to refine the model and provide an unbiased performance assessment, supporting robust generalization to unseen data. The proposed model analyzes key health indicators and ranks feature importance to determine the most influential predictors of health risk. Results indicate that RFA achieves an accuracy of 92% and outperforms conventional prediction methods in precision and recall. These findings underscore the efficacy of Random Forest in identifying critical health risk factors, paving the way for targeted, data-driven public health management strategies and interventions tailored to older adults.
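
The training loop described in the abstract (bootstrap sampling, trees grown to full depth, aggregated votes, and OOB error estimation) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, the data are synthetic, and every name (`build_tree`, `fit_forest`, `oob_error`, the three unnamed features, the labeling rule) is a hypothetical chosen for illustration. Preprocessing and feature-importance ranking are left out.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, n_try, rng):
    """Grow one tree to full depth: stop only at a pure or unsplittable node."""
    if len(set(labels)) == 1:
        return labels[0]
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):  # random feature subset
        for t in sorted({r[f] for r in rows})[:-1]:   # max excluded: both sides non-empty
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # sampled features are constant here: majority-vote leaf
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], n_try, rng),
            build_tree([rows[i] for i in right], [labels[i] for i in right], n_try, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample and remember which rows it saw."""
    rng = random.Random(seed)
    n = len(rows)
    n_try = max(1, int(len(rows[0]) ** 0.5))  # common sqrt(n_features) heuristic
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        tree = build_tree([rows[i] for i in idx], [labels[i] for i in idx], n_try, rng)
        forest.append((tree, set(idx)))
    return forest


def predict(forest, row):
    """Aggregate the trees' predictions by majority vote."""
    votes = Counter(predict_tree(tree, row) for tree, _ in forest)
    return votes.most_common(1)[0][0]


def oob_error(forest, rows, labels):
    """Score each row using only the trees whose bootstrap sample excluded it."""
    wrong = scored = 0
    for i, (row, y) in enumerate(zip(rows, labels)):
        votes = Counter(predict_tree(t, row) for t, inbag in forest if i not in inbag)
        if votes:
            scored += 1
            wrong += votes.most_common(1)[0][0] != y
    return wrong / scored if scored else float("nan")


# Synthetic stand-in data (NOT from the study): three numeric indicators per
# record, with a made-up rule that flags "high risk" when the first two are large.
data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(120)]
labels = [int(r[0] + r[1] > 1.0) for r in rows]

forest = fit_forest(rows, labels)
err = oob_error(forest, rows, labels)
print(f"OOB error estimate: {err:.3f}")
print("Prediction for a new record:", predict(forest, [0.9, 0.8, 0.5]))
```

Because every tree sees only a bootstrap sample, roughly a third of the records are left out of each tree, and those left-out records act as a built-in validation set. That is why the OOB estimate needs no separate hold-out split. In practice a library implementation would normally be preferred over a hand-written forest like this one.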


Source journal