COVID-19 from symptoms to prediction: A statistical and machine learning approach

IF 7 2区 医学 Q1 BIOLOGY
{"title":"COVID-19 from symptoms to prediction: A statistical and machine learning approach","authors":"","doi":"10.1016/j.compbiomed.2024.109211","DOIUrl":null,"url":null,"abstract":"<div><div>During the COVID-19 pandemic, the analysis of patient data has become a cornerstone for developing effective public health strategies. This study leverages a dataset comprising over 10,000 anonymized patient records from various leading medical institutions to predict COVID-19 patient age groups using a suite of statistical and machine learning techniques. Initially, extensive statistical tests including ANOVA and t-tests were utilized to assess relationships among demographic and symptomatic variables. The study then employed machine learning models such as Decision Tree, Naïve Bayes, KNN, Gradient Boosted Trees, Support Vector Machine, and Random Forest, with rigorous data preprocessing to enhance model accuracy. Further improvements were sought through ensemble methods; bagging, boosting, and stacking. Our findings indicate strong associations between key symptoms and patient age groups, with ensemble methods significantly enhancing model accuracy. Specifically, stacking applied with random forest as a meta leaner exhibited the highest accuracy (0.7054). In addition, the implementation of stacking techniques notably improved the performance of K-Nearest Neighbors (from 0.529 to 0.63) and Naïve Bayes (from 0.554 to 0.622) and demonstrated the most successful prediction method. The study aimed to understand the number of symptoms identified in COVID-19 patients and their association with different age groups. The results can assist doctors and higher authorities in improving treatment strategies. Additionally, several decision-making techniques can be applied during pandemic, tailored to specific age groups, such as resource allocation, medicine availability, vaccine development, and treatment strategies. The integration of these predictive models into clinical settings could support real-time public health responses and targeted intervention strategies.</div></div>","PeriodicalId":10578,"journal":{"name":"Computers in biology and medicine","volume":null,"pages":null},"PeriodicalIF":7.0000,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers in biology and medicine","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0010482524012964","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

During the COVID-19 pandemic, the analysis of patient data has become a cornerstone for developing effective public health strategies. This study leverages a dataset comprising over 10,000 anonymized patient records from various leading medical institutions to predict COVID-19 patient age groups using a suite of statistical and machine learning techniques. Initially, extensive statistical tests including ANOVA and t-tests were utilized to assess relationships among demographic and symptomatic variables. The study then employed machine learning models such as Decision Tree, Naïve Bayes, KNN, Gradient Boosted Trees, Support Vector Machine, and Random Forest, with rigorous data preprocessing to enhance model accuracy. Further improvements were sought through ensemble methods; bagging, boosting, and stacking. Our findings indicate strong associations between key symptoms and patient age groups, with ensemble methods significantly enhancing model accuracy. Specifically, stacking applied with random forest as a meta leaner exhibited the highest accuracy (0.7054). In addition, the implementation of stacking techniques notably improved the performance of K-Nearest Neighbors (from 0.529 to 0.63) and Naïve Bayes (from 0.554 to 0.622) and demonstrated the most successful prediction method. The study aimed to understand the number of symptoms identified in COVID-19 patients and their association with different age groups. The results can assist doctors and higher authorities in improving treatment strategies. Additionally, several decision-making techniques can be applied during pandemic, tailored to specific age groups, such as resource allocation, medicine availability, vaccine development, and treatment strategies. The integration of these predictive models into clinical settings could support real-time public health responses and targeted intervention strategies.
COVID-19 从症状到预测:统计和机器学习方法。
在 COVID-19 大流行期间,患者数据分析已成为制定有效公共卫生策略的基石。本研究利用了一个数据集,该数据集由来自各主要医疗机构的 10,000 多份匿名患者记录组成,利用一套统计和机器学习技术来预测 COVID-19 患者的年龄组。首先,利用方差分析和 t 检验等广泛的统计检验来评估人口统计学变量和症状变量之间的关系。然后,研究采用了机器学习模型,如决策树、奈夫贝叶斯、KNN、梯度提升树、支持向量机和随机森林,并进行了严格的数据预处理,以提高模型的准确性。此外,我们还采用了组合方法,即套袋法、提升法和堆叠法,以进一步提高模型的准确性。我们的研究结果表明,主要症状与患者年龄组之间的关联性很强,集合方法显著提高了模型的准确性。具体来说,以随机森林作为元精益器的堆叠法表现出最高的准确率(0.7054)。此外,堆叠技术的应用还显著提高了 K-近邻(从 0.529 到 0.63)和奈夫贝叶斯(从 0.554 到 0.622)的性能,是最成功的预测方法。该研究旨在了解 COVID-19 患者被识别出的症状数量及其与不同年龄组的关联。研究结果可帮助医生和上级部门改进治疗策略。此外,在大流行期间,还可针对特定年龄组应用一些决策技术,如资源分配、药品供应、疫苗开发和治疗策略等。将这些预测模型整合到临床环境中可支持实时公共卫生响应和有针对性的干预策略。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Computers in biology and medicine
Computers in biology and medicine 工程技术-工程:生物医学
CiteScore
11.70
自引率
10.40%
发文量
1086
审稿时长
74 days
期刊介绍: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信