A tree-based explainable AI model for early detection of Covid-19 using physiological data.

IF 3.3 3区 医学 Q2 MEDICAL INFORMATICS
Manar Abu Talib, Yaman Afadar, Qassim Nasir, Ali Bou Nassif, Haytham Hijazi, Ahmad Hasasneh
{"title":"A tree-based explainable AI model for early detection of Covid-19 using physiological data.","authors":"Manar Abu Talib, Yaman Afadar, Qassim Nasir, Ali Bou Nassif, Haytham Hijazi, Ahmad Hasasneh","doi":"10.1186/s12911-024-02576-2","DOIUrl":null,"url":null,"abstract":"<p><p>With the outbreak of COVID-19 in 2020, countries worldwide faced significant concerns and challenges. Various studies have emerged utilizing Artificial Intelligence (AI) and Data Science techniques for disease detection. Although COVID-19 cases have declined, there are still cases and deaths around the world. Therefore, early detection of COVID-19 before the onset of symptoms has become crucial in reducing its extensive impact. Fortunately, wearable devices such as smartwatches have proven to be valuable sources of physiological data, including Heart Rate (HR) and sleep quality, enabling the detection of inflammatory diseases. In this study, we utilize an already-existing dataset that includes individual step counts and heart rate data to predict the probability of COVID-19 infection before the onset of symptoms. We train three main model architectures: the Gradient Boosting classifier (GB), CatBoost trees, and TabNet classifier to analyze the physiological data and compare their respective performances. We also add an interpretability layer to our best-performing model, which clarifies prediction results and allows a detailed assessment of effectiveness. Moreover, we created a private dataset by gathering physiological data from Fitbit devices to guarantee reliability and avoid bias.The identical set of models was then applied to this private dataset using the same pre-trained models, and the results were documented. Using the CatBoost tree-based method, our best-performing model outperformed previous studies with an accuracy rate of 85% on the publicly available dataset. Furthermore, this identical pre-trained CatBoost model produced an accuracy of 81% when applied to the private dataset. You will find the source code in the link: https://github.com/OpenUAE-LAB/Covid-19-detection-using-Wearable-data.git .</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":null,"pages":null},"PeriodicalIF":3.3000,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11194929/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Medical Informatics and Decision Making","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12911-024-02576-2","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
引用次数: 0

Abstract

With the outbreak of COVID-19 in 2020, countries worldwide faced significant concerns and challenges. Various studies have emerged utilizing Artificial Intelligence (AI) and Data Science techniques for disease detection. Although COVID-19 cases have declined, there are still cases and deaths around the world. Therefore, early detection of COVID-19 before the onset of symptoms has become crucial in reducing its extensive impact. Fortunately, wearable devices such as smartwatches have proven to be valuable sources of physiological data, including Heart Rate (HR) and sleep quality, enabling the detection of inflammatory diseases. In this study, we utilize an already-existing dataset that includes individual step counts and heart rate data to predict the probability of COVID-19 infection before the onset of symptoms. We train three main model architectures: the Gradient Boosting classifier (GB), CatBoost trees, and TabNet classifier to analyze the physiological data and compare their respective performances. We also add an interpretability layer to our best-performing model, which clarifies prediction results and allows a detailed assessment of effectiveness. Moreover, we created a private dataset by gathering physiological data from Fitbit devices to guarantee reliability and avoid bias.The identical set of models was then applied to this private dataset using the same pre-trained models, and the results were documented. Using the CatBoost tree-based method, our best-performing model outperformed previous studies with an accuracy rate of 85% on the publicly available dataset. Furthermore, this identical pre-trained CatBoost model produced an accuracy of 81% when applied to the private dataset. You will find the source code in the link: https://github.com/OpenUAE-LAB/Covid-19-detection-using-Wearable-data.git .

利用生理数据早期检测 Covid-19 的基于树的可解释人工智能模型。
随着 COVID-19 在 2020 年的爆发,世界各国都面临着巨大的担忧和挑战。利用人工智能(AI)和数据科学技术进行疾病检测的研究层出不穷。虽然 COVID-19 病例有所减少,但世界各地仍有病例和死亡病例。因此,在症状出现之前及早发现 COVID-19 已成为减少其广泛影响的关键。幸运的是,智能手表等可穿戴设备已被证明是宝贵的生理数据来源,包括心率(HR)和睡眠质量,从而能够检测炎症性疾病。在本研究中,我们利用包含个人步数和心率数据的现有数据集来预测症状出现前感染 COVID-19 的概率。我们训练了三种主要的模型架构:梯度提升分类器(GB)、CatBoost 树和 TabNet 分类器,以分析生理数据并比较它们各自的性能。我们还为表现最佳的模型添加了可解释性层,以明确预测结果,并对有效性进行详细评估。此外,我们还通过收集 Fitbit 设备的生理数据创建了一个私有数据集,以保证可靠性并避免偏见。然后,我们使用相同的预训练模型将相同的模型集应用于该私有数据集,并将结果记录在案。使用基于 CatBoost 树的方法,我们的最佳模型在公开数据集上的准确率达到了 85%,超过了之前的研究。此外,在应用于私有数据集时,这个相同的预训练 CatBoost 模型的准确率也达到了 81%。您可以在以下链接中找到源代码:https://github.com/OpenUAE-LAB/Covid-19-detection-using-Wearable-data.git 。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
7.20
自引率
5.70%
发文量
297
审稿时长
1 months
期刊介绍: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信