通过机器学习和XAI揭示COVID-19动态:调查变异影响和预后分类

IF 6 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine learning and knowledge extraction Pub Date : 2023-09-25 DOI:10.3390/make5040064

Oliver Lohaj, Ján Paralič, Peter Bednár, Zuzana Paraličová, Matúš Huba

{"title":"通过机器学习和XAI揭示COVID-19动态:调查变异影响和预后分类","authors":"Oliver Lohaj, Ján Paralič, Peter Bednár, Zuzana Paraličová, Matúš Huba","doi":"10.3390/make5040064","DOIUrl":null,"url":null,"abstract":"Machine learning (ML) has been used in different ways in the fight against COVID-19 disease. ML models have been developed, e.g., for diagnostic or prognostic purposes and using various modalities of data (e.g., textual, visual, or structured). Due to the many specific aspects of this disease and its evolution over time, there is still not enough understanding of all relevant factors influencing the course of COVID-19 in particular patients. In all aspects of our work, there was a strong involvement of a medical expert following the human-in-the-loop principle. This is a very important but usually neglected part of the ML and knowledge extraction (KE) process. Our research shows that explainable artificial intelligence (XAI) may significantly support this part of ML and KE. Our research focused on using ML for knowledge extraction in two specific scenarios. In the first scenario, we aimed to discover whether adding information about the predominant COVID-19 variant impacts the performance of the ML models. In the second scenario, we focused on prognostic classification models concerning the need for an intensive care unit for a given patient in connection with different explainability AI (XAI) methods. We have used nine ML algorithms, namely XGBoost, CatBoost, LightGBM, logistic regression, Naive Bayes, random forest, SGD, SVM-linear, and SVM-RBF. We measured the performance of the resulting models using precision, accuracy, and AUC metrics. Subsequently, we focused on knowledge extraction from the best-performing models using two different approaches as follows: (a) features extracted automatically by forward stepwise selection (FSS); (b) attributes and their interactions discovered by model explainability methods. Both were compared with the attributes selected by the medical experts in advance based on the domain expertise. Our experiments showed that adding information about the COVID-19 variant did not influence the performance of the resulting ML models. It also turned out that medical experts were much more precise in the identification of significant attributes than FSS. Explainability methods identified almost the same attributes as a medical expert and interesting interactions among them, which the expert discussed from a medical point of view. The results of our research and their consequences are discussed.","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"23 1","pages":"0"},"PeriodicalIF":6.0000,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Unraveling COVID-19 Dynamics via Machine Learning and XAI: Investigating Variant Influence and Prognostic Classification\",\"authors\":\"Oliver Lohaj, Ján Paralič, Peter Bednár, Zuzana Paraličová, Matúš Huba\",\"doi\":\"10.3390/make5040064\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning (ML) has been used in different ways in the fight against COVID-19 disease. ML models have been developed, e.g., for diagnostic or prognostic purposes and using various modalities of data (e.g., textual, visual, or structured). Due to the many specific aspects of this disease and its evolution over time, there is still not enough understanding of all relevant factors influencing the course of COVID-19 in particular patients. In all aspects of our work, there was a strong involvement of a medical expert following the human-in-the-loop principle. This is a very important but usually neglected part of the ML and knowledge extraction (KE) process. Our research shows that explainable artificial intelligence (XAI) may significantly support this part of ML and KE. Our research focused on using ML for knowledge extraction in two specific scenarios. In the first scenario, we aimed to discover whether adding information about the predominant COVID-19 variant impacts the performance of the ML models. In the second scenario, we focused on prognostic classification models concerning the need for an intensive care unit for a given patient in connection with different explainability AI (XAI) methods. We have used nine ML algorithms, namely XGBoost, CatBoost, LightGBM, logistic regression, Naive Bayes, random forest, SGD, SVM-linear, and SVM-RBF. We measured the performance of the resulting models using precision, accuracy, and AUC metrics. Subsequently, we focused on knowledge extraction from the best-performing models using two different approaches as follows: (a) features extracted automatically by forward stepwise selection (FSS); (b) attributes and their interactions discovered by model explainability methods. Both were compared with the attributes selected by the medical experts in advance based on the domain expertise. Our experiments showed that adding information about the COVID-19 variant did not influence the performance of the resulting ML models. It also turned out that medical experts were much more precise in the identification of significant attributes than FSS. Explainability methods identified almost the same attributes as a medical expert and interesting interactions among them, which the expert discussed from a medical point of view. The results of our research and their consequences are discussed.\",\"PeriodicalId\":93033,\"journal\":{\"name\":\"Machine learning and knowledge extraction\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":6.0000,\"publicationDate\":\"2023-09-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine learning and knowledge extraction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/make5040064\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning and knowledge extraction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/make5040064","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

机器学习(ML)已经以不同的方式用于对抗COVID-19疾病。ML模型已经被开发出来，例如，用于诊断或预后目的，并使用各种数据模式(例如，文本、视觉或结构化)。由于这种疾病的许多具体方面及其随时间的演变，对影响特定患者COVID-19病程的所有相关因素仍然没有足够的了解。在我们工作的各个方面，都有一位医学专家按照“人在循环”的原则大力参与。这是ML和知识提取(KE)过程中非常重要但通常被忽视的部分。我们的研究表明，可解释的人工智能(XAI)可能会显著支持ML和KE的这一部分。我们的研究重点是在两个特定的场景中使用ML进行知识提取。在第一个场景中，我们的目标是发现添加有关主要COVID-19变体的信息是否会影响ML模型的性能。在第二种情况下，我们将重点放在与不同可解释性AI (XAI)方法相关的特定患者需要重症监护病房的预后分类模型上。我们使用了9种ML算法，即XGBoost、CatBoost、LightGBM、逻辑回归、朴素贝叶斯、随机森林、SGD、SVM-linear和SVM-RBF。我们使用精度、准确度和AUC度量度量结果模型的性能。随后，我们重点研究了使用两种不同的方法从表现最好的模型中提取知识:(a)采用前向逐步选择(FSS)自动提取特征;(b)模型可解释性方法发现的属性及其相互作用。将两者与医学专家根据领域专长预先选择的属性进行比较。我们的实验表明，添加关于COVID-19变体的信息并不影响最终ML模型的性能。另外，医学专家对重要属性的判断也比金融监督院准确得多。可解释性方法确定了与医学专家几乎相同的属性和它们之间有趣的相互作用，专家从医学的角度对其进行了讨论。讨论了我们的研究结果及其后果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Unraveling COVID-19 Dynamics via Machine Learning and XAI: Investigating Variant Influence and Prognostic Classification

Machine learning (ML) has been used in different ways in the fight against COVID-19 disease. ML models have been developed, e.g., for diagnostic or prognostic purposes and using various modalities of data (e.g., textual, visual, or structured). Due to the many specific aspects of this disease and its evolution over time, there is still not enough understanding of all relevant factors influencing the course of COVID-19 in particular patients. In all aspects of our work, there was a strong involvement of a medical expert following the human-in-the-loop principle. This is a very important but usually neglected part of the ML and knowledge extraction (KE) process. Our research shows that explainable artificial intelligence (XAI) may significantly support this part of ML and KE. Our research focused on using ML for knowledge extraction in two specific scenarios. In the first scenario, we aimed to discover whether adding information about the predominant COVID-19 variant impacts the performance of the ML models. In the second scenario, we focused on prognostic classification models concerning the need for an intensive care unit for a given patient in connection with different explainability AI (XAI) methods. We have used nine ML algorithms, namely XGBoost, CatBoost, LightGBM, logistic regression, Naive Bayes, random forest, SGD, SVM-linear, and SVM-RBF. We measured the performance of the resulting models using precision, accuracy, and AUC metrics. Subsequently, we focused on knowledge extraction from the best-performing models using two different approaches as follows: (a) features extracted automatically by forward stepwise selection (FSS); (b) attributes and their interactions discovered by model explainability methods. Both were compared with the attributes selected by the medical experts in advance based on the domain expertise. Our experiments showed that adding information about the COVID-19 variant did not influence the performance of the resulting ML models. It also turned out that medical experts were much more precise in the identification of significant attributes than FSS. Explainability methods identified almost the same attributes as a medical expert and interesting interactions among them, which the expert discussed from a medical point of view. The results of our research and their consequences are discussed.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Machine learning and knowledge extraction

CiteScore

6.30

自引率

0.00%

发文量

审稿时长

7 weeks