Global and local interpretation of black-box machine learning models to determine prognostic factors from early COVID-19 data

Symposium on Medical Information Processing and Analysis. Pub Date: 2021-09-10. DOI: 10.1117/12.2604743

Ananya Jana, Carlos D Minacapelli, V. Rustgi, Dimitris N. Metaxas



Abstract


The COVID-19 coronavirus has claimed 4.1 million lives as of July 24, 2021. A variety of machine learning models have been applied to related data to predict outcomes such as disease severity and infection rate, and to discover important prognostic factors. The usefulness of these findings is often reduced by the methods' lack of interpretability. Recent progress on the interpretability of machine learning models has the potential to yield more insight from conventional machine learning models [1–3]. In this work, we analyze COVID-19 blood work data with several popular machine learning models; we then apply state-of-the-art post-hoc local interpretability techniques (e.g., SHAP, LIME) and global interpretability techniques (e.g., symbolic metamodeling) to the trained black-box models to draw interpretable conclusions. Among machine learning algorithms, regression remains one of the simplest and most explainable models, with a clear mathematical formulation. We explore a recent technique, symbolic metamodeling, to find a mathematical expression for the machine learning models trained for COVID-19. We identify Acute Kidney Injury (AKI), initial Albumin level (ALB I), Aspartate aminotransferase (AST I), initial Total Bilirubin (TBILI), and initial D-Dimer (DIMER) as major prognostic factors of disease severity. Our contributions are: (i) we uncover the underlying mathematical expression of the black-box models on the COVID-19 severity prediction task; (ii) we are the first to apply symbolic metamodeling to this task; and (iii) we discover important features and feature interactions. Code repository: https://github.com/ananyajana/interpretable covid19.
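As a rough illustration of what a post-hoc local explanation computes, the sketch below evaluates exact Shapley values, the quantity that SHAP approximates, for a single prediction of a toy model. It is not the authors' code and does not use the shap library: the function `black_box`, its coefficients, the input `x`, and the all-zero `baseline` are hypothetical, chosen only so the arithmetic is easy to follow, and they carry no clinical meaning.

```python
from itertools import combinations
from math import factorial


def black_box(x):
    # Hypothetical stand-in for a trained model:
    # two linear terms plus one interaction between features 0 and 2.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` keep their value from x;
        # all other features are set to the baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


# Hypothetical input and baseline (assumptions for illustration only).
x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(black_box, x, baseline)
print(phi)
```

For this toy input the attributions come out to approximately 3, 2, and 1: the interaction term is split evenly between features 0 and 2, and the attributions sum to the difference between the prediction at `x` and at the baseline. Exact enumeration costs a number of model evaluations that grows exponentially with the number of features, which is why practical tools rely on sampling or model-specific shortcuts.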
