Interpretable Prediction of Diabetes from Tabular Health Screening Records Using an Attentional Neural Network

2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) | Pub Date: 2021-10-06 | DOI: 10.1109/DSAA53316.2021.9564151

Yuki Oba, Taro Tezuka, Masaru Sanuki, Y. Wagatsuma


Citations: 6

Abstract

Health screening is conducted in numerous countries to observe general health conditions. Machine learning has been applied to health screening records to predict asymptomatic patients' future medical states. However, for medical researchers and physicians, it is crucial to know why machine learning methods made such predictions to understand the underlying mechanism of the disease and prescribe treatments; therefore, predictions must be interpretable. We investigated the ability of an attentional neural network that processes tabular data, namely TabNet, to determine attributes that contribute to making predictions of the aggravation of type 2 diabetes. We used both model-agnostic and model-specific interpretation methods. For the former, we tested SHapley Additive exPlanations (SHAP). For the latter, we used model-specific feature importance and the mask in the attentive transformer of TabNet. We found that this mask provides useful information regarding which items in a biochemical analysis affect the aggravation of type 2 diabetes. The results from model-agnostic and model-specific methods were consistent.
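The model-agnostic side of the comparison rests on Shapley values, which split the difference between a prediction and a baseline prediction across the input attributes. The following is a minimal sketch of that idea only, not the authors' pipeline: it computes exact Shapley values by enumerating coalitions in plain Python instead of calling the SHAP library or TabNet. The scoring function `toy_risk`, the feature names, and all numeric values are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one record `x` relative to `baseline`.

    Features outside a coalition are replaced by their baseline values.
    The cost grows exponentially with the number of features, so this is
    only practical for a handful of attributes.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(record):
    """Hypothetical risk score (assumption): not a clinical model."""
    hba1c, bmi, age = record
    # Age is deliberately given zero effect so its attribution should vanish.
    return 0.5 * hba1c + 0.1 * bmi + 0.02 * hba1c * bmi + 0.0 * age


feature_names = ["hba1c", "bmi", "age"]  # hypothetical attribute names
patient = [7.0, 30.0, 55.0]              # hypothetical screening record
baseline = [5.5, 22.0, 50.0]             # hypothetical reference record

phi = shapley_values(toy_risk, patient, baseline)
for name, value in zip(feature_names, phi):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the gap between the patient's score and the baseline score, and an attribute that never changes the output receives zero. Those two properties are what make Shapley-based attributions comparable with model-specific signals such as a learned feature mask.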

