Opening the Black Box: Discovering and Explaining Hidden Variables in Type 2 Diabetic Patient Modelling

2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Pub Date: 2018-12-01, DOI: 10.1109/BIBM.2018.8621484

Leila Yousefi, S. Swift, Mahir Arzoky, L. Sacchi, L. Chiovato, A. Tucker


Citations: 4

Abstract

Clinicians predict disease and related complications based on prior knowledge and each patient's clinical history. The prediction process is complex due to unmeasured risk factors, the unexpected development of complications, and patients' varying responses to disease over time. Exploiting these unmeasured risk factors (hidden variables) can improve the modeling of disease progression and thus enable clinicians to focus on early diagnosis and treatment of unexpected conditions. However, the overuse of hidden variables can lead to complex models that overfit and are not well understood (being 'black box' in nature). Identifying and understanding groups of patients with similar disease profiles (based on discovered hidden variables) makes it possible to better understand disease progression in different patients while improving prediction. We explore a stepwise method for incrementally identifying hidden variables based on the Inductive Causation (IC*) algorithm. We then use Dynamic Time Warping and hierarchical clustering to group patients based on these hidden variables and uncover their meaning with respect to the complications of Type 2 Diabetes Mellitus. Our results reveal that inferring a small number of targeted hidden variables and using them to cluster patients not only improves prediction accuracy but also helps explain the discovered sub-groups.
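
The clustering step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no data or parameters, so the `trajectories` below are made-up binary hidden-variable sequences, the absolute-difference step cost is an assumption, and average linkage is an assumed choice of hierarchical clustering criterion. The sketch computes pairwise Dynamic Time Warping distances between sequences of unequal length and then merges the closest groups until `k` clusters remain.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Dynamic-programming DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def agglomerative_clusters(series, k):
    """Average-linkage agglomerative clustering on DTW distances, cut at k clusters."""
    dist = {
        (i, j): dtw_distance(series[i], series[j])
        for i, j in combinations(range(len(series)), 2)
    }

    def linkage(c1, c2):
        # Mean pairwise DTW distance between members of the two clusters.
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    clusters = [[i] for i in range(len(series))]
    while len(clusters) > k:
        # Find the two closest clusters and merge them.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]),
        )
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


# Hypothetical per-patient hidden-variable states over successive visits.
# Patients have different numbers of visits, which DTW tolerates.
trajectories = [
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0],
]
clusters = agglomerative_clusters(trajectories, k=2)
print(clusters)
```

In this toy input, patients 0 and 1 share a rising pattern and patients 2 and 3 a falling one, so the sketch groups them by the shape of the trajectory and not by the number of visits.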

