Leila Yousefi, S. Swift, Mahir Arzoky, L. Sacchi, L. Chiovato, A. Tucker
Opening the Black Box: Discovering and Explaining Hidden Variables in Type 2 Diabetic Patient Modelling
2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), published 2018-12-01
DOI: 10.1109/BIBM.2018.8621484 (https://doi.org/10.1109/BIBM.2018.8621484)
Citation count: 4
Abstract
Clinicians predict disease and related complications from prior knowledge and each patient's clinical history. Prediction is complex because of unmeasured risk factors, the unexpected development of complications, and patients' varying responses to disease over time. Exploiting these unmeasured risk factors (hidden variables) can improve the modelling of disease progression and thus enable clinicians to focus on early diagnosis and treatment of unexpected conditions. However, overusing hidden variables can produce complex models that overfit and are poorly understood ('black box' in nature). Identifying and understanding groups of patients with similar disease profiles, based on the discovered hidden variables, makes it possible to better understand disease progression in different patients while improving prediction. We explore a stepwise method for incrementally identifying hidden variables based on the Inductive Causation (IC*) algorithm. We then use Dynamic Time Warping and hierarchical clustering to group patients according to these hidden variables and so uncover their meaning with respect to the complications of Type 2 Diabetes Mellitus. Our results show that inferring a small number of targeted hidden variables and using them to cluster patients not only improves prediction accuracy but also helps explain the discovered sub-groups.
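The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the local cost, the linkage criterion, or how hidden-variable trajectories are encoded, so the absolute-difference cost, average linkage, the binary per-visit state sequences, and the names `dtw_distance` and `agglomerative_clusters` are all assumptions made for illustration.

```python
from itertools import combinations


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    Assumption: the local cost is the absolute difference between points.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def agglomerative_clusters(series, k):
    """Merge the closest clusters until k remain (average linkage, assumed)."""
    n = len(series)
    dist = {(i, j): dtw_distance(series[i], series[j])
            for i, j in combinations(range(n), 2)}

    def linkage(c1, c2):
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest average DTW distance.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe: y > x, so index x is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical hidden-variable state per visit for four patients
# (made-up values; sequences may have different lengths).
trajectories = [
    [0, 0, 0, 1, 1, 1],  # patient 0: hidden state switches on late
    [0, 0, 1, 1, 1],     # patient 1: same shape, fewer visits
    [1, 1, 1, 0, 0, 0],  # patient 2: hidden state switches off
    [1, 1, 0, 0],        # patient 3: same shape, fewer visits
]
groups = agglomerative_clusters(trajectories, k=2)
print(groups)
```

Because DTW aligns sequences of different lengths, patients whose hidden variable follows the same pattern over a different number of visits end up at distance zero and are grouped together. Inspecting which complications dominate each group is then one way to attach meaning to the hidden variable.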