Leveraging Machine Learning and Model-Agnostic Explanations to Understand Automated Diagnosis of Cardiovascular Disease
Christopher Sun, J. Sharma, Milind Maiti
2022 4th International Conference on Biomedical Engineering (IBIOMED), published 2022-10-18
DOI: 10.1109/IBIOMED56408.2022.9988121
Citations: 0
Abstract
The prevalence of cardiovascular disease and of physician misdiagnosis creates a need for artificial intelligence models that improve diagnostic accuracy. This study trains machine learning models on publicly available data sets containing basic patient medical information to diagnose cardiovascular disease. The Multilayer Perceptron (MLP) built for this task performed best, with an F1 score of 0.8968. This result motivated the creation of an automated, open-source diagnosis tool powered by the MLP. Local Interpretable Model-Agnostic Explanations (LIME) are used to quantify how different features affect the model's diagnosis, expressed as marginal probabilities. K-means clustering first segments patients into ten clusters, after which each example is passed through LIME. The resulting histograms depict a complex relationship among feature, cluster, and impact on diagnosis. A series of p-values spanning widely different orders of magnitude reveals nuances in how the MLP interprets patients from different clusters. The LIME analysis indicates that the most important features for diagnosing cardiovascular disease are fasting blood sugar, type of chest pain, and ST segment slope. Future experiments should replicate this LIME methodology on data sets with more specialized features to gain practical medical insight into the types of cardiovascular disease that each cluster represents. Finally, feature engineering pathways informed by these results should be explored to create versatile diagnosis models that can be adapted to other diseases as well.
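The abstract summarizes the MLP's performance as a single F1 score (0.8968). As a reminder of what that number measures, here is a minimal sketch of the binary F1 score, the harmonic mean of precision and recall for the positive (disease) class. The function name `f1_score` and the toy labels are illustrative assumptions, not the study's code or data.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # 2*P*R/(P+R) simplifies to 2*TP/(2*TP+FP+FN); define F1 = 0 when there is nothing to score.
    return 2 * tp / denom if denom else 0.0


# Toy labels (made up): 1 = cardiovascular disease, 0 = no disease.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(f1_score(y_true, y_pred))  # 0.8  (TP=4, FP=1, FN=1)
```

Using the simplified form 2·TP/(2·TP+FP+FN) avoids dividing by zero when precision or recall is undefined.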
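LIME explains a single prediction by sampling perturbed copies of an input, querying the black-box model on them, and fitting a proximity-weighted linear model whose coefficients act as local feature effects on the predicted probability. The sketch below implements that idea from scratch in plain Python under stated assumptions: Gaussian perturbations sized by a per-feature `scales` vector, an exponential kernel whose default width (0.75 × √d) mirrors the default of the `lime` package's tabular explainer, and a tiny ridge penalty. It is not the `lime` library's API, which by default also discretizes continuous features and selects a subset of features; `lime_style_explain`, `predict`, and `toy_model` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_style_explain(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    scales: Sequence[float],
    n_samples: int = 2000,
    kernel_width: Optional[float] = None,
    ridge: float = 1e-6,
    seed: int = 0,
) -> List[float]:
    """Local feature weights from a proximity-weighted linear surrogate fitted around x.

    Each weight is the surrogate's change in predicted probability per one `scales[j]`
    step in feature j, holding the other features fixed.
    """
    d = len(x)
    width = kernel_width if kernel_width is not None else 0.75 * math.sqrt(d)
    rng = random.Random(seed)
    k = d + 1  # intercept plus one slope per feature
    ata = [[0.0] * k for _ in range(k)]  # accumulates A^T W A
    aty = [0.0] * k  # accumulates A^T W y
    for _ in range(n_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]  # offsets in units of `scales`
        z = [x[j] + u[j] * scales[j] for j in range(d)]  # perturbed patient record
        w = math.exp(-sum(v * v for v in u) / width ** 2)  # exponential proximity kernel
        y = predict(z)  # black-box probability for the perturbed record
        row = [1.0] + u
        for i in range(k):
            aty[i] += w * row[i] * y
            for j in range(k):
                ata[i][j] += w * row[i] * row[j]
    for i in range(1, k):
        ata[i][i] += ridge  # small ridge penalty on the slopes only
    beta = _solve(ata, aty)
    return beta[1:]  # drop the intercept (the surrogate's prediction at x)


if __name__ == "__main__":
    def toy_model(z: Sequence[float]) -> float:
        # Hypothetical stand-in for the MLP's disease probability on two standardized features.
        return 1.0 / (1.0 + math.exp(-(2.0 * z[0] - z[1])))

    print(lime_style_explain(toy_model, [0.0, 0.0], scales=[1.0, 1.0]))
```

Regressing on offsets from `x` (rather than on raw values) keeps the intercept equal to the surrogate's local prediction, so the remaining coefficients read directly as local effects. For a model that is already linear, the surrogate recovers its slopes exactly, which is a convenient sanity check.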
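The abstract's pipeline first groups patients with K-means (ten clusters in the study) and then explains every example with LIME, so feature effects can be compared cluster by cluster. A minimal pure-Python sketch of that grouping step follows: Lloyd's algorithm with randomly chosen initial centroids, plus a helper that averages per-patient attributions within each cluster. `kmeans`, `mean_attribution_by_cluster`, `explain`, and `toy_explain` are hypothetical names; in practice `explain` would wrap a LIME explainer, and features would typically be standardized before clustering (an assumption, since the abstract does not say).

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(
    points: Sequence[Sequence[float]], k: int, n_iter: int = 100, seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]  # k random points as initial centroids
    labels: List[int] = []
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: _sqdist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def mean_attribution_by_cluster(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    explain: Callable[[Sequence[float]], List[float]],
) -> Dict[int, List[float]]:
    """Average each feature's local attribution over the patients in each cluster."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for p, lab in zip(points, labels):
        weights = explain(p)  # e.g. one LIME weight per feature for this patient
        if lab not in sums:
            sums[lab], counts[lab] = [0.0] * len(weights), 0
        sums[lab] = [s + w for s, w in zip(sums[lab], weights)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


if __name__ == "__main__":
    # Toy 2-feature records (made up); the study used ten clusters on real patient data.
    patients = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
    labels, _ = kmeans(patients, k=2)

    def toy_explain(p: Sequence[float]) -> List[float]:
        # Hypothetical explainer standing in for a LIME call on the trained model.
        return [0.1 * p[0], -0.05 * p[1]]

    print(labels)
    print(mean_attribution_by_cluster(patients, labels, toy_explain))
```

Keeping the explainer as a plain callable separates the clustering logic from whichever attribution method is plugged in; per-cluster histograms like those the abstract describes could be drawn from the individual weights instead of their means.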
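The abstract reports a series of p-values without naming the test behind them. Purely as an illustrative assumption, one way to ask whether a feature's LIME weights differ between two clusters is a two-sided permutation test on the difference in mean weight, sketched below; the study may have used a different test, and `permutation_p_value` and the sample weights are hypothetical.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(
    a: Sequence[float], b: Sequence[float], n_permutations: int = 5000, seed: int = 0
) -> float:
    """Two-sided permutation test for a difference in group means.

    Repeatedly shuffles the pooled values into groups of the original sizes and counts
    how often the shuffled mean difference is at least as large as the observed one.
    """
    if not a or not b:
        raise ValueError("both groups must be non-empty")
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one correction keeps the Monte Carlo estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up LIME weights for one feature (e.g. fasting blood sugar) in two clusters.
    cluster_a = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16]
    cluster_b = [0.02, 0.05, 0.01, 0.04, 0.03, 0.00]
    print(permutation_p_value(cluster_a, cluster_b))
```

With a Monte Carlo permutation test, the smallest reportable p-value is about 1/(n_permutations + 1), so resolving p-values across many orders of magnitude requires a correspondingly large number of permutations (or an analytic test).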