A K-Means Approach to Clustering Disease Progressions

2017 IEEE International Conference on Healthcare Informatics (ICHI). Pub Date: 2017-08-01. DOI: 10.1109/ICHI.2017.18

D. Luong, V. Chandola

Citations: 17

Abstract

The k-means algorithm has been a workhorse of unsupervised machine learning for many decades, primarily owing to its simplicity and efficiency. The algorithm requires two key operations on the data: first, a distance metric to compare a pair of data objects, and second, a way to compute a representative (centroid) for a given set of data objects. These two requirements mean that k-means cannot be readily applied to time series data and, in particular, to the disease progression profiles often encountered in healthcare analysis. We present a k-means-inspired approach to clustering disease progression data. The proposed method represents a cluster as a set of weights corresponding to a set of splines fitted to the time series data, and uses the "goodness-of-fit" as a way to assign time series to clusters. We use the algorithm to group patients suffering from Chronic Kidney Disease (CKD) based on their disease progression profiles. A qualitative analysis of the representative profiles for the learnt clusters reveals that this simple approach can be used to identify groups of patients with interesting clinical characteristics. Additionally, we show how the representative profiles can be combined with a patient's observations to obtain an accurate patient-specific profile that can be used for extrapolating into the future.
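The two k-means steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the spline family, the fitting procedure, or the goodness-of-fit measure, so the sketch assumes a cubic truncated-power spline basis, ordinary least squares for the cluster weights, and mean squared error as the fit score. All function names (`spline_basis`, `fit_weights`, `fit_error`, `cluster_progressions`) and parameters are hypothetical.

```python
import numpy as np

def spline_basis(t, knots):
    """Cubic truncated-power spline basis at times t (assumed basis choice)."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_weights(series_list, knots):
    """'Centroid' step: least-squares spline weights for the pooled points."""
    t = np.concatenate([np.asarray(s[0], dtype=float) for s in series_list])
    y = np.concatenate([np.asarray(s[1], dtype=float) for s in series_list])
    return np.linalg.lstsq(spline_basis(t, knots), y, rcond=None)[0]

def fit_error(series, weights, knots):
    """Goodness-of-fit stand-in: mean squared error of one series vs. a cluster."""
    t, y = series
    residual = np.asarray(y, dtype=float) - spline_basis(t, knots) @ weights
    return float(np.mean(residual ** 2))

def cluster_progressions(series_list, k, knots, init_labels=None, n_iter=20, seed=0):
    """Alternate between refitting cluster weights and reassigning series."""
    rng = np.random.default_rng(seed)
    if init_labels is None:
        labels = rng.integers(0, k, size=len(series_list))
    else:
        labels = np.asarray(init_labels)
    weights = [None] * k
    for _ in range(n_iter):
        # Update step: refit each cluster's spline weights from its members.
        for c in range(k):
            members = [s for s, lab in zip(series_list, labels) if lab == c]
            if not members:
                # Empty cluster: reseed it from one randomly chosen series.
                members = [series_list[rng.integers(len(series_list))]]
            weights[c] = fit_weights(members, knots)
        # Assignment step: each series joins the cluster that fits it best.
        new_labels = np.array([
            int(np.argmin([fit_error(s, w, knots) for w in weights]))
            for s in series_list
        ])
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, weights
```

Each series is a `(times, values)` pair, so series with different lengths and sampling times can share one cluster. Because a cluster is just a weight vector over basis functions, `spline_basis(future_t, knots) @ weights[c]` evaluates its representative profile at any time point, including ones beyond the observed range.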
