Deep Clustering of Electronic Health Records Tabular Data for Clinical Interpretation.

Ibna Kowsar, Shourav B Rabbani, Kazi Fuad B Akhter, Manar D Samad
Venue: IEEE International Conference on Telecommunications and Photonics, volume 2023
DOI: https://doi.org/10.1109/ictp60248.2023.10490723
Published: 2023-12-01 (Epub 2024-04-11)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11255553/pdf/
Citation count: 0

Abstract

Machine learning applications are widespread due to straightforward supervised learning of known data labels. Many data samples in real-world scenarios, including medicine, are unlabeled because data annotation can be time-consuming and error-prone. The application and evaluation of unsupervised clustering methods are not trivial and are limited to traditional methods (e.g., k-means) when clinicians demand deeper insights into patient data beyond classification accuracy. The contribution of this paper is three-fold: 1) to introduce a patient stratification strategy based on a clinical variable instead of a diagnostic label, 2) to evaluate clustering performance using within-cluster homogeneity and between-cluster statistical difference, and 3) to compare widely used traditional clustering algorithms (e.g., k-means) with a state-of-the-art deep learning solution for clustering tabular data. The deep clustering method achieves superior within-cluster homogeneity and between-cluster separation compared to k-means and identifies three statistically distinct and clinically interpretable high blood pressure patient clusters. The proposed clustering strategy and evaluation metrics will facilitate the stratification of large patient cohorts in health science research without requiring explicit diagnostic labels.
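The evaluation idea in the abstract, scoring clusters on a clinical variable by within-cluster homogeneity and between-cluster statistical difference, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the abstract does not name the exact metrics, so per-cluster standard deviation stands in for homogeneity and a one-way ANOVA F statistic stands in for between-cluster difference. The data are synthetic, the feature names (age, BMI, systolic blood pressure) are hypothetical, and only a plain k-means baseline is shown, not the paper's deep clustering method.

```python
import random
import statistics


def kmeans(rows, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)  # start from k distinct data points
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centers[c])))
            for row in rows
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(statistics.fmean(col) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def group_by_label(values, labels):
    """Collect the values of one variable per cluster label."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    return groups


def within_cluster_std(values, labels):
    """Stand-in homogeneity score: per-cluster population std (lower is tighter)."""
    return {lab: statistics.pstdev(vals)
            for lab, vals in group_by_label(values, labels).items()}


def anova_f(values, labels):
    """Stand-in separation score: one-way ANOVA F statistic across clusters."""
    groups = list(group_by_label(values, labels).values())
    grand = statistics.fmean(values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Synthetic cohort (hypothetical): cluster on age and BMI, then evaluate the
# clusters on systolic blood pressure, which is held out of the clustering.
rng = random.Random(42)
rows, sbp = [], []
for mean_age, mean_bmi, mean_sbp in [(35, 22, 125), (55, 28, 145), (70, 34, 170)]:
    for _ in range(40):
        rows.append((rng.gauss(mean_age, 3), rng.gauss(mean_bmi, 1.5)))
        sbp.append(rng.gauss(mean_sbp, 6))

labels = kmeans(rows, k=3)
print("within-cluster std of SBP:", within_cluster_std(sbp, labels))
print("ANOVA F across clusters:", anova_f(sbp, labels))
```

Holding the clinical variable out of the clustering features and scoring the clusters on it afterwards is one reading of "stratification based on a clinical variable instead of a diagnostic label". The paper itself may define the procedure differently.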
