Application of Cluster Analysis Based on SHAP Values in Hemodialysis Patients Using Arteriovenous Fistula

Impact factor: 2.0; Zone 4 (Medicine); JCR Q2 (MEDICINE, GENERAL & INTERNAL)
International Journal of General Medicine Pub Date : 2025-09-13 eCollection Date: 2025-01-01 DOI:10.2147/IJGM.S533419
Peng Shu, Ling Huang, Xia Wang, Zhuping Wen, Yiqi Luo, Fang Xu
International Journal of General Medicine, volume 18, pages 5475-5489. Publication type: Journal Article.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12442918/pdf/
Link: https://doi.org/10.2147/IJGM.S533419
Citations: 0

Application of Cluster Analysis Based on SHAP Values in Hemodialysis Patients Using Arteriovenous Fistula.

Background: The prognosis of hemodialysis patients with an arteriovenous fistula is markedly heterogeneous and is influenced by many factors, including vascular condition and underlying disease. This study aims to characterize patient subgroups and identify the key influencing factors through cluster analysis based on SHAP values.

Methods: A cohort of 974 hemodialysis patients with arteriovenous fistulae was analyzed, using 55 extracted clinical characteristics. After multiple imputation, standardization, and dimensionality reduction by principal component analysis, the K-Means, DBSCAN, and hierarchical clustering algorithms were compared using metrics such as the silhouette coefficient and the Calinski-Harabasz index. K-Means with K = 3 was chosen to generate a pseudo-target variable. This variable was then combined with an XGBoost model, and SHAP value analysis was used to quantify each feature's contribution.
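The silhouette coefficient named in the Methods can be computed directly from pairwise distances. For each point, a is the mean distance to the other members of its own cluster and b is the mean distance to the members of the nearest other cluster; the point's score is (b - a) / max(a, b), and the coefficient is the mean over all points. Values near 1 indicate compact, well-separated clusters, and values near 0 indicate overlapping clusters. The following is a minimal standard-library sketch on made-up toy points, not the authors' code; the function name, the toy data, and the choice of Euclidean distance are illustrative assumptions.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to the given members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        a = mean_dist(i, own)  # cohesion: distance to own cluster
        b = min(  # separation: distance to the nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two tight pairs of points, far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]   # labels that match the visible grouping
mixed_labels = [0, 1, 0, 1]  # labels that split each pair across clusters

good_score = silhouette_score(points, good_labels)
mixed_score = silhouette_score(points, mixed_labels)
print(f"matching labels: {good_score:.3f}")
print(f"mixed labels:    {mixed_score:.3f}")
```

In this toy case the labels that match the grouping score close to 1, while the mixed labels score below 0, which is the behavior the metric relies on when candidate clusterings are compared.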

Results: K-Means performed best of the three algorithms, with a silhouette coefficient of 0.05, and grouped patients into three clusters. In Cluster 1, hemoglobin concentration ranged from -2 to 5, with a median of 1 and the highest variability of the three clusters. In Cluster 2, hemoglobin concentration lay predominantly between -3 and 2, with a median of 0. Cluster 3 had a hemoglobin distribution similar to that of Cluster 2, with slightly greater variability in the tails. These values are presumably on the standardized scale, given that some are negative. SHAP analysis identified hemoglobin concentration as the most influential feature, with a SHAP value of 550, indicating that differences in its distribution are the primary driver of the clustering. Age, BMI, total cholesterol, and other features also contribute to the clustering outcome through complex nonlinear interactions.
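SHAP values build on Shapley values: each feature is credited with its average marginal contribution to a model's output over all orderings in which features could be added. The sketch below computes exact Shapley values by enumerating feature coalitions for a hypothetical three-feature scoring function. It only illustrates the attribution idea; SHAP libraries use faster tree-specific algorithms for models such as XGBoost, and the toy model, its coefficients, and the all-zero baseline are assumptions, not values from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    The cost grows as 2**n, so this suits only a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Evaluate the model with only the coalition's features "switched on".
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(rest, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical score over three standardized features; the
    # coefficients are made up and include one interaction term.
    hemoglobin, age, bmi = z
    return 2.0 * hemoglobin + 0.5 * age * bmi


x = [1.0, 2.0, 1.0]         # one hypothetical patient
baseline = [0.0, 0.0, 0.0]  # reference point: all features at their mean
phi = shapley_values(toy_model, x, baseline)

for name, contribution in zip(["hemoglobin", "age", "bmi"], phi):
    print(f"{name}: {contribution:.3f}")
```

The contributions sum to the difference between the model output for the patient and for the baseline, and the age-BMI interaction term is split equally between those two features. Averaging the absolute contributions over many patients gives one common kind of global feature ranking.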

Conclusion: Cluster analysis combined with SHAP values preliminarily identified heterogeneous subgroups among hemodialysis patients with arteriovenous fistulae, with hemoglobin concentration a potential key driver. The approach may support personalized treatment, but its generalizability requires multicenter validation.

Source journal
International Journal of General Medicine (Medicine-General Medicine)
Self-citation rate: 0.00%
Articles published: 1113
Review time: 16 weeks
Journal description: The International Journal of General Medicine is an international, peer-reviewed, open access journal that focuses on general and internal medicine, pathogenesis, epidemiology, diagnosis, monitoring and treatment protocols. The journal is characterized by the rapid reporting of reviews, original research and clinical studies across all disease areas. A key focus of the journal is the elucidation of disease processes and management protocols resulting in improved outcomes for the patient. Patient perspectives such as satisfaction, quality of life, health literacy and communication and their role in developing new healthcare programs and optimizing clinical outcomes are major areas of interest for the journal. As of 1st April 2019, the International Journal of General Medicine will no longer consider meta-analyses for publication.