Application of Cluster Analysis Based on SHAP Values in Hemodialysis Patients Using Arteriovenous Fistula
Authors: Peng Shu, Ling Huang, Xia Wang, Zhuping Wen, Yiqi Luo, Fang Xu
Journal: International Journal of General Medicine, vol. 18, pp. 5475-5489 (published 2025-09-13)
DOI: 10.2147/IJGM.S533419 (https://doi.org/10.2147/IJGM.S533419)
Background: The prognosis of hemodialysis patients using arteriovenous fistula is significantly heterogeneous and influenced by various factors, including vascular conditions and underlying diseases. This study aims to reveal patient subgroup characteristics and identify key influencing factors through cluster analysis based on SHAP values.
Methods: A cohort of 974 hemodialysis patients utilizing arteriovenous fistulae was analyzed, with 55 clinical characteristics extracted for examination. Following multiple imputation, standardization, and dimensionality reduction via principal component analysis, the efficacy of K-Means, DBSCAN, and hierarchical clustering algorithms was evaluated using metrics such as the silhouette coefficient and Calinski-Harabasz index. The K-Means algorithm, with K set to 3, was chosen to develop a pseudo target variable. This was subsequently integrated with the XGBoost model, and SHAP value analysis was employed to elucidate feature contributions.
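The clustering and evaluation steps named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not name the software used, so this version relies only on the Python standard library, runs on synthetic two-feature data rather than the 55 clinical characteristics, and omits multiple imputation and principal component analysis. The helper names (`standardize`, `kmeans`, `silhouette`) and the farthest-point initialisation are assumptions chosen to keep the example deterministic.

```python
import math
import random


def standardize(rows):
    """Rescale every column to zero mean and unit variance (z-scores)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; divide by 1.0 instead of 0.0.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(rows, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [list(rows[0])]
    while len(centers) < k:
        # Next center: the row farthest from every center chosen so far.
        centers.append(list(max(
            rows, key=lambda r: min(math.dist(r, c) for c in centers))))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(r, centers[j]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(rows, labels):
    """Mean of (b - a) / max(a, b) over all rows; the result lies in [-1, 1]."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, r in enumerate(rows):
        def mean_dist(c):
            pts = [q for j, q in enumerate(rows) if labels[j] == c and j != i]
            return sum(math.dist(r, q) for q in pts) / len(pts) if pts else None
        a = mean_dist(labels[i])  # cohesion: mean distance within own cluster
        if a is None:             # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        # separation: mean distance to the nearest other cluster
        b = min(mean_dist(c) for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Synthetic stand-in data: three well-separated groups of 30 points each.
rng = random.Random(42)
blobs = [(0, 0), (10, 0), (0, 10)]
raw = [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)]
       for cx, cy in blobs for _ in range(30)]
rows = standardize(raw)
labels = kmeans(rows, k=3)  # cluster labels that can serve as a pseudo target
score = silhouette(rows, labels)
print(f"silhouette = {score:.2f}")
```

The silhouette coefficient ranges from -1 to 1: values near 1 indicate compact, well-separated clusters, and values near 0 indicate clusters that overlap. The synthetic groups here are deliberately far apart, so the score is high; real clinical data will usually score much lower.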
Results: The K-Means clustering algorithm demonstrated superior performance among the three algorithms, with a silhouette coefficient of 0.05, and categorized patients into three distinct clusters. In Cluster 1, hemoglobin concentration ranged from -2 to 5 (median 1) and showed the highest variability among the clusters. In Cluster 2, hemoglobin concentration fell predominantly between -3 and 2 (median 0). Cluster 3 had a hemoglobin concentration distribution similar to that of Cluster 2, with slightly greater variability in the tails. Because some values are negative, these hemoglobin figures appear to be on the standardized scale described in Methods rather than in clinical units. SHAP analysis identified hemoglobin concentration as the most important feature, with a SHAP value of 550, indicating that variations in its distribution were the primary driver of the clustering. Age, BMI, total cholesterol, and other features also contributed to the clustering outcomes through complex nonlinear interactions.
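SHAP values build on Shapley values from cooperative game theory, and the way a feature such as hemoglobin can dominate a ranking can be shown with a brute-force sketch. The code below is illustrative only: `toy_model`, its coefficients, the zero baseline, and the three sample rows are hypothetical, and the exhaustive subset enumeration stands in for the optimized estimators normally applied to tree models such as XGBoost. It does not reproduce the study's model or its reported value of 550.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a single baseline point.

    Features outside a coalition take their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(rest, size):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def mean_abs_importance(f, samples, baseline):
    """Global importance: mean absolute Shapley value per feature."""
    per_sample = [shapley_values(f, x, baseline) for x in samples]
    return [sum(abs(v) for v in col) / len(samples) for col in zip(*per_sample)]


def toy_model(z):
    """Hypothetical score with one interaction term; NOT the study's model."""
    hb, age, bmi = z
    return 2.0 * hb + 0.5 * hb * age + 0.1 * bmi


# Hypothetical standardized inputs in the order (hemoglobin, age, BMI).
samples = [[1.0, 0.5, -0.2], [-1.5, 1.0, 0.3], [0.2, -0.8, 1.1]]
baseline = [0.0, 0.0, 0.0]
importance = mean_abs_importance(toy_model, samples, baseline)
for name, imp in zip(["hemoglobin", "age", "BMI"], importance):
    print(f"{name}: {imp:.3f}")
```

For each sample, the Shapley values sum to the difference between the model output at that sample and at the baseline, which is what makes them usable as additive feature contributions. In this toy model hemoglobin ranks first because it has the largest coefficient and also shares the interaction term with age.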
Conclusion: Cluster analysis with SHAP values preliminarily identified heterogeneous subgroups in such patients, with hemoglobin concentration potentially a key driver. This approach may aid personalized treatment, but generalizability needs multicenter validation.
About the journal:
The International Journal of General Medicine is an international, peer-reviewed, open access journal that focuses on general and internal medicine, pathogenesis, epidemiology, diagnosis, monitoring and treatment protocols. The journal is characterized by the rapid reporting of reviews, original research and clinical studies across all disease areas.
A key focus of the journal is the elucidation of disease processes and management protocols resulting in improved outcomes for the patient. Patient perspectives such as satisfaction, quality of life, health literacy and communication and their role in developing new healthcare programs and optimizing clinical outcomes are major areas of interest for the journal.
As of 1st April 2019, the International Journal of General Medicine will no longer consider meta-analyses for publication.