Using Fuzzy C-Means clustering and PCA in public health: A machine learning approach to combat CVD and obesity

JCR: Q1 (Medicine)
Gamal Saad Mohamed Khamis, Nasser S. Alqahtani, Sultan Munadi Alanazi, Mohammed Muharrab Alruwaili, Mariam Shabram Alenazi, Maneaf A. Alrawaili
Journal: Informatics in Medicine Unlocked, volume 57, Article 101666
Publication date: 2025-01-01
DOI: 10.1016/j.imu.2025.101666
Source: https://www.sciencedirect.com/science/article/pii/S2352914825000541
Citations: 0

Abstract

This study introduces a novel framework that integrates principal component analysis (PCA) with fuzzy c-means (FCM) clustering to enhance the analysis of high-dimensional health data, specifically targeting the identification of at-risk groups for cardiovascular disease (CVD) and obesity. This unique approach, which has not been previously explored in public health, promises to provide new insights and solutions to these pressing health issues.
The proposed PCA-FCM model was applied to a dataset comprising more than 20 health variables from a population sample aged 18–75 years. The analysis identified four distinct clusters, each showing unique risk patterns. For instance, Cluster One (mean age, 29) showed elevated body mass index (BMI) (mean, 33.7 kg/m²), high waist circumference (113 cm), and signs of insulin resistance (FBS, 133 mg/dL; HOMA-IR, 7.12). In contrast, Cluster Two (mean age, 61) exhibited the highest systolic blood pressure (SBP, 143 mmHg), elevated LDL cholesterol (4.27 mmol/L), and elevated triglycerides (2.59 mmol/L), indicating advanced metabolic syndrome. Cluster Three (mean age, 51) presented a healthier metabolic profile with lower HOMA-IR (3.74), normal SBP (127 mmHg), and balanced lipid levels (HDL, 1.36 mmol/L). Cluster Four (mean age, 43) showed elevated SBP (134 mmHg), BMI (32.1 kg/m²), and HOMA-IR (6.05), suggesting a latent risk group.
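The abstract does not give implementation details, so the following is only a minimal sketch of what a PCA-then-FCM pipeline can look like. It assumes standardized inputs, two retained components, the textbook fuzzy c-means update with fuzzifier m = 2, and synthetic data in place of the study's health variables; the function names `pca_scores` and `fuzzy_c_means` are hypothetical and not taken from the paper.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Standardize columns, then project onto the leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def fuzzy_c_means(X, c=4, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Textbook fuzzy c-means. Returns (centers, U) with U of shape (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center, shape (n, c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Synthetic stand-in data: four Gaussian blobs in six dimensions (not study data).
rng = np.random.default_rng(42)
blob_means = rng.normal(scale=5.0, size=(4, 6))
X = np.vstack([mu + rng.normal(size=(50, 6)) for mu in blob_means])

scores = pca_scores(X, n_components=2)
centers, U = fuzzy_c_means(scores, c=4)
labels = U.argmax(axis=1)                        # hard label = highest membership
```

Unlike k-means, each row of `U` keeps a graded membership in all four clusters, which is what would let a borderline individual be flagged as partly belonging to more than one risk group.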
PCA identified waist circumference, visceral fat, LDL/HDL ratio, non-HDL cholesterol, and waist-to-height ratio as the most influential variables contributing to cluster separation, with loadings above 0.70 on the first two principal components. Meanwhile, exercise, height, family history, and HDL had loadings below 0.30, indicating minimal influence on cluster formation.
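How the loadings were computed is not stated in the abstract. One common convention, assumed here, scales each eigenvector of the correlation matrix by the square root of its eigenvalue, so that a loading equals the correlation between a variable and a component. The sketch below applies the 0.70 and 0.30 cut-offs mentioned above to synthetic variables; the column names and the helper `pca_loadings` are hypothetical.

```python
import numpy as np

def pca_loadings(X, n_components=2):
    """Loadings = eigenvectors of the correlation matrix scaled by sqrt(eigenvalue)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = (Z.T @ Z) / Z.shape[0]
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

# Synthetic stand-in data: two latent factors plus unrelated noise columns.
rng = np.random.default_rng(0)
n = 500
factor_a = rng.normal(size=n)
factor_b = rng.normal(size=n)
columns = {
    "a1": factor_a + 0.3 * rng.normal(size=n),
    "a2": factor_a + 0.3 * rng.normal(size=n),
    "a3": factor_a + 0.3 * rng.normal(size=n),
    "a4": factor_a + 0.3 * rng.normal(size=n),
    "b1": factor_b + 0.3 * rng.normal(size=n),
    "b2": factor_b + 0.3 * rng.normal(size=n),
    "noise1": rng.normal(size=n),
    "noise2": rng.normal(size=n),
    "noise3": rng.normal(size=n),
}
names = list(columns)
X = np.column_stack([columns[k] for k in names])

loadings = pca_loadings(X, n_components=2)
strength = np.abs(loadings).max(axis=1)          # strongest loading on PC1 or PC2
for name, value in zip(names, strength):
    tag = "high" if value > 0.70 else "low" if value < 0.30 else "moderate"
    print(f"{name}: {value:.2f} ({tag})")
```

Under this convention the squared loadings of a variable across all components sum to 1, so a loading above 0.70 means a single component explains roughly half or more of that variable's variance.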
The model evaluation supported the selection of the four-cluster solution, with a Silhouette Score of 0.62 and Between-Cluster Variation accounting for 64% of the total variance, signifying well-defined and cohesive clusters.
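As a rough illustration of these two metrics, the sketch below computes a mean silhouette coefficient and the share of total variation lying between clusters. It assumes Euclidean distance, hard labels (for FCM, typically the highest-membership cluster), and that "Between-Cluster Variation" means between-cluster sum of squares divided by total sum of squares; the abstract does not confirm any of these choices, and the four-point dataset is a toy example.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue                              # singleton cluster: silhouette is 0
        a = D[i, same].sum() / (n_same - 1)       # mean distance within own cluster
        b = min(D[i, labels == k].mean() for k in clusters if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

def between_cluster_fraction(X, labels):
    """Between-cluster sum of squares divided by total sum of squares."""
    grand = X.mean(axis=0)
    total = ((X - grand) ** 2).sum()
    between = sum(
        (labels == k).sum() * ((X[labels == k].mean(axis=0) - grand) ** 2).sum()
        for k in np.unique(labels)
    )
    return between / total

# Toy data: two tight pairs of points far apart.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
score = silhouette_score(X, labels)
fraction = between_cluster_fraction(X, labels)
```

Both quantities approach 1 when clusters are compact and far apart, which is the sense in which the reported 0.62 and 64% are read as evidence of reasonably separated groups.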
Although this framework enhances clustering precision and uncovers clinically actionable patterns, challenges such as health data privacy concerns, clinicians’ difficulty in interpreting PCA results, validating model generalizability across diverse populations, and technical resource limitations in low-resource settings must be addressed for successful implementation in real-world healthcare systems. Future research should incorporate longitudinal data and explore integration with advanced models, such as deep learning, to improve predictive accuracy and adaptability in real-time clinical environments.
Source journal
Informatics in Medicine Unlocked (Medicine-Health Informatics)
CiteScore: 9.50
Self-citation rate: 0.00%
Articles published: 282
Review time: 39 days
Journal description: Informatics in Medicine Unlocked (IMU) is an international gold open access journal covering a broad spectrum of topics within medical informatics, including (but not limited to) papers focusing on imaging, pathology, teledermatology, public health, ophthalmological, nursing and translational medicine informatics. The full papers that are published in the journal are accessible to all who visit the website.