Gamal Saad Mohamed Khamis, Nasser S. Alqahtani, Sultan Munadi Alanazi, Mohammed Muharrab Alruwaili, Mariam Shabram Alenazi, Maneaf A. Alrawaili
Informatics in Medicine Unlocked, volume 57, Article 101666, 2025. DOI: 10.1016/j.imu.2025.101666. https://www.sciencedirect.com/science/article/pii/S2352914825000541
Using Fuzzy C-Means clustering and PCA in public health: A machine learning approach to combat CVD and obesity
This study introduces a novel framework that integrates principal component analysis (PCA) with fuzzy c-means (FCM) clustering to enhance the analysis of high-dimensional health data, specifically targeting the identification of at-risk groups for cardiovascular disease (CVD) and obesity. This approach, which has not been previously explored in public health, promises to provide new insights into these pressing health issues and new ways of addressing them.
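The clustering half of such a PCA-FCM pipeline can be sketched in a few lines. The following is a minimal illustration of the standard fuzzy c-means updates, not the authors' implementation: the fuzzifier `m = 2`, the farthest-first initialisation, the helper names, and the toy two-dimensional "component scores" are all assumptions made for the example.

```python
import math

def _dist(p, q):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def _memberships(points, centers, m):
    """FCM membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp tiny distances so a point sitting on a centre does not divide by zero.
        d = [max(_dist(p, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((dk / dj) ** power for dj in d) for dk in d])
    return rows

def _farthest_first(points, c):
    """Deterministic initial centres (an illustrative choice, not from the paper)."""
    centers = [list(points[0])]
    while len(centers) < c:
        nxt = max(points, key=lambda p: min(_dist(p, ck) for ck in centers))
        centers.append(list(nxt))
    return centers

def fuzzy_c_means(points, c, m=2.0, iters=200, tol=1e-9):
    """Return (centers, memberships); each membership row sums to 1."""
    centers = _farthest_first(points, c)
    dim = len(points[0])
    for _ in range(iters):
        u = _memberships(points, centers, m)
        new_centers = []
        for k in range(c):
            # Centre update: mean of the points weighted by membership**m.
            w = [row[k] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / total for d in range(dim)]
            )
        shift = max(
            abs(a - b) for old, new in zip(centers, new_centers) for a, b in zip(old, new)
        )
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(points, centers, m)

# Made-up 2-D scores standing in for the first two principal components.
scores = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
centers, u = fuzzy_c_means(scores, c=2)
labels = [max(range(2), key=row.__getitem__) for row in u]
print(labels)
```

Unlike hard k-means, every individual keeps a graded membership in each cluster, which is what allows borderline risk profiles to be flagged rather than forced into one group.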
The proposed PCA-FCM model was applied to a dataset comprising more than 20 health variables from a population sample aged 18–75 years. The analysis identified four distinct clusters, each showing unique risk patterns. For instance, Cluster One (mean age, 29) showed elevated body mass index (BMI) (mean, 33.7 kg/m²), high waist circumference (113 cm), and signs of insulin resistance (FBS, 133 mg/dL; HOMA-IR, 7.12). In contrast, Cluster Two (mean age, 61) exhibited the highest systolic blood pressure (SBP, 143 mmHg) along with elevated LDL cholesterol (4.27 mmol/L) and triglycerides (2.59 mmol/L), indicating advanced metabolic syndrome. Cluster Three (mean age, 51) presented a healthier metabolic profile with lower HOMA-IR (3.74), normal SBP (127 mmHg), and balanced lipid levels (HDL, 1.36 mmol/L). Cluster Four (mean age, 43) showed elevated SBP (134 mmHg), BMI (32.1 kg/m²), and HOMA-IR (6.05), suggesting a latent risk group.
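Per-cluster summaries like the ones above can be produced from fuzzy memberships in more than one way. The sketch below assumes the simplest option, assigning each person to the cluster with the highest membership and averaging each variable; the abstract does not say how the reported means were computed, and the function name, variable names, and values are hypothetical.

```python
def cluster_profiles(records, memberships):
    """Mean of every variable per cluster after max-membership assignment."""
    groups = {}
    for rec, row in zip(records, memberships):
        label = max(range(len(row)), key=row.__getitem__)
        groups.setdefault(label, []).append(rec)
    profiles = {}
    for label, members in sorted(groups.items()):
        profiles[label] = {
            name: sum(r[name] for r in members) / len(members)
            for name in members[0]
        }
    return profiles

# Made-up records and memberships, purely for illustration.
records = [
    {"bmi": 30.0, "sbp": 120.0},
    {"bmi": 34.0, "sbp": 130.0},
    {"bmi": 24.0, "sbp": 140.0},
]
memberships = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
print(cluster_profiles(records, memberships))
```

A membership-weighted mean is an equally plausible alternative and would let borderline individuals contribute to more than one profile.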
PCA identified waist circumference, visceral fat, LDL/HDL ratio, non-HDL cholesterol, and waist-to-height ratio as the most influential variables contributing to cluster separation, with loadings above 0.70 on the first two principal components. Meanwhile, exercise, height, family history, and HDL had loadings below 0.30, indicating minimal influence on cluster formation.
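A loading is the correlation between a standardised variable and a principal component, obtained by scaling the component's eigenvector by the square root of its eigenvalue. The sketch below computes loadings for the first component only, using power iteration on the correlation matrix so that it needs nothing beyond the standard library; the function names and the three toy variables are assumptions, and a real analysis would normally use a numerical library and inspect more than one component.

```python
import math

def standardize(columns):
    """Z-score each variable so that PCA operates on the correlation matrix."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        out.append([(x - mean) / sd for x in col])
    return out

def correlation_matrix(z):
    """Correlation matrix of already standardised variables."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def first_component_loadings(columns, iters=500):
    """Loadings on PC1 = leading eigenvector * sqrt(leading eigenvalue)."""
    r = correlation_matrix(standardize(columns))
    p = len(r)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        # Power iteration: repeatedly multiply by R and renormalise.
        w = [sum(r[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(r[i][j] * v[j] for j in range(p)) for i in range(p))
    return [x * math.sqrt(eigenvalue) for x in v]

# Made-up variables: var_a and var_b move together, var_c is unrelated to both.
var_a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
var_b = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0, 13.0, 13.0]
var_c = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
loadings = first_component_loadings([var_a, var_b, var_c])
influential = [i for i, l in enumerate(loadings) if abs(l) > 0.70]
negligible = [i for i, l in enumerate(loadings) if abs(l) < 0.30]
print(influential, negligible)
```

The 0.70 and 0.30 cut-offs mirror the thresholds quoted in the abstract; they are conventions for reading loadings, not properties of PCA itself.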
The model evaluation supported the selection of the four-cluster solution, with a silhouette score of 0.62 and between-cluster variation accounting for 64% of the total variance, signifying well-defined and cohesive clusters.
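Both evaluation figures can be computed directly from the data and a set of cluster labels. The sketch below assumes hard labels (for example, each person's highest-membership cluster) and assumes that "between-cluster variation" means the between-cluster sum of squares divided by the total sum of squares; the abstract does not define it, and the function names and toy points are made up.

```python
import math

def _dist(p, q):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def silhouette_score(points, labels):
    """Mean silhouette over all points; needs at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: silhouette is taken as 0
        # a: mean distance to the point's own cluster.
        a = sum(_dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(_dist(points[i], points[j]) for j in idxs) / len(idxs)
            for other, idxs in clusters.items()
            if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)

def between_cluster_share(points, labels):
    """Between-cluster sum of squares as a fraction of the total sum of squares."""
    dim, n = len(points[0]), len(points)
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    tss = sum(_dist(p, grand) ** 2 for p in points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    bss = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        bss += len(members) * _dist(centroid, grand) ** 2
    return bss / tss

# Made-up, well-separated points; real health data would score far lower.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
labels = [0, 0, 0, 1, 1, 1]
print(round(silhouette_score(points, labels), 3))
print(round(between_cluster_share(points, labels), 3))
```

Repeating both measures for several candidate cluster counts and comparing them is one common way to justify a particular number of clusters.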
Although this framework enhances clustering precision and uncovers clinically actionable patterns, several challenges must be addressed for successful implementation in real-world healthcare systems: health data privacy concerns, clinicians’ difficulty in interpreting PCA results, the need to validate model generalizability across diverse populations, and technical limitations in low-resource settings. Future research should incorporate longitudinal data and explore integration with advanced models, such as deep learning, to improve predictive accuracy and adaptability in real-time clinical environments.
About the journal:
Informatics in Medicine Unlocked (IMU) is an international gold open access journal covering a broad spectrum of topics within medical informatics, including (but not limited to) papers focusing on imaging, pathology, teledermatology, public health, ophthalmological, nursing and translational medicine informatics. The full papers that are published in the journal are accessible to all who visit the website.