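A machine learning approach to predict hypertension using cross-sectional & two years follow up data from a health & demographic cohort of Assam, North East India.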

IF 2.5; Zone 4 (Medicine); JCR Q3 (Immunology)

Indian Journal of Medical Research Pub Date : 2025-04-01 DOI:10.25259/IJMR_881_2024

Krishnarjun Bora, Natarajaseenivasan Kalimuthusamy, Ananya Jyoti Gogoi, Namita Garh, Manisha Rabidas, Gargi Chanda, Rajshree Das, Prasanta Kumar Borah

Indian Journal of Medical Research, volume 161, issue 4, pages 394-405. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12178201/pdf/

Citations: 0

Abstract



A machine learning approach to predict hypertension using cross-sectional & two years follow up data from a health & demographic cohort of Assam, North East India.

Background & objectives: Hypertension affects a sizable section of the world population and is recognised as a growing problem. Predicting it with machine learning (ML) algorithms can aid its control and prevention. The objective of the present investigation was to assess the applicability of ML approaches to the prediction and detection of hypertension.

Methods: We included 53,301 participants at baseline from a health and demographic surveillance system in Dibrugarh, Assam (Dibrugarh-HDSS). We constructed two models, one at baseline and the other after two years of follow-up. Of the total participants (baseline: 29,402; follow-up: 4,400), 70 per cent were randomly selected to fit seven popular classification models: decision tree classifier (DTC), random forest classifier (RFC), support vector machine (SVM), linear discriminant analysis (LDA), logistic regression, AdaBoost classifier, and XGBoost classifier. The data from the remaining 30 per cent were used to evaluate the performance of the models.

Results: In the baseline data, the AdaBoost classifier identified hypertension with the maximum accuracy of 87.02 per cent (CI: 86.01-88.03). The maximum area under the curve (AUC) of 98.37 per cent (CI: 97.36-99.38) was obtained with RFC. For the prediction of risk at two years, the maximum average accuracy of 77.57 per cent (CI: 76.6-78.54) was achieved with XGBoost, followed by RFC (77.2%, CI: 76.15-78.25), and the maximum AUC of 85.82 per cent (CI: 84.88-86.76) was obtained with RFC.

Interpretation & conclusions: In both the identification and prediction of hypertension, RFC was found to be better than the other classifiers. 'Waist circumference' followed by 'body mass index' (BMI) had the maximum relative importance in the identification of hypertension, while for two-year risk prediction, the baseline 'systolic blood pressure' (SBP), 'diastolic blood pressure' (DBP), and 'BMI' had the maximum relative importance. The findings reveal the potential of predictive models for accurately identifying high-risk individuals, enabling timely interventions, and optimising clinical decision-making.
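The evaluation design described in the abstract (a random 70/30 split, several classifiers fitted on the 70 per cent, accuracy and AUC measured on the remaining 30 per cent, and relative feature importance read from the random forest) can be sketched with scikit-learn roughly as follows. This is a minimal illustration and not the authors' code: the data are synthetic, the feature names are placeholders, all hyperparameters are library defaults chosen here, and `GradientBoostingClassifier` stands in for XGBoost so that the sketch depends on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder feature names; the values are synthetic, not Dibrugarh-HDSS data.
FEATURES = ["waist_circumference", "bmi", "sbp", "dbp", "age", "sex"]


def evaluate_classifiers(n_samples=1000, seed=0):
    """Fit each classifier on a random 70% split and score it on the other 30%."""
    X, y = make_classification(n_samples=n_samples, n_features=len(FEATURES),
                               n_informative=4, n_redundant=0,
                               random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=seed)

    models = {
        "DTC": DecisionTreeClassifier(random_state=seed),
        "RFC": RandomForestClassifier(n_estimators=100, random_state=seed),
        "SVM": SVC(probability=True, random_state=seed),
        "LDA": LinearDiscriminantAnalysis(),
        "LogReg": LogisticRegression(max_iter=1000),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        # Assumption: stand-in for XGBoost, to avoid an extra dependency.
        "GradBoost": GradientBoostingClassifier(random_state=seed),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Probability of the positive class, used for the ROC AUC.
        prob = model.predict_proba(X_test)[:, 1]
        scores[name] = {
            "accuracy": accuracy_score(y_test, model.predict(X_test)),
            "auc": roc_auc_score(y_test, prob),
        }

    # Impurity-based relative importance from the fitted random forest.
    importances = dict(zip(FEATURES, models["RFC"].feature_importances_))
    return scores, importances


if __name__ == "__main__":
    scores, importances = evaluate_classifiers()
    for name, s in scores.items():
        print(f"{name:10s} accuracy={s['accuracy']:.3f} auc={s['auc']:.3f}")
    for feat, imp in sorted(importances.items(), key=lambda kv: -kv[1]):
        print(f"{feat:20s} {imp:.3f}")
```

The abstract does not say how the confidence intervals were obtained, so none are computed here; repeating the split with different seeds and summarising the spread of the scores would be one possible approach.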


Source journal: Indian Journal of Medical Research (Medicine: Immunology)

CiteScore: 5.80

Self-citation rate: 2.40%

Articles published: 191

Review time: 3-8 weeks

Journal description: The Indian Journal of Medical Research (IJMR) [ISSN 0971-5916], started in 1913, is one of the oldest medical journals in India and probably in Asia. It began as a quarterly (4 issues/year), became bimonthly (6 issues/year) in 1958, and has been monthly (12 issues/year) since 1964.