Machine Learning-Based Prediction for Incident Hypertension Based on Regular Health Checkup Data: Derivation and Validation in 2 Independent Nationwide Cohorts in South Korea and Japan.

IF 5.8 2区 医学 Q1 HEALTH CARE SCIENCES & SERVICES
Seung Ha Hwang, Hayeon Lee, Jun Hyuk Lee, Myeongcheol Lee, Ai Koyanagi, Lee Smith, Sang Youl Rhee, Dong Keon Yon, Jinseok Lee
{"title":"Machine Learning-Based Prediction for Incident Hypertension Based on Regular Health Checkup Data: Derivation and Validation in 2 Independent Nationwide Cohorts in South Korea and Japan.","authors":"Seung Ha Hwang, Hayeon Lee, Jun Hyuk Lee, Myeongcheol Lee, Ai Koyanagi, Lee Smith, Sang Youl Rhee, Dong Keon Yon, Jinseok Lee","doi":"10.2196/52794","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Worldwide, cardiovascular diseases are the primary cause of death, with hypertension as a key contributor. In 2019, cardiovascular diseases led to 17.9 million deaths, predicted to reach 23 million by 2030.</p><p><strong>Objective: </strong>This study presents a new method to predict hypertension using demographic data, using 6 machine learning models for enhanced reliability and applicability. The goal is to harness artificial intelligence for early and accurate hypertension diagnosis across diverse populations.</p><p><strong>Methods: </strong>Data from 2 national cohort studies, National Health Insurance Service-National Sample Cohort (South Korea, n=244,814), conducted between 2002 and 2013 were used to train and test machine learning models designed to anticipate incident hypertension within 5 years of a health checkup involving those aged ≥20 years, and Japanese Medical Data Center cohort (Japan, n=1,296,649) were used for extra validation. An ensemble from 6 diverse machine learning models was used to identify the 5 most salient features contributing to hypertension by presenting a feature importance analysis to confirm the contribution of each future.</p><p><strong>Results: </strong>The Adaptive Boosting and logistic regression ensemble showed superior balanced accuracy (0.812, sensitivity 0.806, specificity 0.818, and area under the receiver operating characteristic curve 0.901). The 5 key hypertension indicators were age, diastolic blood pressure, BMI, systolic blood pressure, and fasting blood glucose. The Japanese Medical Data Center cohort dataset (extra validation set) corroborated these findings (balanced accuracy 0.741 and area under the receiver operating characteristic curve 0.824). The ensemble model was integrated into a public web portal for predicting hypertension onset based on health checkup data.</p><p><strong>Conclusions: </strong>Comparative evaluation of our machine learning models against classical statistical models across 2 distinct studies emphasized the former's enhanced stability, generalizability, and reproducibility in predicting hypertension onset.</p>","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"26 ","pages":"e52794"},"PeriodicalIF":5.8000,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11576616/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Internet Research","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2196/52794","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Worldwide, cardiovascular diseases are the primary cause of death, with hypertension as a key contributor. In 2019, cardiovascular diseases led to 17.9 million deaths, predicted to reach 23 million by 2030.

Objective: This study presents a new method to predict hypertension using demographic data, using 6 machine learning models for enhanced reliability and applicability. The goal is to harness artificial intelligence for early and accurate hypertension diagnosis across diverse populations.

Methods: Data from 2 national cohort studies, National Health Insurance Service-National Sample Cohort (South Korea, n=244,814), conducted between 2002 and 2013 were used to train and test machine learning models designed to anticipate incident hypertension within 5 years of a health checkup involving those aged ≥20 years, and Japanese Medical Data Center cohort (Japan, n=1,296,649) were used for extra validation. An ensemble from 6 diverse machine learning models was used to identify the 5 most salient features contributing to hypertension by presenting a feature importance analysis to confirm the contribution of each future.

Results: The Adaptive Boosting and logistic regression ensemble showed superior balanced accuracy (0.812, sensitivity 0.806, specificity 0.818, and area under the receiver operating characteristic curve 0.901). The 5 key hypertension indicators were age, diastolic blood pressure, BMI, systolic blood pressure, and fasting blood glucose. The Japanese Medical Data Center cohort dataset (extra validation set) corroborated these findings (balanced accuracy 0.741 and area under the receiver operating characteristic curve 0.824). The ensemble model was integrated into a public web portal for predicting hypertension onset based on health checkup data.

Conclusions: Comparative evaluation of our machine learning models against classical statistical models across 2 distinct studies emphasized the former's enhanced stability, generalizability, and reproducibility in predicting hypertension onset.

基于定期健康检查数据的机器学习预测高血压发病率:在韩国和日本两个独立的全国性队列中的推导和验证。
背景:在全球范围内,心血管疾病是导致死亡的主要原因,而高血压是其中的主要因素。2019 年,心血管疾病导致 1790 万人死亡,预计到 2030 年将达到 2300 万人:本研究提出了一种利用人口统计数据预测高血压的新方法,使用 6 个机器学习模型来提高可靠性和适用性。目的是利用人工智能对不同人群进行早期、准确的高血压诊断:方法: 我们利用 2002 年至 2013 年期间开展的两项全国性队列研究的数据,即国民健康保险服务-全国抽样队列(韩国,n=244,814),来训练和测试机器学习模型,这些模型旨在预测年龄≥20 岁的人在健康检查后 5 年内发生的高血压,并利用日本医疗数据中心队列(日本,n=1,296,649)进行额外验证。通过对特征重要性进行分析,确认了未来每个特征的贡献,从而从6个不同的机器学习模型中找出了导致高血压的5个最显著特征:结果:自适应提升和逻辑回归集合显示出更高的平衡准确性(0.812,灵敏度0.806,特异性0.818,接收者操作特征曲线下面积0.901)。5 个关键的高血压指标是年龄、舒张压、体重指数、收缩压和空腹血糖。日本医疗数据中心队列数据集(额外验证集)证实了这些发现(平衡准确度为 0.741,接收器操作特征曲线下面积为 0.824)。该集合模型被整合到一个公共门户网站中,用于根据健康检查数据预测高血压发病:在两项不同的研究中,我们的机器学习模型与经典统计模型进行了比较评估,结果表明,前者在预测高血压发病方面具有更强的稳定性、通用性和可重复性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
14.40
自引率
5.40%
发文量
654
审稿时长
1 months
期刊介绍: The Journal of Medical Internet Research (JMIR) is a highly respected publication in the field of health informatics and health services. With a founding date in 1999, JMIR has been a pioneer in the field for over two decades. As a leader in the industry, the journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It is recognized as a top publication in these disciplines, ranking in the first quartile (Q1) by Impact Factor. Notably, JMIR holds the prestigious position of being ranked #1 on Google Scholar within the "Medical Informatics" discipline.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信