Feature Selection for Hypertension Risk Prediction Using XGBoost on Single Nucleotide Polymorphism Data.

Impact factor: 2.3 · JCR Q3 (Medical Informatics)
Healthcare Informatics Research · Pub Date: 2025-01-01 · Epub Date: 2025-01-31 · DOI: 10.4258/hir.2025.31.1.16
Lailil Muflikhah, Tirana Noor Fatyanosa, Nashi Widodo, Rizal Setya Perdana, Solimun, Hana Ratnawati
Healthcare Informatics Research, vol. 31, no. 1, pp. 16-22. Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11854617/pdf/

Abstract

Objectives: Hypertension (high blood pressure) is a prevalent and serious chronic condition that affects a large share of the adult population worldwide. Left unaddressed, it can lead to severe complications, including kidney problems, heart disease, and stroke. This study aims to develop a feature selection model based on the XGBoost algorithm to identify specific single nucleotide polymorphisms (SNPs) that can serve as biomarkers for detecting hypertension risk.

Methods: We use high-dimensional genetic variation data (i.e., SNPs) to build a classifier for prediction, treating SNPs as markers for hypertension in patients. The data come from the OpenSNP dataset and comprise 19,697 SNPs from 2,052 samples. Feature selection is performed with extreme gradient boosting (XGBoost), an ensemble machine learning method that incrementally adjusts weights over a series of boosting steps.
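The idea of ranking SNPs by how much they help a boosted model can be illustrated with a minimal sketch. This is not the authors' pipeline and does not call the XGBoost library: it is a simplified pure-Python stand-in that boosts depth-1 trees (stumps) with logistic loss, scores each candidate split with the second-order split-gain formula used by XGBoost, and keeps the features with the largest accumulated gain. The synthetic genotype data, the 0/1/2 allele coding, the choice of causal SNP indices, and all function names and hyperparameters are assumptions made for illustration.

```python
import math
import random


def make_genotypes(n_samples, n_snps, causal, seed=0):
    """Synthetic SNP matrix coded 0/1/2 (hypothetical data, not OpenSNP).

    The label is 1 when the causal SNPs together carry at least two minor alleles.
    """
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_samples)]
    y = [1 if sum(row[j] for j in causal) >= 2 else 0 for row in X]
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def boosted_stump_gain(X, y, n_rounds=10, eta=0.3, lam=1.0):
    """Boost depth-1 trees with logistic loss; return total split gain per feature."""
    n, p = len(X), len(X[0])
    score = [0.0] * n            # raw (log-odds) prediction per sample
    total_gain = [0.0] * p       # accumulated gain per SNP
    for _ in range(n_rounds):
        prob = [sigmoid(s) for s in score]
        g = [pi - yi for pi, yi in zip(prob, y)]   # first-order gradients
        h = [pi * (1.0 - pi) for pi in prob]       # second-order gradients
        G, H = sum(g), sum(h)
        best = None  # (gain, feature, threshold, left weight, right weight)
        for j in range(p):
            for thr in (0.5, 1.5):                 # splits between genotype codes
                GL = sum(gi for gi, row in zip(g, X) if row[j] < thr)
                HL = sum(hi for hi, row in zip(h, X) if row[j] < thr)
                GR, HR = G - GL, H - HL
                # Regularized second-order split gain (lam is the L2 penalty)
                gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                              - G**2 / (H + lam))
                if best is None or gain > best[0]:
                    best = (gain, j, thr, -GL / (HL + lam), -GR / (HR + lam))
        gain, j, thr, w_left, w_right = best
        total_gain[j] += gain
        # Shrunken update of the scores with the chosen stump
        for i, row in enumerate(X):
            score[i] += eta * (w_left if row[j] < thr else w_right)
    return total_gain


def select_top_k(total_gain, k):
    """Return the indices of the k features with the largest accumulated gain."""
    ranked = sorted(range(len(total_gain)), key=lambda j: total_gain[j], reverse=True)
    return sorted(ranked[:k])


causal_snps = [3, 7]  # hypothetical causal positions in the toy data
X, y = make_genotypes(n_samples=400, n_snps=20, causal=causal_snps)
gains = boosted_stump_gain(X, y)
selected = select_top_k(gains, k=2)
print("selected SNP indices:", selected)
```

In practice one would more likely train the XGBoost library itself and rank features by its reported importances; the sketch only shows why gain-based ranking tends to surface informative SNPs among many uninformative ones.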

Results: The experiments identified a subset of 292 SNPs that yielded high performance: an F1-score of 98.55%, precision of 98.73%, recall of 98.38%, and overall accuracy of 98%. In these experiments, XGBoost feature selection outperformed other representative feature selection methods, such as genetic algorithms, analysis of variance, chi-square, and principal component analysis, in predicting hypertension risk.
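As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall; the helper names and the toy confusion counts are hypothetical and are not taken from the paper.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_from_precision_recall(precision, recall)


# Values reported in the abstract
reported_precision = 0.9873
reported_recall = 0.9838
f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 implied by reported precision and recall: {f1:.2%}")  # 98.55%

# Toy counts (hypothetical, for illustration only)
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.4f}")
```

The implied value matches the reported F1-score of 98.55%, so the three figures are mutually consistent.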

Conclusions: We developed a model for predicting hypertension from SNP data. The XGBoost feature selection method handled the high dimensionality of the SNP data effectively and identified significant features as biomarkers. The results indicate high performance in predicting hypertension risk.

Source journal
Healthcare Informatics Research (Medical Informatics)
CiteScore: 4.90
Self-citation rate: 6.90%
Articles published: 44