Evaluating machine learning algorithms for predicting HIV status among young Thai men who have sex with men

Impact factor: 4.1 | JCR quartile: Q1 | Category: Health Care Sciences & Services
Krittaka Soha, Sadiporn Phuthomdee, Thanapat Srichai, Lanchakorn Kittiratanawasin, Win Min Han, Sirinya Teeraananchai
Journal: BMJ Health & Care Informatics, 32(1). Published 2025-05-15. Journal article.
DOI: https://doi.org/10.1136/bmjhci-2024-101189
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12083282/pdf/
Citations: 0

Abstract

Evaluating machine learning algorithms for predicting HIV status among young Thai men who have sex with men.

Objective: This study aimed to develop machine learning (ML) models to predict HIV status and assessed the factors associated with HIV infection among young men who have sex with men (MSM) under the Universal Health Coverage (UHC) programme in Thailand.

Methods: Young MSM aged 15-24 years who underwent HIV testing through the UHC programme from 2015 to 2022 were included. Data were divided into training (70%) and testing (30%) sets, with the Synthetic Minority Oversampling Technique (SMOTE) applied to address data set imbalance. ML models, including logistic regression, k-nearest neighbour (KNN), random forest, extreme gradient boosting (XGB) and AdaBoost, were used to predict HIV infection.
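The split-then-oversample step described in the Methods can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the toy two-feature data, the helper names `stratified_split` and `smote`, the choice of k = 5 neighbours and the random seeds are all assumptions. It also assumes that SMOTE is applied to the training split only, which is common practice but is not stated in the abstract.

```python
import math
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), splitting each class separately
    so both sets keep roughly the original class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points. Each one lies on the line
    segment between a random minority sample and one of its k nearest
    minority neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: 200 records, 11% positive, two numeric features.
rng = random.Random(42)
y = [1] * 22 + [0] * 178
X = [[rng.gauss(float(label), 1.0), rng.gauss(float(label), 1.0)] for label in y]

# 70% training / 30% testing, as in the Methods.
train_idx, test_idx = stratified_split(y, test_fraction=0.3, seed=1)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]

# Oversample the minority class in the training set until classes are equal.
minority = [x for x, label in zip(X_train, y_train) if label == 1]
n_new = (len(y_train) - len(minority)) - len(minority)
X_bal = X_train + smote(minority, n_new, k=5, seed=2)
y_bal = y_train + [1] * n_new

print("positives before:", sum(y_train), "of", len(y_train))
print("positives after: ", sum(y_bal), "of", len(y_bal))
```

The test split is left untouched, so any metric computed on it still reflects the original class imbalance. Real categorical predictors would need an encoding or a SMOTE variant designed for categorical data; the abstract does not say which was used.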

Results: Among 146 813 young MSM, 11% were diagnosed with HIV. While KNN initially outperformed other ML models, the sensitivity of all models using the original data set was low due to imbalanced data. After applying SMOTE, the XGB model showed the best performance with an accuracy of 0.72, sensitivity of 0.73, specificity of 0.72 and the area under the curve of 0.72. The top predictors of HIV infection were the year of HIV testing (68%), age (55%) and targeted HIV testing (54%).
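The reported metrics can be related to each other with a short calculation. The sketch below is illustrative only: the function names and the 1,000-person example are hypothetical, and it assumes the test set has the same 11% prevalence as the full cohort. It shows why accuracy alone is misleading at this prevalence, and checks that the reported sensitivity and specificity are consistent with the reported accuracy.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = HIV positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical test set: 1,000 people, 11% positive.
y_true = [1] * 110 + [0] * 890

# A model that never predicts HIV looks accurate but finds no cases.
baseline = summarize(y_true, [0] * 1000)
print(baseline)

# Accuracy is the prevalence-weighted mean of sensitivity and specificity.
# Using the values reported for XGB after SMOTE:
prevalence, sensitivity, specificity = 0.11, 0.73, 0.72
implied_accuracy = prevalence * sensitivity + (1 - prevalence) * specificity
print(round(implied_accuracy, 2))
```

With 11% prevalence, the always-negative baseline reaches 0.89 accuracy with zero sensitivity, which is the failure mode the abstract attributes to the unbalanced data. The lower accuracy after SMOTE (0.72) is therefore a trade for usable sensitivity rather than a loss of performance.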

Discussion: This study demonstrates the potential of ML models, particularly XGB, in predicting HIV infection among young MSM in Thailand under the UHC programme. The application of SMOTE improved model sensitivity, addressing data imbalance and enhancing predictive accuracy.

Conclusions: ML models have the potential to enhance HIV risk assessment and inform targeted prevention strategies for high-risk populations.

Source journal metrics: CiteScore 6.10; self-citation rate 4.90%; articles published 40; review time 18 weeks.