Evaluating machine learning algorithms for predicting HIV status among young Thai men who have sex with men.
Authors: Krittaka Soha, Sadiporn Phuthomdee, Thanapat Srichai, Lanchakorn Kittiratanawasin, Win Min Han, Sirinya Teeraananchai
Journal: BMJ Health & Care Informatics, 32(1). Published 2025-05-15.
DOI: https://doi.org/10.1136/bmjhci-2024-101189
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12083282/pdf/
Objective: This study aimed to develop machine learning (ML) models to predict HIV status and to assess the factors associated with HIV infection among young men who have sex with men (MSM) under the Universal Health Coverage (UHC) programme in Thailand.
Methods: Young MSM aged 15-24 years who underwent HIV testing through the UHC programme from 2015 to 2022 were included. Data were divided into training (70%) and testing (30%) sets, with the Synthetic Minority Oversampling Technique (SMOTE) applied to address data set imbalance. ML models, including logistic regression, k-nearest neighbour (KNN), random forest, extreme gradient boosting (XGB) and AdaBoost, were used to predict HIV infection.
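The split-then-oversample step described in the Methods can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the toy data, the two numeric features, the helper names `stratified_split` and `smote`, the use of a stratified split, and the choice to oversample only the training portion are all assumptions made for illustration. The core SMOTE idea shown here is that each synthetic minority sample is placed on the line segment between a real minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices per class so both sets keep the original class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 100 people, 11 positive (mirrors the 11% prevalence only).
data_rng = random.Random(42)
labels = [1] * 11 + [0] * 89
rows = [[data_rng.gauss(float(y), 1.0), data_rng.gauss(float(y), 1.0)] for y in labels]

# 70/30 split; the test set is left untouched (assumption).
train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
X_train = [rows[i] for i in train_idx]
y_train = [labels[i] for i in train_idx]

# Oversample the minority class in the training set until the classes are balanced.
minority = [x for x, y in zip(X_train, y_train) if y == 1]
new_points = smote(minority, y_train.count(0) - len(minority))
X_balanced = X_train + new_points
y_balanced = y_train + [1] * len(new_points)

print(y_train.count(1), y_train.count(0))        # class counts before oversampling
print(y_balanced.count(1), y_balanced.count(0))  # class counts after oversampling
```

Keeping the test set at its original class ratio, as assumed here, means the reported metrics would reflect performance at the real prevalence rather than on synthetic samples.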
Results: Among 146 813 young MSM, 11% were diagnosed with HIV. While KNN initially outperformed other ML models, the sensitivity of all models using the original data set was low due to imbalanced data. After applying SMOTE, the XGB model showed the best performance, with an accuracy of 0.72, a sensitivity of 0.73, a specificity of 0.72 and an area under the curve of 0.72. The top predictors of HIV infection were the year of HIV testing (68%), age (55%) and targeted HIV testing (54%).
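A short worked example may help show why sensitivity, rather than accuracy alone, matters at 11% prevalence. The sketch below computes accuracy, sensitivity and specificity from confusion-matrix counts. The counts are hypothetical, chosen for a test set of 1,000 people, and are not taken from the study; the function name `classification_metrics` is likewise an illustrative assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity: share of truly positive people the model flags.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of truly negative people the model clears.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical test set: 1,000 people, 110 HIV-positive and 890 HIV-negative (11%).

# A model that labels everyone HIV-negative looks accurate but finds no cases.
always_negative = classification_metrics(tp=0, fp=0, tn=890, fn=110)

# Illustrative counts chosen to land near the rates reported for XGB after SMOTE.
balanced_model = classification_metrics(tp=80, fp=249, tn=641, fn=30)

for name, m in [("always negative", always_negative), ("balanced model", balanced_model)]:
    print(name, {key: round(value, 2) for key, value in m.items()})
# always negative: accuracy 0.89, sensitivity 0.0, specificity 1.0
# balanced model:  accuracy 0.72, sensitivity 0.73, specificity 0.72
```

With these illustrative counts, the trivial model reaches higher accuracy (0.89) than the balanced one (0.72) while detecting no infections, which is consistent with the low sensitivity reported on the original imbalanced data.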
Discussion: This study demonstrates the potential of ML models, particularly XGB, in predicting HIV infection among young MSM in Thailand under the UHC programme. The application of SMOTE improved model sensitivity, addressing data imbalance and enhancing predictive accuracy.
Conclusions: ML models have the potential to enhance HIV risk assessment and inform targeted prevention strategies for high-risk populations.