C Mahony Reategui-Rivera, Wanting Cui, Stefan Escobar-Agreda, Leonardo Rojas-Mezarina, Joseph Finkelstein
{"title":"基于机器学习的缺席远程医疗预测。","authors":"C Mahony Reategui-Rivera, Wanting Cui, Stefan Escobar-Agreda, Leonardo Rojas-Mezarina, Joseph Finkelstein","doi":"10.1089/tmr.2025.0009","DOIUrl":null,"url":null,"abstract":"<p><strong>Aim: </strong>This study aimed to evaluate the performance of machine learning (ML) models in predicting patient no-shows for telemedicine appointments within Peruvian health system and identify key predictors of nonattendance.</p><p><strong>Methods: </strong>We performed a retrospective observational study using anonymized data (June 2019-November 2023) from \"Teleatiendo.\" The dataset included over 1.5 million completed appointments and about 64,000 no-shows (4.1%), focusing on teleorientation and telemonitoring. Predictor variables included patient demographics, socioeconomic factors, health care facility characteristics, appointment timing, and telemedicine service types. A 70% training, 10% validation, and 20% testing split were used over 10 iterations, with hyperparameter tuning performed on the validation set to identify optimal model parameters. Multiple ML approaches-random forest, XGBoost, LightGBM, and anomaly detection-were implemented in combination with undersampling and cost-sensitive learning to address class imbalance. Performance was evaluated using precision, recall, specificity, area under the curve (AUC), F1-score, and accuracy.</p><p><strong>Results: </strong>Of the models tested, undersampling with XGBoost achieved a precision of 0.115 (±0.001), recall of 0.654 (±0.005), specificity of 0.786 (±0.002), AUC of 0.720 (±0.002), and accuracy of 0.780 (±0.002). In contrast, cost-sensitive XGBoost exhibited a balanced performance with a precision of 0.123 (±0.001), recall of 0.639 (±0.006), specificity of 0.805 (±0.004), AUC of 0.722 (±0.001), and accuracy of 0.799 (±0.003). Additionally, cost-sensitive random forest achieved the highest specificity (0.843 ± 0.002) and accuracy (0.832 ± 0.001) but recorded a lower recall (0.585 ± 0.004), while cost-sensitive LightGBM and balanced random forest yielded performance metrics similar to cost-sensitive XGBoost. Isolation forest, used for abnormality detection, demonstrated the lowest performance.</p><p><strong>Conclusions: </strong>ML models can moderately predict telemedicine no-shows in Peru, with cost-sensitive boosting techniques enhancing the identification of high-risk patients. Key predictors reflect both individual behavior and system-level contexts, suggesting the need for tailored, context-specific interventions. 
These findings can inform targeted strategies to optimize telemedicine, improve appointment adherence, and promote equitable health care access.</p>","PeriodicalId":94218,"journal":{"name":"Telemedicine reports","volume":"6 1","pages":"109-119"},"PeriodicalIF":1.6000,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12235123/pdf/","citationCount":"0","resultStr":"{\"title\":\"Machine Learning-Based Prediction of No-Show Telemedicine Encounters.\",\"authors\":\"C Mahony Reategui-Rivera, Wanting Cui, Stefan Escobar-Agreda, Leonardo Rojas-Mezarina, Joseph Finkelstein\",\"doi\":\"10.1089/tmr.2025.0009\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Aim: </strong>This study aimed to evaluate the performance of machine learning (ML) models in predicting patient no-shows for telemedicine appointments within Peruvian health system and identify key predictors of nonattendance.</p><p><strong>Methods: </strong>We performed a retrospective observational study using anonymized data (June 2019-November 2023) from \\\"Teleatiendo.\\\" The dataset included over 1.5 million completed appointments and about 64,000 no-shows (4.1%), focusing on teleorientation and telemonitoring. Predictor variables included patient demographics, socioeconomic factors, health care facility characteristics, appointment timing, and telemedicine service types. A 70% training, 10% validation, and 20% testing split were used over 10 iterations, with hyperparameter tuning performed on the validation set to identify optimal model parameters. Multiple ML approaches-random forest, XGBoost, LightGBM, and anomaly detection-were implemented in combination with undersampling and cost-sensitive learning to address class imbalance. Performance was evaluated using precision, recall, specificity, area under the curve (AUC), F1-score, and accuracy.</p><p><strong>Results: </strong>Of the models tested, undersampling with XGBoost achieved a precision of 0.115 (±0.001), recall of 0.654 (±0.005), specificity of 0.786 (±0.002), AUC of 0.720 (±0.002), and accuracy of 0.780 (±0.002). In contrast, cost-sensitive XGBoost exhibited a balanced performance with a precision of 0.123 (±0.001), recall of 0.639 (±0.006), specificity of 0.805 (±0.004), AUC of 0.722 (±0.001), and accuracy of 0.799 (±0.003). Additionally, cost-sensitive random forest achieved the highest specificity (0.843 ± 0.002) and accuracy (0.832 ± 0.001) but recorded a lower recall (0.585 ± 0.004), while cost-sensitive LightGBM and balanced random forest yielded performance metrics similar to cost-sensitive XGBoost. Isolation forest, used for abnormality detection, demonstrated the lowest performance.</p><p><strong>Conclusions: </strong>ML models can moderately predict telemedicine no-shows in Peru, with cost-sensitive boosting techniques enhancing the identification of high-risk patients. Key predictors reflect both individual behavior and system-level contexts, suggesting the need for tailored, context-specific interventions. 
These findings can inform targeted strategies to optimize telemedicine, improve appointment adherence, and promote equitable health care access.</p>\",\"PeriodicalId\":94218,\"journal\":{\"name\":\"Telemedicine reports\",\"volume\":\"6 1\",\"pages\":\"109-119\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-04-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12235123/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Telemedicine reports\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1089/tmr.2025.0009\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q3\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Telemedicine reports","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1089/tmr.2025.0009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Machine Learning-Based Prediction of No-Show Telemedicine Encounters.
Aim: This study aimed to evaluate the performance of machine learning (ML) models in predicting patient no-shows for telemedicine appointments within the Peruvian health system and to identify key predictors of nonattendance.
Methods: We performed a retrospective observational study using anonymized data (June 2019-November 2023) from "Teleatiendo." The dataset included over 1.5 million completed appointments and about 64,000 no-shows (4.1%), focusing on teleorientation and telemonitoring. Predictor variables included patient demographics, socioeconomic factors, health care facility characteristics, appointment timing, and telemedicine service types. A 70% training, 10% validation, and 20% testing split was used over 10 iterations, with hyperparameter tuning performed on the validation set to identify optimal model parameters. Multiple ML approaches (random forest, XGBoost, LightGBM, and anomaly detection) were implemented in combination with undersampling and cost-sensitive learning to address class imbalance. Performance was evaluated using precision, recall, specificity, area under the curve (AUC), F1-score, and accuracy.
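To make the cost-sensitive pipeline concrete, the sketch below shows how a 70/10/20 split, a class-weighted XGBoost classifier, and the reported evaluation metrics might be implemented in Python. The file name, feature handling, and hyperparameters are illustrative assumptions, not the study's actual code.

```python
# Minimal sketch (assumptions, not the study's code) of a cost-sensitive
# XGBoost no-show classifier with a 70/10/20 split and standard metrics.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             accuracy_score, roc_auc_score, confusion_matrix)
from xgboost import XGBClassifier

# Hypothetical export of the appointment data; "no_show" = 1 for missed visits.
df = pd.read_csv("teleatiendo_appointments.csv")
X = pd.get_dummies(df.drop(columns=["no_show"]))
y = df["no_show"].astype(int)

# 70% train / 10% validation / 20% test, stratified on the rare no-show label.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=2 / 3, stratify=y_rest, random_state=0)

# Cost-sensitive learning: up-weight the minority (no-show) class by the class ratio.
pos_weight = (y_train == 0).sum() / (y_train == 1).sum()
model = XGBClassifier(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.1,
    scale_pos_weight=pos_weight,
    eval_metric="auc",
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

# Evaluate on the held-out test set with the metrics reported in the abstract.
proba = model.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print({
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),   # sensitivity to no-shows
    "specificity": tn / (tn + fp),          # correctly identified attendees
    "auc": roc_auc_score(y_test, proba),
    "f1": f1_score(y_test, pred),
    "accuracy": accuracy_score(y_test, pred),
})
# Repeating this split/fit/score loop over 10 random seeds and averaging
# would yield mean (±SD) estimates like those in the Results section.
```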
Results: Of the models tested, undersampling with XGBoost achieved a precision of 0.115 (±0.001), recall of 0.654 (±0.005), specificity of 0.786 (±0.002), AUC of 0.720 (±0.002), and accuracy of 0.780 (±0.002). In contrast, cost-sensitive XGBoost exhibited a balanced performance with a precision of 0.123 (±0.001), recall of 0.639 (±0.006), specificity of 0.805 (±0.004), AUC of 0.722 (±0.001), and accuracy of 0.799 (±0.003). Additionally, cost-sensitive random forest achieved the highest specificity (0.843 ±0.002) and accuracy (0.832 ±0.001) but recorded a lower recall (0.585 ±0.004), while cost-sensitive LightGBM and balanced random forest yielded performance metrics similar to cost-sensitive XGBoost. Isolation forest, used for anomaly detection, demonstrated the lowest performance.
Conclusions: ML models can moderately predict telemedicine no-shows in Peru, with cost-sensitive boosting techniques enhancing the identification of high-risk patients. Key predictors reflect both individual behavior and system-level contexts, suggesting the need for tailored, context-specific interventions. These findings can inform targeted strategies to optimize telemedicine, improve appointment adherence, and promote equitable health care access.