C Mahony Reategui-Rivera, Wanting Cui, Stefan Escobar-Agreda, Leonardo Rojas-Mezarina, Joseph Finkelstein
{"title":"基于机器学习的缺席远程医疗预测。","authors":"C Mahony Reategui-Rivera, Wanting Cui, Stefan Escobar-Agreda, Leonardo Rojas-Mezarina, Joseph Finkelstein","doi":"10.1089/tmr.2025.0009","DOIUrl":null,"url":null,"abstract":"<p><strong>Aim: </strong>This study aimed to evaluate the performance of machine learning (ML) models in predicting patient no-shows for telemedicine appointments within Peruvian health system and identify key predictors of nonattendance.</p><p><strong>Methods: </strong>We performed a retrospective observational study using anonymized data (June 2019-November 2023) from \"Teleatiendo.\" The dataset included over 1.5 million completed appointments and about 64,000 no-shows (4.1%), focusing on teleorientation and telemonitoring. Predictor variables included patient demographics, socioeconomic factors, health care facility characteristics, appointment timing, and telemedicine service types. A 70% training, 10% validation, and 20% testing split were used over 10 iterations, with hyperparameter tuning performed on the validation set to identify optimal model parameters. Multiple ML approaches-random forest, XGBoost, LightGBM, and anomaly detection-were implemented in combination with undersampling and cost-sensitive learning to address class imbalance. Performance was evaluated using precision, recall, specificity, area under the curve (AUC), F1-score, and accuracy.</p><p><strong>Results: </strong>Of the models tested, undersampling with XGBoost achieved a precision of 0.115 (±0.001), recall of 0.654 (±0.005), specificity of 0.786 (±0.002), AUC of 0.720 (±0.002), and accuracy of 0.780 (±0.002). In contrast, cost-sensitive XGBoost exhibited a balanced performance with a precision of 0.123 (±0.001), recall of 0.639 (±0.006), specificity of 0.805 (±0.004), AUC of 0.722 (±0.001), and accuracy of 0.799 (±0.003). Additionally, cost-sensitive random forest achieved the highest specificity (0.843 ± 0.002) and accuracy (0.832 ± 0.001) but recorded a lower recall (0.585 ± 0.004), while cost-sensitive LightGBM and balanced random forest yielded performance metrics similar to cost-sensitive XGBoost. Isolation forest, used for abnormality detection, demonstrated the lowest performance.</p><p><strong>Conclusions: </strong>ML models can moderately predict telemedicine no-shows in Peru, with cost-sensitive boosting techniques enhancing the identification of high-risk patients. Key predictors reflect both individual behavior and system-level contexts, suggesting the need for tailored, context-specific interventions. 
These findings can inform targeted strategies to optimize telemedicine, improve appointment adherence, and promote equitable health care access.</p>","PeriodicalId":94218,"journal":{"name":"Telemedicine reports","volume":"6 1","pages":"109-119"},"PeriodicalIF":1.6000,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12235123/pdf/","citationCount":"0","resultStr":"{\"title\":\"Machine Learning-Based Prediction of No-Show Telemedicine Encounters.\",\"authors\":\"C Mahony Reategui-Rivera, Wanting Cui, Stefan Escobar-Agreda, Leonardo Rojas-Mezarina, Joseph Finkelstein\",\"doi\":\"10.1089/tmr.2025.0009\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Aim: </strong>This study aimed to evaluate the performance of machine learning (ML) models in predicting patient no-shows for telemedicine appointments within Peruvian health system and identify key predictors of nonattendance.</p><p><strong>Methods: </strong>We performed a retrospective observational study using anonymized data (June 2019-November 2023) from \\\"Teleatiendo.\\\" The dataset included over 1.5 million completed appointments and about 64,000 no-shows (4.1%), focusing on teleorientation and telemonitoring. Predictor variables included patient demographics, socioeconomic factors, health care facility characteristics, appointment timing, and telemedicine service types. A 70% training, 10% validation, and 20% testing split were used over 10 iterations, with hyperparameter tuning performed on the validation set to identify optimal model parameters. Multiple ML approaches-random forest, XGBoost, LightGBM, and anomaly detection-were implemented in combination with undersampling and cost-sensitive learning to address class imbalance. Performance was evaluated using precision, recall, specificity, area under the curve (AUC), F1-score, and accuracy.</p><p><strong>Results: </strong>Of the models tested, undersampling with XGBoost achieved a precision of 0.115 (±0.001), recall of 0.654 (±0.005), specificity of 0.786 (±0.002), AUC of 0.720 (±0.002), and accuracy of 0.780 (±0.002). In contrast, cost-sensitive XGBoost exhibited a balanced performance with a precision of 0.123 (±0.001), recall of 0.639 (±0.006), specificity of 0.805 (±0.004), AUC of 0.722 (±0.001), and accuracy of 0.799 (±0.003). Additionally, cost-sensitive random forest achieved the highest specificity (0.843 ± 0.002) and accuracy (0.832 ± 0.001) but recorded a lower recall (0.585 ± 0.004), while cost-sensitive LightGBM and balanced random forest yielded performance metrics similar to cost-sensitive XGBoost. Isolation forest, used for abnormality detection, demonstrated the lowest performance.</p><p><strong>Conclusions: </strong>ML models can moderately predict telemedicine no-shows in Peru, with cost-sensitive boosting techniques enhancing the identification of high-risk patients. Key predictors reflect both individual behavior and system-level contexts, suggesting the need for tailored, context-specific interventions. 
These findings can inform targeted strategies to optimize telemedicine, improve appointment adherence, and promote equitable health care access.</p>\",\"PeriodicalId\":94218,\"journal\":{\"name\":\"Telemedicine reports\",\"volume\":\"6 1\",\"pages\":\"109-119\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-04-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12235123/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Telemedicine reports\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1089/tmr.2025.0009\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q3\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Telemedicine reports","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1089/tmr.2025.0009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Machine Learning-Based Prediction of No-Show Telemedicine Encounters.
Aim: This study aimed to evaluate the performance of machine learning (ML) models in predicting patient no-shows for telemedicine appointments within the Peruvian health system and to identify key predictors of nonattendance.
Methods: We performed a retrospective observational study using anonymized data (June 2019-November 2023) from "Teleatiendo." The dataset included over 1.5 million completed appointments and about 64,000 no-shows (4.1%), focusing on teleorientation and telemonitoring. Predictor variables included patient demographics, socioeconomic factors, health care facility characteristics, appointment timing, and telemedicine service types. A 70% training, 10% validation, and 20% testing split was used over 10 iterations, with hyperparameter tuning performed on the validation set to identify optimal model parameters. Multiple ML approaches (random forest, XGBoost, LightGBM, and anomaly detection) were implemented in combination with undersampling and cost-sensitive learning to address class imbalance. Performance was evaluated using precision, recall, specificity, area under the curve (AUC), F1-score, and accuracy.
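To make the cost-sensitive pipeline concrete, the sketch below shows how a 70/10/20 split, a class-weighted XGBoost classifier, and the reported evaluation metrics might be implemented in Python. The file name, feature handling, and hyperparameters are illustrative assumptions, not the study's actual code.

```python
# Minimal sketch (assumptions, not the study's code) of a cost-sensitive
# XGBoost no-show classifier with a 70/10/20 split and standard metrics.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             accuracy_score, roc_auc_score, confusion_matrix)
from xgboost import XGBClassifier

# Hypothetical export of the appointment data; "no_show" = 1 for missed visits.
df = pd.read_csv("teleatiendo_appointments.csv")
X = pd.get_dummies(df.drop(columns=["no_show"]))
y = df["no_show"].astype(int)

# 70% train / 10% validation / 20% test, stratified on the rare no-show label.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=2 / 3, stratify=y_rest, random_state=0)

# Cost-sensitive learning: up-weight the minority (no-show) class by the class ratio.
pos_weight = (y_train == 0).sum() / (y_train == 1).sum()
model = XGBClassifier(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.1,
    scale_pos_weight=pos_weight,
    eval_metric="auc",
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

# Evaluate on the held-out test set with the metrics reported in the abstract.
proba = model.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print({
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),   # sensitivity to no-shows
    "specificity": tn / (tn + fp),          # correctly identified attendees
    "auc": roc_auc_score(y_test, proba),
    "f1": f1_score(y_test, pred),
    "accuracy": accuracy_score(y_test, pred),
})
# Repeating this split/fit/score loop over 10 random seeds and averaging
# would yield mean (±SD) estimates like those in the Results section.
```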
Results: Of the models tested, undersampling with XGBoost achieved a precision of 0.115 (±0.001), recall of 0.654 (±0.005), specificity of 0.786 (±0.002), AUC of 0.720 (±0.002), and accuracy of 0.780 (±0.002). In contrast, cost-sensitive XGBoost exhibited a balanced performance with a precision of 0.123 (±0.001), recall of 0.639 (±0.006), specificity of 0.805 (±0.004), AUC of 0.722 (±0.001), and accuracy of 0.799 (±0.003). Additionally, cost-sensitive random forest achieved the highest specificity (0.843 ±0.002) and accuracy (0.832 ±0.001) but recorded a lower recall (0.585 ±0.004), while cost-sensitive LightGBM and balanced random forest yielded performance metrics similar to cost-sensitive XGBoost. Isolation forest, used for anomaly detection, demonstrated the lowest performance.
Conclusions: ML models can moderately predict telemedicine no-shows in Peru, with cost-sensitive boosting techniques enhancing the identification of high-risk patients. Key predictors reflect both individual behavior and system-level contexts, suggesting the need for tailored, context-specific interventions. These findings can inform targeted strategies to optimize telemedicine, improve appointment adherence, and promote equitable health care access.