Pre-trained large language models outperform statistics and machine learning forecasting visits in the emergency departments

IF 2.2 · Zone 3 (Medicine) · Q1 EMERGENCY MEDICINE

American Journal of Emergency Medicine · Pub Date: 2025-09-06 · DOI: 10.1016/j.ajem.2025.09.008

Yi-Chang Yen, Chin-Chieh Wu, Shu-Hui Chen, Kuan-Fu Chen

Volume 98, Pages 298-308 · Journal Article · https://www.sciencedirect.com/science/article/pii/S0735675725006199

Citations: 0

Abstract

Objectives

The unpredictability of emergency department (ED) visits is a major contributor to ED crowding. This study compares conventional statistical learning, machine learning, and large language model (LLM) methods for forecasting daily ED visits at primary, secondary, and tertiary hospitals across different regions of Taiwan during the pre-COVID-19, COVID-19, and post-pandemic periods.

Methods

Daily ED visit records from 2007 to 2022, derived from the electronic medical records of six hospitals across Taiwan, were combined with calendar data and COVID-19 pandemic indicators to serve as input features. The primary objective was to develop models for a seven-day forecast horizon. Statistical models (SARIMAX and Prophet) and machine learning models (Light Gradient Boosting Machine (LightGBM), Long Short-Term Memory (LSTM), DLinear, and Time-series Dense Encoder (TiDE)) were developed using training datasets from 2007 to 2017. We then compared the performance of the statistical models, the machine learning models, and a pre-trained transformer-based LLM on the testing set (2018–2022), which covered the pre-COVID-19, COVID-19, and post-pandemic periods. The evaluation metric was the Mean Absolute Percentage Error (MAPE), defined as the mean of the absolute differences between predicted and actual values, each expressed as a percentage of the actual value.
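As a rough illustration of the evaluation setup described above, the following Python sketch computes MAPE, splits a daily series chronologically at the end of 2017, and produces a seven-day forecast. The function names, the standard-library-only implementation, and the seasonal-naive forecaster are assumptions made for this example; the seasonal-naive rule is a simple stand-in and is not one of the models evaluated in the study.

```python
from datetime import date, timedelta


def mape(actual, predicted):
    """Mean Absolute Percentage Error, returned in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def split_by_year(days, visits, last_train_year=2017):
    """Chronological split: years up to last_train_year train, later years test."""
    pairs = list(zip(days, visits))
    train = [(d, v) for d, v in pairs if d.year <= last_train_year]
    test = [(d, v) for d, v in pairs if d.year > last_train_year]
    return train, test


def seasonal_naive_forecast(history, horizon=7, season=7):
    """Stand-in forecaster (an assumption, not from the study):
    repeat the most recent weekly cycle over the forecast horizon."""
    if len(history) < season:
        raise ValueError("history must cover at least one full season")
    last_cycle = history[-season:]
    return [last_cycle[i % season] for i in range(horizon)]


if __name__ == "__main__":
    # Synthetic daily counts, for illustration only.
    start = date(2017, 12, 18)
    days = [start + timedelta(days=i) for i in range(21)]
    visits = [100 + 10 * (d.weekday() >= 5) + i for i, d in enumerate(days)]

    train, test = split_by_year(days, visits)
    history = [v for _, v in train]
    actual = [v for _, v in test][:7]
    forecast = seasonal_naive_forecast(history, horizon=len(actual))
    print(f"MAPE over the first test week: {mape(actual, forecast):.2f}%")
```

Because each error is divided by the actual count, MAPE is comparable across hospitals of different sizes, but it is undefined on days with zero visits, which is why the sketch rejects zero actual values.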

Results

A total of 7,540,271 ED visits and 31,064 data points were recorded across two tertiary, three regional, and one primary hospital. The conventional statistical models revealed a significant seven-day cycle pattern in ED visit data across the hospitals. Daily ED visits surged significantly during the COVID-19 pandemic. The pre-trained LLM demonstrated the best overall performance (MAPE 7.59, 95% CI: 7.20–7.99), closely followed by LightGBM (MAPE 8.08, 95% CI: 7.67–8.50). Specifically, TiDE (MAPE 5.89, 95% CI: 5.48–6.29) and Prophet (MAPE 6.80, 95% CI: 6.41–7.18) performed best in the pre-COVID-19 period. Although the abrupt changes during COVID-19 led to declines in the performance of all models, the pre-trained LLM and LightGBM demonstrated resilience, with MAPEs of 9.03 (95% CI: 8.32–9.77) and 10.60 (95% CI: 9.77–11.52), respectively.
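The abstract reports 95% confidence intervals for each MAPE but does not say how they were computed. One common option is a percentile bootstrap over the daily absolute percentage errors, sketched below using only the Python standard library. The function names, the number of resamples, and the choice of bootstrap are assumptions for illustration and may differ from the authors' procedure. Note also that a plain bootstrap treats daily errors as independent, which time series errors often are not.

```python
import random


def absolute_percentage_errors(actual, predicted):
    """Per-day absolute percentage errors, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return [100.0 * abs((a - p) / a) for a, p in zip(actual, predicted)]


def bootstrap_mape_ci(actual, predicted, n_resamples=2000, alpha=0.05, seed=0):
    """Return (MAPE, lower, upper) using a percentile bootstrap.

    Assumption: days are resampled independently with replacement,
    so any autocorrelation in the errors is ignored.
    """
    rng = random.Random(seed)
    apes = absolute_percentage_errors(actual, predicted)
    n = len(apes)
    # Recompute the mean error on each resampled set of days.
    means = sorted(sum(rng.choices(apes, k=n)) / n for _ in range(n_resamples))
    lower_idx = max(0, int((alpha / 2) * n_resamples))
    upper_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return sum(apes) / n, means[lower_idx], means[upper_idx]


if __name__ == "__main__":
    # Synthetic values, for illustration only.
    actual = [120, 135, 128, 140, 150, 160, 155, 118, 132, 129]
    predicted = [115, 140, 120, 150, 145, 170, 150, 125, 130, 135]
    point, low, high = bootstrap_mape_ci(actual, predicted)
    print(f"MAPE {point:.2f} (95% CI: {low:.2f}-{high:.2f})")
```

For serially correlated errors, a block bootstrap that resamples contiguous runs of days would be a more cautious choice than the independent resampling used here.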

Conclusions

The pre-trained LLM showed superior overall performance in forecasting ED visits, particularly during the pandemic and post-pandemic periods. LightGBM performed relatively well across all periods. Prophet and TiDE demonstrated favorable and stable performance only during the pre-pandemic period. These findings underscore the potential of advanced time series models to improve ED visit forecasts.

Source journal

American Journal of Emergency Medicine (Medicine: Emergency Medicine)

CiteScore

6.00

Self-citation rate

5.60%

Articles published

730

Review time

42 days

About the journal: A distinctive blend of practicality and scholarliness makes the American Journal of Emergency Medicine a key source for information on emergency medical care. Covering all activities concerned with emergency medicine, it is the journal to turn to for information to help increase the ability to understand, recognize and treat emergency conditions. Issues contain clinical articles, case reports, review articles, editorials, international notes, book reviews and more.