Improving postoperative length of stay forecasting with retrieval-augmented prediction.

IF 4.6 | Zone 2 (Medicine) | JCR Q1, Computer Science, Information Systems

Journal of the American Medical Informatics Association. Pub Date: 2025-09-18. DOI: 10.1093/jamia/ocaf154

Brian H Park, Chun-Nan Hsu, Austin Nguyen, Ying Q Zhou, Rodney A Gabriel



Abstract

Objective: This study evaluates retrieval-augmented prediction for forecasting hospital length of stay (LOS) after surgery, comparing it with traditional machine learning (ML), standalone large language models (LLMs), and retrieval-augmented generation (RAG) approaches.

Materials and methods: Spine surgery cases were extracted from electronic health records. Structured features and operative notes were concatenated into natural language patient representations, embedded using Sentence-BERT (Sentence-Bidirectional Encoder Representations from Transformers), and stored in a vector database. Eight predictive models were implemented, including a baseline model, standalone ML with embeddings, a standalone LLM (Gemma 3:27B), and combinations of these with retrieval-augmented prediction or generation. The retrieval-augmented prediction model computed a similarity-weighted average LOS from nearest neighbors. Performance was assessed using R², mean absolute error (MAE), and root mean square error (RMSE).
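The similarity-weighted average described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the similarity measure, the number of neighbors, or the weighting scheme, so cosine similarity, k = 5, and clipping negative similarities to zero are all assumptions, and the function name `retrieval_augmented_los` is hypothetical.

```python
import numpy as np

def retrieval_augmented_los(query_vec, case_vecs, case_los, k=5):
    """Similarity-weighted average LOS over the k most similar prior cases.

    Assumptions (not from the source): cosine similarity, k nearest
    neighbors, negative similarities clipped to zero before weighting.
    """
    query_vec = np.asarray(query_vec, dtype=float)
    case_vecs = np.asarray(case_vecs, dtype=float)
    case_los = np.asarray(case_los, dtype=float)

    # Normalize so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    c = case_vecs / np.linalg.norm(case_vecs, axis=1, keepdims=True)
    sims = c @ q

    # Indices of the k most similar stored cases, most similar first.
    top = np.argsort(sims)[::-1][:k]
    weights = np.clip(sims[top], 0.0, None)

    # Fall back to a plain mean if every neighbor has zero weight.
    if weights.sum() == 0.0:
        return float(np.mean(case_los[top]))
    return float(np.dot(weights, case_los[top]) / weights.sum())

# Toy example with 2-dimensional "embeddings" and LOS in days.
cases = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
los_days = np.array([2.0, 10.0, 6.0])
prediction = retrieval_augmented_los([1.0, 0.0], cases, los_days, k=2)
```

Because the prediction is a weighted mean of retrieved cases, the neighbors and their weights can be shown alongside the forecast, which is one way to read the interpretability claim in the Discussion.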

Results: Retrieval-augmented prediction alone outperformed standalone ML and LLM models (R² = 0.39, MAE = 4.47). Combining ML or LLM outputs with retrieval-augmented prediction further improved performance. The best-performing model was a neural network blended with retrieval-augmented prediction (R² = 0.52, MAE = 4.16). LLM-RAG alone reached R² = 0.19, which improved to 0.47 when combined with retrieval-augmented predictions. Retrieval-augmented prediction consistently reduced MAE and RMSE, by up to 32% and 38%, respectively.
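The three reported metrics, and one possible way of blending a model's output with a retrieval-based estimate, can be sketched as below. The abstract does not say how the blending was done, so the convex combination and its weight `alpha` are assumptions for illustration only, and the numbers are toy data, not study results.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def blend(model_pred, retrieval_pred, alpha=0.5):
    """Convex combination of two predictions (alpha is a hypothetical weight)."""
    return alpha * model_pred + (1.0 - alpha) * retrieval_pred

# Toy data: observed LOS in days and two sets of predictions.
y = np.array([2.0, 4.0, 6.0, 8.0])
model_pred = np.array([3.0, 3.0, 7.0, 7.0])
retrieval_pred = np.array([1.0, 5.0, 5.0, 9.0])
blended = blend(model_pred, retrieval_pred, alpha=0.5)

# Relative MAE reduction of the blend over the standalone model.
mae_reduction = (mae(y, model_pred) - mae(y, blended)) / mae(y, model_pred)
```

In this toy case the two predictors err in opposite directions, so averaging cancels the errors entirely; real gains depend on how correlated the two sets of errors are.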

Discussion: Retrieval-augmented prediction offers interpretable and resource-efficient forecasting by semantically leveraging prior patient cases without generative modeling. It consistently outperformed RAG and ML across metrics, approximating clinical reasoning via similarity-based inference.

Conclusion: Retrieval-augmented prediction significantly enhances LOS prediction accuracy over standard ML and LLM models. Its interpretability and scalability make it a promising solution for integrating predictive analytics into clinical workflows.


Source journal

Journal of the American Medical Informatics Association (Medicine; Computer Science, Interdisciplinary Applications)

CiteScore: 14.50

Self-citation rate: 7.80%

Articles published: 230

Review time: 3-8 weeks

About the journal: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.