Brian H Park, Chun-Nan Hsu, Austin Nguyen, Ying Q Zhou, Rodney A Gabriel
Improving postoperative length of stay forecasting with retrieval-augmented prediction
Journal of the American Medical Informatics Association, published 2025-09-18
DOI: 10.1093/jamia/ocaf154 (https://doi.org/10.1093/jamia/ocaf154)
Abstract
Objective: The objective of this study is to evaluate retrieval-augmented prediction for forecasting hospital length of stay (LOS) following surgery compared to traditional machine learning (ML), standalone large language models (LLMs), and retrieval-augmented generation (RAG) approaches.
Materials and methods: Spine surgery cases were extracted from electronic health records. Structured features and operative notes were concatenated into natural language patient representations, embedded using Sentence-BERT (Sentence-Bidirectional Encoder Representations from Transformers), and stored in a vector database. Eight predictive models were implemented, including a baseline model, standalone ML with embeddings, standalone LLM (Gemma 3:27B), and combinations of these with retrieval-augmented prediction or generation. The retrieval-augmented prediction model computed a similarity-weighted average LOS from nearest neighbors. Performance was assessed using R², mean absolute error (MAE), and root mean square error (RMSE).
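The similarity-weighted average described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the similarity measure, the number of neighbors, or how weights are normalized, so cosine similarity, the parameter `k`, the clipping of negative similarities, and the function name `retrieval_augmented_los` are all assumptions, and the embeddings are toy vectors rather than Sentence-BERT outputs.

```python
import numpy as np

def retrieval_augmented_los(query_emb, case_embs, case_los, k=5):
    """Similarity-weighted average LOS over the k most similar prior cases.

    Assumptions (not from the source): cosine similarity, a fixed k,
    and negative similarities clipped to zero before weighting.
    """
    # Normalize so that dot products equal cosine similarities.
    q = query_emb / np.linalg.norm(query_emb)
    c = case_embs / np.linalg.norm(case_embs, axis=1, keepdims=True)
    sims = c @ q

    # Indices of the k most similar stored cases, most similar first.
    top = np.argsort(sims)[::-1][:k]

    # Non-negative weights; fall back to a plain mean if all are zero.
    w = np.clip(sims[top], 0.0, None)
    if w.sum() == 0:
        return float(np.mean(case_los[top]))
    return float(np.dot(w, case_los[top]) / w.sum())

# Toy "vector database": three prior cases with 2-D embeddings and known LOS (days).
case_embs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
case_los = np.array([2.0, 10.0, 6.0])

# New patient whose embedding is closest to the first stored case.
query = np.array([1.0, 0.0])
pred = retrieval_augmented_los(query, case_embs, case_los, k=2)
print(round(pred, 4))
```

Because the prediction is a weighted average of retrieved cases, the neighbors and their weights can be shown alongside the forecast, which is one way to read the interpretability claim made in the Discussion.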
Results: Retrieval-augmented prediction alone outperformed standalone ML and LLM models (R² = 0.39, MAE = 4.47). Combining ML or LLM outputs with retrieval-augmented prediction further improved performance. The best-performing model was a neural network blended with retrieval-augmented prediction (R² = 0.52, MAE = 4.16). LLM-RAG alone reached R² = 0.19, which improved to 0.47 when combined with retrieval-augmented predictions. Retrieval-augmented prediction consistently reduced MAE and RMSE by up to 32% and 38%, respectively.
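To make the blending and the three reported metrics concrete, here is a small sketch. The abstract does not say how model outputs were combined with retrieval-augmented predictions, so the convex combination and its weight `alpha` are assumptions, as are the helper names `blend` and `metrics`. All numbers are toy values chosen for illustration and are unrelated to the study's results.

```python
import numpy as np

def blend(model_pred, retrieval_pred, alpha=0.5):
    """Convex combination of two prediction vectors (alpha is a hypothetical weight)."""
    model_pred = np.asarray(model_pred, dtype=float)
    retrieval_pred = np.asarray(retrieval_pred, dtype=float)
    return alpha * model_pred + (1.0 - alpha) * retrieval_pred

def metrics(y_true, y_pred):
    """Return R^2, MAE, and RMSE using their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }

# Toy LOS values in days (illustrative only).
y_true = np.array([2.0, 4.0, 6.0, 8.0])
model_pred = np.array([3.0, 3.0, 7.0, 7.0])       # e.g. a standalone model
retrieval_pred = np.array([2.0, 5.0, 5.0, 9.0])   # e.g. similarity-weighted neighbors

blended = blend(model_pred, retrieval_pred, alpha=0.5)
model_scores = metrics(y_true, model_pred)
blended_scores = metrics(y_true, blended)
print(model_scores)
print(blended_scores)
```

In practice a weight such as `alpha` would be chosen on a validation set rather than fixed in advance; a relative reduction in MAE can then be computed as `1 - blended_scores["mae"] / model_scores["mae"]`.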
Discussion: Retrieval-augmented prediction offers interpretable and resource-efficient forecasting by semantically leveraging prior patient cases without generative modeling. It consistently outperformed RAG and ML across metrics, approximating clinical reasoning via similarity-based inference.
Conclusion: Retrieval-augmented prediction significantly enhances LOS prediction accuracy over standard ML and LLM models. Its interpretability and scalability make it a promising solution for integrating predictive analytics into clinical workflows.
About the journal:
JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.