From Conversation to Standardized Terminology: An LLM-RAG Approach for Automated Health Problem Identification in Home Healthcare
Zhihong Zhang, Pallavi Gupta, Jiyoun Song, Maryam Zolnoori, Maxim Topaz
Journal of Nursing Scholarship, published 2025-08-10. DOI: 10.1111/jnu.70039
Citations: 0
Abstract
Background: With ambient listening systems increasingly adopted in healthcare, analyzing clinician-patient conversations has become essential. The Omaha System is a standardized terminology for documenting patient care, organizing 42 health problems and 377 signs/symptoms into four domains. Manually identifying these problems and mapping them to the terminology is time-consuming and labor-intensive. This study aims to automate health problem identification from clinician-patient conversations using large language models (LLMs) with retrieval-augmented generation (RAG).
Methods: Using the Omaha System framework, we analyzed 5118 utterances from 22 clinician-patient encounters in home healthcare. RAG-enhanced LLMs detected health problems and mapped them to Omaha System terminology. We evaluated different model configurations, including embedding models, context window sizes, parameter settings (top k, top p), and prompting strategies (zero-shot, few-shot, and chain-of-thought). Three LLMs (Llama 3.1-8B-Instruct, GPT-4o-mini, and GPT-o3-mini) were compared using precision, recall, and F1-score against expert annotations.
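The retrieval step of a RAG pipeline like the one described can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a toy bag-of-words embedding and an invented five-term subset of the Omaha terminology, whereas the study evaluated dedicated embedding models over the full 42-problem vocabulary.

```python
# Toy sketch of top-k retrieval over candidate Omaha terms for one utterance.
# The embedding here is a simple bag-of-words vector; the study used proper
# embedding models, and OMAHA_TERMS below is an illustrative subset only.
from collections import Counter
import math

OMAHA_TERMS = [
    "Pain", "Skin", "Circulation", "Medication regimen", "Mental health",
]

def embed(text):
    """Bag-of-words 'embedding': word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(utterance, terms, k=15):
    """Rank candidate terms by similarity to the utterance; keep the top k."""
    q = embed(utterance)
    return sorted(terms, key=lambda t: cosine(q, embed(t)), reverse=True)[:k]

# The retrieved candidates would then be placed into the LLM prompt
# (few-shot examples plus chain-of-thought instructions in the study).
candidates = retrieve_top_k("my pain gets worse at night", OMAHA_TERMS, k=3)
```

In a full pipeline, the retrieved candidates constrain the LLM's output to valid Omaha System terms, which is what makes the mapping step reliable compared with free-text generation.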
Results: The optimal configuration used a 1-utterance context window, top k = 15, top p = 0.6, and few-shot learning with chain-of-thought prompting. GPT-4o-mini achieved the highest F1-score (0.90) for both problem and sign/symptom identification, followed by GPT-o3-mini (0.83/0.82), while Llama 3.1-8B-Instruct performed worst (0.73/0.72).
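The reported scores compare model output against expert annotations per encounter. A minimal sketch of that evaluation, treating predictions and gold labels as sets of identified problems (the labels below are invented for illustration):

```python
# Set-based precision, recall, and F1 against expert annotations.
def prf1(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                      # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One correct problem, one false positive, one miss:
p, r, f = prf1({"Pain", "Skin"}, {"Pain", "Circulation"})
```

Here both precision and recall are 0.5 (one of two predictions is correct; one of two gold problems is found), so F1 is also 0.5.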
Conclusions: Using the Omaha System, LLMs with RAG effectively automate health problem identification in clinical conversations. This approach can enhance documentation completeness, reduce documentation burden, and potentially improve patient outcomes through more comprehensive problem identification, translating into tangible improvements in clinical efficiency and care delivery.
Clinical relevance: Automating health problem identification from clinical conversations can improve documentation accuracy, reduce burden, and ensure alignment with standardized frameworks like the Omaha System, enhancing care quality and continuity in home healthcare.
Journal description:
This widely read and respected journal features peer-reviewed, thought-provoking articles representing research by some of the world’s leading nurse researchers.
Reaching health professionals, faculty, and students in 103 countries, the Journal of Nursing Scholarship is focused on the health of people throughout the world. It is the official journal of Sigma Theta Tau International and reflects the society's dedication to providing the tools necessary to improve nursing care around the world.