From Conversation to Standardized Terminology: An LLM-RAG Approach for Automated Health Problem Identification in Home Healthcare.

IF 2.9, Zone 3 (Medicine), Q1 NURSING

Journal of Nursing Scholarship. Pub Date: 2025-08-10. DOI: 10.1111/jnu.70039

Zhihong Zhang, Pallavi Gupta, Jiyoun Song, Maryam Zolnoori, Maxim Topaz


Citations: 0

Abstract


From Conversation to Standardized Terminology: An LLM-RAG Approach for Automated Health Problem Identification in Home Healthcare.

Background: With ambient listening systems increasingly adopted in healthcare, analyzing clinician-patient conversations has become essential. The Omaha System is a standardized terminology for documenting patient care, classifying health problems into four domains across 42 problems and 377 signs/symptoms. Manually identifying and mapping these problems is time-consuming and labor-intensive. This study aims to automate health problem identification from clinician-patient conversations using large language models (LLMs) with retrieval-augmented generation (RAG).

Methods: Using the Omaha System framework, we analyzed 5118 utterances from 22 clinician-patient encounters in home healthcare. RAG-enhanced LLMs detected health problems and mapped them to Omaha System terminology. We evaluated different model configurations, including embedding models, context window sizes, parameter settings (top k, top p), and prompting strategies (zero-shot, few-shot, and chain-of-thought). Three LLMs (Llama 3.1-8B-Instruct, GPT-4o-mini, and GPT-o3-mini) were compared using precision, recall, and F1-score against expert annotations.
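As an illustration of the evaluation described in the Methods, the following is a minimal sketch of how precision, recall, and F1-score can be computed by comparing predicted labels with expert annotations. The abstract does not state how scores were aggregated, so the micro-averaging over utterances used here is an assumption. The function name `micro_prf`, the utterance IDs, and the toy labels are hypothetical and are not data from the study.

```python
from typing import Dict, Set, Tuple


def micro_prf(
    predicted: Dict[int, Set[str]],
    gold: Dict[int, Set[str]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-utterance label sets.

    Assumption: each utterance may carry zero or more labels, and the counts
    are pooled across all utterances before the ratios are taken.
    """
    tp = fp = fn = 0
    for utt_id in set(predicted) | set(gold):
        pred = predicted.get(utt_id, set())
        ref = gold.get(utt_id, set())
        tp += len(pred & ref)   # labels found by both model and expert
        fp += len(pred - ref)   # labels only the model produced
        fn += len(ref - pred)   # labels only the expert recorded
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy example (hypothetical labels, not results from the paper).
gold = {1: {"Pain"}, 2: {"Circulation", "Respiration"}, 3: set()}
predicted = {1: {"Pain"}, 2: {"Circulation"}, 3: {"Sleep and rest patterns"}}

precision, recall, f1 = micro_prf(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the model gets two labels right, adds one spurious label, and misses one, so all three scores come out to two thirds.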

Results: The optimal configuration used a 1-utterance context window, top k = 15, top p = 0.6, and few-shot learning with chain-of-thought prompting. GPT-4o-mini achieved the highest F1-score (0.90) for both problem and sign/symptom identification, followed by GPT-o3-mini (0.83/0.82), while Llama 3.1-8B-Instruct performed worst (0.73/0.72).
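To make the two parameter names concrete, here is a minimal sketch of how top-k and top-p (nucleus) filtering are commonly applied to a model's output distribution before sampling. The abstract does not specify whether "top k" refers to decoding or to the number of retrieved passages, so reading it as a decoding parameter is an assumption. The function `top_k_top_p_filter` and the toy distribution are hypothetical, and real inference libraries may order or normalize these steps differently.

```python
from typing import Dict


def top_k_top_p_filter(
    probs: Dict[str, float], top_k: int, top_p: float
) -> Dict[str, float]:
    """Keep the top_k most likely candidates, then the smallest prefix of
    them whose cumulative (renormalized) probability reaches top_p."""
    # Step 1: top-k truncation, most likely candidates first.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    k_total = sum(p for _, p in ranked)

    # Step 2: nucleus (top-p) truncation on the renormalized top-k set.
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / k_total
        if cumulative >= top_p:
            break

    # Step 3: renormalize the survivors so they sum to 1 for sampling.
    kept_total = sum(p for _, p in kept)
    return {token: p / kept_total for token, p in kept}


# Toy distribution over candidate outputs (hypothetical values).
toy = {
    "Pain": 0.5,
    "Circulation": 0.2,
    "Respiration": 0.15,
    "Cognition": 0.1,
    "Vision": 0.05,
}
filtered = top_k_top_p_filter(toy, top_k=15, top_p=0.6)
print(filtered)
```

With the values reported as optimal (top k = 15, top p = 0.6), the top-k cut has no effect on this five-item toy distribution, and the top-p cut keeps only the two most likely candidates. Lower top-p values narrow the sampling pool in this way, which generally makes outputs less varied.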

Conclusions: Using the Omaha System, LLMs with RAG effectively automate health problem identification in clinical conversations. This approach can enhance documentation completeness, reduce documentation burden, and potentially improve patient outcomes through more comprehensive problem identification, translating into tangible improvements in clinical efficiency and care delivery.

Clinical relevance: Automating health problem identification from clinical conversations can improve documentation accuracy, reduce burden, and ensure alignment with standardized frameworks like the Omaha System, enhancing care quality and continuity in home healthcare.


Source journal

Journal of Nursing Scholarship (Medicine: Nursing)

CiteScore

6.30

Self-citation rate

5.90%

Articles published

Review time

6-12 weeks

About the journal: This widely read and respected journal features peer-reviewed, thought-provoking articles representing research by some of the world’s leading nurse researchers. Reaching health professionals, faculty, and students in 103 countries, the Journal of Nursing Scholarship is focused on the health of people throughout the world. It is the official journal of Sigma Theta Tau International, and it reflects the society’s dedication to providing the tools necessary to improve nursing care around the world.