Dynamic few-shot prompting for clinical note section classification using lightweight, open-source large language models
Kurt Miller, Steven Bedrick, Qiuhao Lu, Andrew Wen, William Hersh, Kirk Roberts, Hongfang Liu
Journal of the American Medical Informatics Association (published 2025-06-03). DOI: 10.1093/jamia/ocaf084
Abstract
Objective: Unlocking the clinical information embedded in clinical notes has been significantly hindered by domain-specific and context-sensitive language. Identifying note sections and structural document elements has been shown to improve information extraction and the downstream clinical natural language processing (NLP) tasks and applications that depend on it. This study investigates the viability of a dynamic example-selection prompting method for section classification using lightweight, open-source large language models (LLMs) as a practical solution for real-world healthcare clinical NLP systems.
Materials and methods: We develop a dynamic few-shot prompting approach to section classification in which section samples are first embedded using a transformer-based model and stored in a vector store. During inference, the stored samples whose contextual embeddings are most similar to a given input section text are retrieved from the vector store and inserted into the LLM prompt as few-shot examples. We evaluate this technique on two datasets comprising two section schemas, at varying levels of context, and compare its performance against zero-shot and randomly selected few-shot baselines.
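The retrieval step described above can be illustrated with a short sketch: embed the labeled section samples once, then at inference time rank them by cosine similarity against the input text and splice the top matches into the prompt. This is a minimal illustration, not the authors' implementation; the embedding model (all-MiniLM-L6-v2), the sample sections, the label set, and the prompt template are all assumptions not specified in this abstract.

```python
# Minimal sketch of dynamic few-shot example selection for section
# classification. Embedding model, samples, labels, and prompt template
# are illustrative assumptions, not the paper's actual choices.
import numpy as np
from sentence_transformers import SentenceTransformer

# Labeled section samples (hypothetical examples).
SAMPLES = [
    ("Chief Complaint: chest pain for two days.", "chief_complaint"),
    ("Current medications include lisinopril 10 mg daily.", "medications"),
    ("Vitals: BP 128/82, HR 74, afebrile.", "physical_exam"),
    ("Assessment: stable angina; continue current therapy.", "assessment_and_plan"),
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Build the "vector store": embed every sample once, up front.
texts, labels = zip(*SAMPLES)
store = encoder.encode(list(texts), normalize_embeddings=True)  # shape (n, d)

def build_prompt(section_text: str, k: int = 2) -> str:
    """Retrieve the k most similar stored samples and insert them
    as few-shot examples ahead of the input section text."""
    query = encoder.encode([section_text], normalize_embeddings=True)[0]
    # On L2-normalized vectors, cosine similarity is a dot product.
    sims = store @ query
    top = np.argsort(-sims)[:k]
    shots = "\n".join(f"Section: {texts[i]}\nLabel: {labels[i]}" for i in top)
    return (
        "Classify the clinical note section into one of the known labels.\n"
        f"{shots}\n"
        f"Section: {section_text}\nLabel:"
    )

print(build_prompt("Patient takes metformin 500 mg twice daily."))
```

The assembled prompt would then be passed to one of the lightweight, open-source LLMs under evaluation; the essential design choice is that the few-shot examples are selected per input at inference time rather than fixed in advance.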
Results: The dynamic few-shot prompting experiments yielded the highest F1 scores on every classification task and dataset for all seven LLMs included in the evaluation, with average macro F1 increases of 39.3% and 21.1% on our primary section classification task over the zero-shot and static (randomly selected) few-shot baselines, respectively.
Discussion and conclusion: Our results demonstrate substantial performance improvements from dynamically selecting examples for few-shot LLM prompting, with further gains from including section context, indicating compelling potential for clinical applications.
About the journal:
JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.