Keyword-optimized template insertion for clinical note classification via prompt-based learning.

Impact Factor 3.3 · JCR Q2, Medical Informatics · JCR Region 3 (Medicine)
Eugenia Alleva, Isotta Landi, Leslee J Shaw, Erwin Böttinger, Ipek Ensari, Thomas J Fuchs
BMC Medical Informatics and Decision Making, vol. 25, no. 1, p. 247. Published 2025-07-03. DOI: 10.1186/s12911-025-03071-y. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12224782/pdf/. Citations: 0.

Abstract

Background: Prompt-based learning involves the addition of prompts (i.e., templates) to the input of pre-trained large language models (PLMs) to adapt them to specific tasks with minimal training. This technique is particularly advantageous in clinical scenarios where the amount of annotated data is limited. This study aims to investigate the impact of template position on model performance and training efficiency in clinical note classification tasks using prompt-based learning, especially in zero- and few-shot settings.

Methods: We developed a keyword-optimized template insertion method (KOTI) to enhance model performance by strategically placing prompt templates near relevant clinical information within the notes. The method involves defining task-specific keywords, identifying sentences containing these keywords, and inserting the prompt template in their vicinity. We compared KOTI with standard template insertion (STI) methods, in which the template is directly appended to the end of the input text. Specifically, we compared STI with naïve tail-truncation (STI-s) and STI with keyword-optimized input truncation (STI-k). Experiments were conducted using two pre-trained encoder models, GatorTron and ClinicalBERT, and two decoder models, BioGPT and ClinicalT5, across five classification tasks: dysmenorrhea, peripheral vascular disease, depression, osteoarthritis, and smoking status classification.
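The insertion step described above (define task-specific keywords, find the sentence that mentions one, place the template beside it, and truncate around that position) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `koti_insert`, the naive sentence splitter, the word-level token budget, and the `[MASK]`-style template are all assumptions for the sake of the example.

```python
import re

def koti_insert(note: str, keywords: list[str], template: str,
                max_tokens: int = 512) -> str:
    """Sketch of keyword-optimized template insertion (KOTI):
    place the prompt template next to the first sentence that
    mentions a task-specific keyword, then truncate around it."""
    # Naive sentence split on ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", note)
    hit = next((i for i, s in enumerate(sentences)
                if any(k.lower() in s.lower() for k in keywords)), None)
    if hit is None:
        # No keyword found: fall back to standard tail insertion (STI),
        # i.e., naive head-keep truncation plus template at the end.
        words = note.split()[:max_tokens]
        return " ".join(words) + " " + template
    # Insert the template immediately after the keyword sentence.
    sentences.insert(hit + 1, template)
    text = " ".join(sentences)
    # Crude word-level truncation: keep a window of max_tokens words
    # centered on the inserted template so it survives truncation.
    words = text.split()
    if len(words) > max_tokens:
        start_word = len(text[:text.find(template)].split())
        lo = max(0, start_word - max_tokens // 2)
        words = words[lo:lo + max_tokens]
    return " ".join(words)
```

For example, given a note mentioning dysmenorrhea, the template lands next to the relevant sentence rather than at the (possibly truncated) tail:

```python
note = ("Patient denies chest pain. Reports severe dysmenorrhea monthly. "
        "No smoking history.")
print(koti_insert(note, ["dysmenorrhea"], "Dysmenorrhea: [MASK]."))
```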

Results: Our experiments revealed that the KOTI approach consistently outperformed both STI-s and STI-k in zero-shot and few-shot scenarios for encoder models, with KOTI yielding a significant 24% F1 improvement over STI-k for GatorTron and 8% for ClinicalBERT. Additionally, training with balanced examples further enhanced performance, particularly under few-shot conditions. In contrast, decoder-based models exhibited inconsistent results, with KOTI showing a significant improvement in F1 score over STI-k for BioGPT (+19%), but a significant drop for ClinicalT5 (-18%), suggesting that KOTI is not beneficial across all transformer model architectures.

Conclusion: Our findings underscore the significance of template position in prompt-based fine-tuning of encoder models and highlight KOTI's potential to optimize real-world clinical note classification tasks with few training examples.

Source journal: BMC Medical Informatics and Decision Making
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month
Journal description: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.