Automatic prompt design via particle swarm optimization driven LLM for efficient medical information extraction

IF 8.2; Zone 1 (Computer Science); JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Swarm and Evolutionary Computation, Pub Date: 2025-04-17, DOI: 10.1016/j.swevo.2025.101922

Tian Zhang, Lianbo Ma, Shi Cheng, Yikai Liu, Nan Li, Hongjiang Wang

Citations: 0

Abstract

Medical information extraction (IE) is essential for making use of electronic health records (EHRs), but converting plain text into structured knowledge is challenging, and domain-specific models struggle to achieve satisfactory performance. Recently, large language models (LLMs), which have demonstrated remarkable capabilities in text understanding and generation, have emerged as a promising way to handle natural language text. However, LLMs depend heavily on elaborate prompts, so extensive expert knowledge and manually written prompt templates are needed. In this work, we propose a novel method for automatic prompt design, called Particle Swarm Optimization-based Prompt using a Large language model (PSOPL). As an efficient method for medical information extraction from EHRs, PSOPL lets particle swarm optimization (PSO) automate prompt design by leveraging the LLM's ability to generate coherent text token by token. Specifically, starting with a small number of initial prompts, evolutionary operators in PSOPL guide the LLM to generate new candidate prompts iteratively, and PSOPL evaluates the fitness of the population to retain the best prompts. In this way, PSOPL achieves prompt evolution without model training and reduces both human effort and the need for domain knowledge. We conducted experiments with open-source LLMs (e.g., Alpaca-7B, GPT-J-6B) and a closed-source LLM (e.g., GLM-4) on public medical datasets (e.g., CMeEE, CMeIE, CHIP-CDEE) covering information extraction tasks (e.g., named entity recognition, relation extraction, event extraction) to verify the method's generalizability. The experimental results demonstrate the potential of using PSO-based LLMs to design prompts automatically, allowing important patient information to be extracted swiftly from EHRs.
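
The loop described in the abstract (a few initial prompts, LLM-generated candidates, fitness-based retention) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the evolutionary operators, so the sketch borrows the usual PSO notions of a personal best and a global best. The names `pso_prompt_search`, `make_toy_generate`, and `toy_fitness` are hypothetical. A word-mixing function stands in for the LLM call, and a keyword-coverage score stands in for a real extraction metric on held-out data.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical keyword set used only by the toy fitness below.
KEYWORDS = {"extract", "entities", "diseases", "drugs", "json"}


def toy_fitness(prompt: str) -> float:
    """Stand-in score: fraction of KEYWORDS present in the prompt.
    A real setup would score extraction quality on a dev set (assumption)."""
    return len(KEYWORDS & set(prompt.lower().split())) / len(KEYWORDS)


def make_toy_generate(seed: int = 0) -> Callable[[str, str, str], str]:
    """Build a stand-in for the LLM: it mixes words from the current prompt,
    the particle's own best prompt, and the swarm's best prompt."""
    rng = random.Random(seed)

    def toy_generate(current: str, own_best: str, swarm_best: str) -> str:
        words = current.split() + own_best.split() + swarm_best.split()
        picked = rng.sample(words, min(len(words), 8))
        # Drop repeated words while keeping the sampled order.
        return " ".join(dict.fromkeys(picked))

    return toy_generate


def pso_prompt_search(
    initial_prompts: List[str],
    generate: Callable[[str, str, str], str],
    fitness: Callable[[str], float],
    iterations: int = 5,
) -> Tuple[str, float]:
    """Assumed PSO-style loop over prompts: every particle moves to a new
    candidate, and the personal and global bests are updated by fitness."""
    positions = list(initial_prompts)
    personal_best = [(p, fitness(p)) for p in positions]
    global_best = max(personal_best, key=lambda pair: pair[1])

    for _ in range(iterations):
        for i, current in enumerate(positions):
            candidate = generate(current, personal_best[i][0], global_best[0])
            score = fitness(candidate)
            positions[i] = candidate
            if score > personal_best[i][1]:
                personal_best[i] = (candidate, score)
            if score > global_best[1]:
                global_best = (candidate, score)

    return global_best


seeds = ["extract entities", "list diseases and drugs", "answer in json"]
best_prompt, best_score = pso_prompt_search(seeds, make_toy_generate(0), toy_fitness)
print(best_prompt, best_score)
```

Because the global best is only replaced by a strictly better candidate, the returned score can never fall below the best initial prompt. Swapping `make_toy_generate` for a function that calls an actual LLM, and `toy_fitness` for a task metric, would be the natural next step, though how PSOPL does either is not stated in the abstract.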

Source journal

Swarm and Evolutionary Computation: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, THEORY & METHODS

CiteScore: 16.00

Self-citation rate: 12.00%

Publications: 169

About the journal: Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Fireflies Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.