Automatic prompt design via particle swarm optimization driven LLM for efficient medical information extraction

Tian Zhang, Lianbo Ma, Shi Cheng, Yikai Liu, Nan Li, Hongjiang Wang

Swarm and Evolutionary Computation, Volume 95, Article 101922 (published 2025-04-17). DOI: 10.1016/j.swevo.2025.101922
Citations: 0
Abstract
Medical information extraction (IE) is an essential aspect of electronic health records (EHRs), but it is a challenging task of converting plain text into structured knowledge, on which domain-specific models struggle to achieve strong performance. Recently, large language models (LLMs), which have demonstrated remarkable capabilities in text understanding and generation, have emerged as a promising approach for handling natural language texts. However, LLMs depend heavily on elaborate prompts, which in turn require extensive expert knowledge and manually crafted prompt templates. In this work, we propose a novel method for automatic prompt design, called Particle Swarm Optimization-based Prompt using a Large language model (PSOPL). As an efficient method for medical information extraction from EHRs, PSOPL allows particle swarm optimization (PSO) to automate prompt design by leveraging the LLM's ability to generate coherent text token by token. Specifically, starting from a small number of initial prompts, the evolutionary operators in PSOPL guide the LLM to generate new candidate prompts iteratively, and PSOPL evaluates population fitness to retain the optimal prompts. In this way, PSOPL achieves prompt evolution without model training and reduces both the human effort and the domain knowledge required. We conducted experiments with open-source LLMs (e.g., Alpaca-7B, GPT-J-6B) and a closed-source LLM (e.g., GLM-4) on public medical datasets (e.g., CMeEE, CMeIE, CHIP-CDEE) covering information extraction tasks (e.g., named entity recognition, relation extraction, event extraction) to verify the method's generalizability. The experimental results demonstrate the potential of using PSO-driven LLMs to design prompts automatically, allowing for the swift extraction of important patient information from EHRs.
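To make the workflow described in the abstract concrete, the following is a minimal Python sketch of PSO-style prompt evolution driven by an LLM. It is an illustration only: the functions llm_generate and evaluate are hypothetical placeholders (a real implementation would call an LLM API and score candidate prompts on a medical IE development set), and the paper's actual operators, fitness measure, and update rules may differ.

import random

def llm_generate(instruction: str) -> str:
    """Hypothetical placeholder for an LLM call that rewrites or combines prompts."""
    return instruction  # a real implementation would query an LLM here

def evaluate(prompt: str) -> float:
    """Hypothetical placeholder fitness: score the prompt on a held-out medical IE dev set."""
    return random.random()  # stand-in score; replace with a task-specific metric (e.g., F1)

def pso_prompt_search(seed_prompts, iterations=10):
    particles = list(seed_prompts)                    # current prompt held by each particle
    pbest = [(p, evaluate(p)) for p in particles]     # personal best (prompt, score) per particle
    gbest = max(pbest, key=lambda x: x[1])            # global best (prompt, score)

    for _ in range(iterations):
        for i, prompt in enumerate(particles):
            # "Velocity update": instead of numeric vector arithmetic, ask the LLM to
            # move the prompt toward its personal best and the global best.
            instruction = (
                "Rewrite the following prompt for medical information extraction, "
                "borrowing strengths from the two reference prompts.\n"
                f"Current prompt: {prompt}\n"
                f"Personal best: {pbest[i][0]}\n"
                f"Global best: {gbest[0]}\n"
                "Return only the improved prompt."
            )
            candidate = llm_generate(instruction)
            score = evaluate(candidate)
            particles[i] = candidate
            if score > pbest[i][1]:
                pbest[i] = (candidate, score)
            if score > gbest[1]:
                gbest = (candidate, score)
    return gbest

if __name__ == "__main__":
    seeds = [
        "Extract all disease entities from the clinical note.",
        "List drug-disease relations mentioned in the EHR text.",
    ]
    best_prompt, best_score = pso_prompt_search(seeds, iterations=3)
    print(best_prompt, best_score)

The key design choice mirrored in this sketch is that the classical numeric velocity update is replaced by a natural-language rewriting instruction that steers each prompt toward its personal and global bests, so no model training or gradient access is needed.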
About the journal:
Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms, and Genetic Programming, Evolution Strategies, and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Fireflies Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing systems, Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.