{"title":"Improving large language models for adverse drug reactions named entity recognition via error correction prompt engineering","authors":"Yunfei Zhang, Wei Liao","doi":"10.1016/j.jbi.2025.104893","DOIUrl":null,"url":null,"abstract":"<div><div>The monitoring and analysis of adverse drug reactions (ADRs ) are important for ensuring patient safety and improving treatment outcomes. Accurate identification of drug names, drug components, and ADR entities during named entity recognition (NER) processes is essential for ensuring drug safety and advancing the integration of drug information. Given that existing medical name entity recognition technologies rely on large amounts of manually annotated data for training, they are often less effective when applied to adverse drug reactions due to significant data variability and the high similarity between drug names. This paper proposes a prompt template for ADR that integrates error correction examples. The prompt template includes: 1. Basic prompts with task descriptions, 2. Annotated entity explanations, 3. Annotation guidelines, 4. Annotated samples for few-shot learning, 5. Error correction examples. Additionally, it integrates complex ADR data from the web and constructs a corpus containing three types of entities (drug name, drug components, and adverse drug reactions) using the Begin, Inside, Outside (BIO) annotation method. Finally, we evaluate the effectiveness of each prompt and compare it with the fine-tuned Large Language Model Meta AI (LLaMA) model and the DeepSeek model. Experimental results show that under this prompt template, the F1 score of GPT-3.5 increased from 0.648 to 0.887, and that of GPT-4 increased from 0.757 to 0.921. It is significantly better than the fine-tuned LLaMA model and DeepSeek model. It demonstrates the superiority of the proposed method, and provides a solid foundation for extracting drug-related entity relationships and building knowledge graphs.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"170 ","pages":"Article 104893"},"PeriodicalIF":4.5000,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Biomedical Informatics","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1532046425001224","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0
Abstract
The monitoring and analysis of adverse drug reactions (ADRs) are important for ensuring patient safety and improving treatment outcomes. Accurate identification of drug names, drug components, and ADR entities during named entity recognition (NER) is essential for ensuring drug safety and advancing the integration of drug information. Because existing medical NER technologies rely on large amounts of manually annotated data for training, they are often less effective on ADR text, where data variability is high and drug names are highly similar to one another. This paper proposes a prompt template for ADR NER that integrates error correction examples. The template comprises five components: (1) basic prompts with task descriptions, (2) explanations of the annotated entities, (3) annotation guidelines, (4) annotated samples for few-shot learning, and (5) error correction examples. In addition, complex ADR data collected from the web are used to construct a corpus containing three entity types (drug names, drug components, and adverse drug reactions) annotated with the Begin, Inside, Outside (BIO) scheme. Finally, we evaluate the effectiveness of each prompt component and compare the approach with fine-tuned Large Language Model Meta AI (LLaMA) and DeepSeek models. Experimental results show that under this prompt template, the F1 score of GPT-3.5 increases from 0.648 to 0.887 and that of GPT-4 increases from 0.757 to 0.921, significantly outperforming the fine-tuned LLaMA and DeepSeek models. These results demonstrate the superiority of the proposed method and provide a solid foundation for extracting drug-related entity relationships and building knowledge graphs.
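To make the five-part template structure concrete, the following Python sketch shows one way the components could be assembled into a single prompt for BIO-tagged ADR NER. The component texts, tag names (DRUG, COMP, ADR), example sentences, and the build_prompt helper are illustrative assumptions for this sketch, not the paper's actual prompts or data.

# Minimal sketch of a five-part error-correction prompt for ADR NER.
# All component texts and examples below are hypothetical placeholders.

TASK_DESCRIPTION = (
    "You are a biomedical NER system. Label each token in the input "
    "sentence with BIO tags for three entity types: DRUG (drug name), "
    "COMP (drug component), and ADR (adverse drug reaction)."
)

ENTITY_EXPLANATIONS = (
    "DRUG: the commercial or generic name of a medicine.\n"
    "COMP: an active ingredient or component of a drug.\n"
    "ADR: an undesirable symptom or reaction attributed to the drug."
)

ANNOTATION_GUIDELINES = (
    "Use B-X for the first token of an entity of type X, I-X for "
    "subsequent tokens of that entity, and O for all other tokens."
)

# Few-shot annotated sample in BIO format (hypothetical example).
FEW_SHOT_EXAMPLES = (
    "Sentence: Aspirin caused stomach bleeding\n"
    "Tags:     B-DRUG O B-ADR I-ADR"
)

# Error-correction example: a wrong prediction paired with its fix,
# the additional component that distinguishes this template.
ERROR_CORRECTION_EXAMPLES = (
    "Sentence: Ibuprofen contains ibuprofen sodium\n"
    "Wrong:    B-DRUG O B-DRUG I-DRUG\n"
    "Correct:  B-DRUG O B-COMP I-COMP\n"
    "Reason: tokens naming an ingredient are components, not drug names."
)

def build_prompt(sentence: str) -> str:
    """Assemble the five prompt components plus the query sentence."""
    parts = [
        TASK_DESCRIPTION,
        ENTITY_EXPLANATIONS,
        ANNOTATION_GUIDELINES,
        FEW_SHOT_EXAMPLES,
        ERROR_CORRECTION_EXAMPLES,
        f"Sentence: {sentence}\nTags:",
    ]
    return "\n\n".join(parts)

if __name__ == "__main__":
    print(build_prompt("Amoxicillin triggered a severe skin rash"))

The only structural difference from a conventional few-shot NER prompt is the fifth block, which pairs an incorrect prediction with its corrected BIO sequence and a brief reason; the abstract attributes the reported F1 gains to adding this error-correction component.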
About the journal:
The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.