Improving entity recognition using ensembles of deep learning and fine-tuned large language models: A case study on adverse event extraction from VAERS and social media

IF 4 | Zone 2, Medicine | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Biomedical Informatics Pub Date : 2025-02-07 DOI:10.1016/j.jbi.2025.104789

Yiming Li , Deepthi Viswaroopan , William He , Jianfu Li , Xu Zuo , Hua Xu , Cui Tao

Journal of Biomedical Informatics, Volume 163, Article 104789. https://www.sciencedirect.com/science/article/pii/S1532046425000188

Citations: 0

Abstract

Objective

Extracting adverse events (AEs) following COVID-19 vaccination from text data is crucial for monitoring and analyzing the safety profiles of immunizations, identifying potential risks, and ensuring the safe use of these products. Traditional deep learning models are adept at learning intricate feature representations and dependencies in sequential data, but often require extensive labeled data. In contrast, large language models (LLMs) excel at understanding contextual information, but exhibit unstable performance on named entity recognition (NER) tasks, possibly due to their broad but unspecific training. This study aims to evaluate the effectiveness of LLMs and traditional deep learning models in AE extraction, and to assess the impact of ensembling these models on performance.

Methods

We used reports and posts from the Vaccine Adverse Event Reporting System (VAERS) (n = 230), Twitter (n = 3,383), and Reddit (n = 49) as our corpora. Our goal was to extract three types of entities: vaccine, shot, and adverse event (ae). We explored multiple LLMs, including GPT-2, GPT-3.5, GPT-4, Llama-2 7b, and Llama-2 13b, fine-tuning all of them except GPT-4, as well as traditional deep learning models such as a recurrent neural network (RNN) and Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT). To enhance performance, we created ensembles of the three best-performing models. We evaluated each entity type with strict and relaxed F1 scores and assessed overall performance with the micro-average F1.
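
One common way to combine three sequence labelers is per-token majority voting. The abstract does not say which combination rule the authors used, so the following is only a minimal sketch under that assumption. The function name `majority_vote`, the BIO label strings, and the tie-breaking rule (fall back to the first model listed, assumed to be the strongest) are all hypothetical choices made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token labels from several models by majority vote.

    predictions: list of label sequences, one per model, all the same length.
    Ties are broken in favor of the first model in the list (an assumption,
    not something stated in the abstract).
    """
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all models must label the same tokens")

    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top_label, top_count = counts.most_common(1)[0]
        # If several labels share the highest count, trust the first model.
        if list(counts.values()).count(top_count) > 1:
            top_label = labels[0]
        voted.append(top_label)
    return voted


# Toy example: "Sore arm after Pfizer shot" (labels are invented).
model_a = ["B-ae", "I-ae", "O", "B-vaccine", "B-shot"]
model_b = ["B-ae", "I-ae", "O", "B-vaccine", "O"]
model_c = ["B-ae", "O", "O", "O", "B-shot"]

print(majority_vote([model_a, model_b, model_c]))
# ['B-ae', 'I-ae', 'O', 'B-vaccine', 'B-shot']
```

Voting token by token can produce label sequences that violate the BIO scheme (for example an `I-ae` with no preceding `B-ae`), so a practical pipeline would likely need a repair step or would vote on whole spans instead.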

Results

The ensemble demonstrated the best performance in identifying the entities “vaccine,” “shot,” and “ae,” achieving strict F1-scores of 0.878, 0.930, and 0.925, respectively, and a micro-average score of 0.903. These results underscore the significance of fine-tuning models for specific tasks and demonstrate the effectiveness of ensemble methods in enhancing performance.
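
To make the reported metrics concrete, the sketch below computes strict and relaxed span-level F1 and a micro-average over entity types. The abstract does not define its matching rules, so this assumes a widely used convention: a strict match requires identical boundaries and type, and a relaxed match requires any character overlap with the same type. The function names `span_f1` and `by_type`, the `(start, end, type)` tuple layout, and the toy spans are hypothetical.

```python
def span_f1(gold, pred, relaxed=False):
    """F1 over entity spans given as (start, end, type), end exclusive.

    strict:  boundaries and type must be identical.
    relaxed: spans must overlap and share the same type (assumed definition).
    """
    def match(a, b):
        if a[2] != b[2]:
            return False
        if relaxed:
            return a[0] < b[1] and b[0] < a[1]
        return a[0] == b[0] and a[1] == b[1]

    # Predicted spans that hit some gold span, and gold spans that were found.
    hit_pred = sum(any(match(p, g) for g in gold) for p in pred)
    hit_gold = sum(any(match(g, p) for p in pred) for g in gold)

    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def by_type(spans, entity_type):
    """Keep only the spans of one entity type."""
    return {s for s in spans if s[2] == entity_type}


# Toy data: the predicted "ae" span is one token too short.
gold = {(0, 2, "ae"), (3, 4, "vaccine"), (4, 5, "shot")}
pred = {(0, 1, "ae"), (3, 4, "vaccine"), (4, 5, "shot")}

# Pooling all types before counting gives the micro-average.
print(f"micro strict:  {span_f1(gold, pred):.3f}")                # 0.667
print(f"micro relaxed: {span_f1(gold, pred, relaxed=True):.3f}")  # 1.000

# Per-type scores filter to one entity type first.
print(f"ae strict:     {span_f1(by_type(gold, 'ae'), by_type(pred, 'ae')):.3f}")  # 0.000
```

Because the micro-average pools counts across types, frequent entity types weigh more heavily than rare ones, which is why it can differ from the mean of the three per-type scores.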

Conclusion

This study demonstrates the effectiveness and robustness of ensembling fine-tuned traditional deep learning models and LLMs for extracting AE-related information following COVID-19 vaccination. It contributes to the advancement of natural language processing in the biomedical domain, providing valuable insights into improving AE extraction from text data for pharmacovigilance and public health surveillance.

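Source journal: Journal of Biomedical Informatics (Medicine - Computer Science: Interdisciplinary Applications)

CiteScore: 8.90

Self-citation rate: 6.70%

Articles published: 243

Review time: 32 days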

About the journal: The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.