Yen Ling Koon, Hui Xing Tan, Desmond Chun Hwee Teo, Jing Wei Neo, Pei San Ang, Celine Wei Ping Loke, Mun Yee Tham, Siew Har Tan, Bee Leng Sally Soh, Pei Qin Belinda Foo, Sreemanee Raaj Dorajoo
Unlocking Potential of Generative Large Language Models for Adverse Drug Reaction Relation Prediction in Discharge Summaries: Analysis and Strategy
Clinical Pharmacology & Therapeutics. Published 2025-10-20. DOI: 10.1002/cpt.70100 (https://doi.org/10.1002/cpt.70100)
Abstract
We present a comparative analysis of generative large language models (LLMs) for predicting causal relationships between drugs and adverse events in text segments from discharge summaries. Despite not being trained to identify related drug-adverse event pairs, generative LLMs demonstrate exceptional performance as recall-optimized models, achieving F1 scores comparable to those of fine-tuned models. Notably, on the MIMIC-Unrestricted dataset, Gemini 1.5 Pro and Llama 3.1 405B outperform our in-house fine-tuned BioM-ELECTRA-Large: Gemini 1.5 Pro shows a 19.2% relative improvement in F1 score (0.724 to 0.863) and a 39.7% relative increase in recall (0.675 to 0.943), while Llama 3.1 405B shows a 12.4% relative improvement in F1 (0.724 to 0.814) and a 40.4% relative increase in recall (0.675 to 0.948). Additionally, we propose a hybrid approach that integrates BioM-ELECTRA-Large with generative LLMs and outperforms the individual models. On the validation set, our hybrid model improves F1 score over BioM-ELECTRA-Large by 0.8% to 18.5% (absolute gains of 0.005 to 0.133), primarily through increased precision, albeit with lower recall than the original generative LLM. Importantly, this approach yields substantial computational savings, because BioM-ELECTRA-Large selects only a subset of segments (19.7% to 73.4% across our datasets) for downstream prediction by generative LLMs. By harnessing the strengths of generative LLMs as recall-optimized models and combining them with fine-tuned models, we can unlock the full potential of artificial intelligence in predicting adverse drug reaction relations, ultimately enhancing patient safety.
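The hybrid strategy and the percentage figures in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `filter_score`, `llm_is_related`, and the `threshold` of 0.5 are hypothetical stand-ins for BioM-ELECTRA-Large and a generative LLM, and the sketch assumes that segments not selected by the filter are labeled as unrelated. The helper `relative_improvement` only reproduces the arithmetic behind the quoted percentages.

```python
from typing import Callable, List, Tuple


def relative_improvement(baseline: float, new: float) -> float:
    """Relative change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0


def hybrid_predict(
    segments: List[str],
    filter_score: Callable[[str], float],   # hypothetical stand-in for the fine-tuned model
    llm_is_related: Callable[[str], bool],  # hypothetical stand-in for the generative LLM
    threshold: float = 0.5,                 # assumed cut-off, not taken from the paper
) -> Tuple[List[bool], float]:
    """Two-stage cascade: the filter picks segments, the LLM decides on those.

    Returns the per-segment predictions and the fraction of segments
    that were forwarded to the LLM (the source of the compute savings).
    """
    predictions: List[bool] = []
    forwarded = 0
    for segment in segments:
        if filter_score(segment) >= threshold:
            forwarded += 1
            predictions.append(llm_is_related(segment))
        else:
            # Assumption: segments the filter rejects are labeled unrelated.
            predictions.append(False)
    fraction = forwarded / len(segments) if segments else 0.0
    return predictions, fraction


# Toy usage with keyword-based stand-ins for both models.
toy_segments = [
    "rash after amoxicillin",
    "no adverse events noted",
    "nausea, started metformin",
    "follow up in 2 weeks",
]
toy_filter = lambda s: 0.9 if any(k in s for k in ("rash", "nausea")) else 0.1
toy_llm = lambda s: "after" in s

toy_predictions, toy_fraction = hybrid_predict(toy_segments, toy_filter, toy_llm)
gemini_f1_gain = round(relative_improvement(0.724, 0.863), 1)
```

Because the LLM only sees segments the filter forwards, a positive prediction requires both models to agree. That is consistent with the reported pattern of higher precision and lower recall than the generative LLM alone, and the forwarded fraction corresponds to the 19.7% to 73.4% range quoted above.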
About the journal:
Clinical Pharmacology & Therapeutics (CPT) is the authoritative cross-disciplinary journal in experimental and clinical medicine devoted to publishing advances in the nature, action, efficacy, and evaluation of therapeutics. CPT welcomes original Articles in the emerging areas of translational, predictive and personalized medicine; new therapeutic modalities including gene and cell therapies; pharmacogenomics, proteomics and metabolomics; bioinformation and applied systems biology complementing areas of pharmacokinetics and pharmacodynamics, human investigation and clinical trials, pharmacovigilance, pharmacoepidemiology, pharmacometrics, and population pharmacology.