Enhancing pharmacogenomic data accessibility and drug safety with large language models: a case study with Llama3.1.

IF 2.7, Region 4 (Medicine), JCR Q2, MEDICINE, RESEARCH & EXPERIMENTAL

Experimental Biology and Medicine Pub Date : 2024-12-03 eCollection Date: 2024-01-01 DOI:10.3389/ebm.2024.10393

Dan Li, Leihong Wu, Ying-Chi Lin, Ho-Yin Huang, Ebony Cotton, Qi Liu, Ru Chen, Ruihao Huang, Yifan Zhang, Joshua Xu

Experimental Biology and Medicine, vol. 249, article 10393. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11650518/pdf/

Citations: 0

Abstract

Pharmacogenomics (PGx) holds the promise of personalizing medical treatments based on individual genetic profiles, thereby enhancing drug efficacy and safety. However, the current landscape of PGx research is hindered by fragmented data sources, time-consuming manual data extraction processes, and the need for comprehensive and up-to-date information. This study aims to address these challenges by evaluating the ability of Large Language Models (LLMs), specifically Llama3.1-70B, to automate and improve the accuracy of PGx information extraction from the FDA Table of Pharmacogenomic Biomarkers in Drug Labeling (FDA PGx Biomarker table), which is well-structured with drug names, biomarkers, therapeutic areas, and related labeling texts. Our primary goal was to test the feasibility of LLMs in streamlining PGx data extraction, as an alternative to traditional, labor-intensive approaches. Llama3.1-70B achieved 91.4% accuracy in identifying drug-biomarker pairs from single labeling texts and 82% from mixed texts, with over 85% consistency in aligning extracted PGx categories from the FDA PGx Biomarker table and relevant scientific abstracts, demonstrating its effectiveness for PGx data extraction. By integrating data from diverse sources, including scientific abstracts, this approach can support pharmacologists, regulatory bodies, and healthcare researchers in updating PGx resources more efficiently, making critical information more accessible for applications in personalized medicine. In addition, this approach shows potential for discovering novel PGx information, particularly for underrepresented minority ethnic groups. This study highlights the ability of LLMs to enhance the efficiency and completeness of PGx research, thus laying a foundation for advancements in personalized medicine by ensuring that drug therapies are tailored to the genetic profiles of diverse populations.
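The abstract reports accuracy for drug-biomarker pair identification but does not spell out the scoring rule. The sketch below shows one plausible way such a figure could be computed, assuming one reference pair per labeling text and exact matching after case and whitespace normalization. The function names, the matching rule, and the sample records are illustrative assumptions, not the authors' method or data.

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, str]  # (drug, biomarker)


def normalize(pair: Pair) -> Pair:
    """Lower-case and trim both fields so trivial spelling variants compare equal."""
    drug, biomarker = pair
    return (drug.strip().lower(), biomarker.strip().lower())


def pair_accuracy(reference: List[Pair], extracted: List[Optional[Pair]]) -> float:
    """Fraction of labeling texts whose extracted pair equals the reference pair.

    `extracted[i]` is None when the model returned no pair for text i
    (assumption: a missing answer counts as an error).
    """
    if len(reference) != len(extracted):
        raise ValueError("reference and extracted must have the same length")
    if not reference:
        raise ValueError("no records to score")
    hits = sum(
        1
        for ref, ext in zip(reference, extracted)
        if ext is not None and normalize(ref) == normalize(ext)
    )
    return hits / len(reference)


# Illustrative records only; these are not results from the study.
reference = [
    ("abacavir", "HLA-B"),
    ("warfarin", "CYP2C9"),
    ("clopidogrel", "CYP2C19"),
    ("codeine", "CYP2D6"),
]
extracted = [
    ("Abacavir", "HLA-B"),        # match after normalization
    ("warfarin", "VKORC1"),       # wrong biomarker for this reference row
    ("Clopidogrel ", "cyp2c19"),  # match after normalization
    None,                         # model returned nothing
]

accuracy = pair_accuracy(reference, extracted)
print(f"accuracy = {accuracy:.2f}")
```

Exact matching is the strictest choice; a real evaluation would probably also need to reconcile biomarker synonyms and labeling texts that list several biomarkers, which this sketch does not attempt.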


Source journal

Experimental Biology and Medicine (Medicine: Research & Experimental)

CiteScore: 6.00

Self-citation rate: 0.00%

Articles published: 157

Review time: 1 month

About the journal: Experimental Biology and Medicine (EBM) is a global, peer-reviewed journal dedicated to the publication of multidisciplinary and interdisciplinary research in the biomedical sciences. EBM provides both research and review articles as well as meeting symposia and brief communications. Articles in EBM represent cutting-edge research at the overlapping junctions of the biological, physical and engineering sciences that impact upon the health and welfare of the world's population. Topics covered in EBM include: Anatomy/Pathology; Biochemistry and Molecular Biology; Bioimaging; Biomedical Engineering; Bionanoscience; Cell and Developmental Biology; Endocrinology and Nutrition; Environmental Health/Biomarkers/Precision Medicine; Genomics, Proteomics, and Bioinformatics; Immunology/Microbiology/Virology; Mechanisms of Aging; Neuroscience; Pharmacology and Toxicology; Physiology; Stem Cell Biology; Structural Biology; Systems Biology and Microphysiological Systems; and Translational Research.