12. Contextualizing clinical significance using FDA label supplemented DGI data


Cancer Genetics. Pub Date: 2024-08-01. DOI: 10.1016/j.cancergen.2024.08.014

Matthew Cannon, James Stevenson, Kathryn Stahl, Rohit Basu, Adam Coffman, Susanna Kiwala, Joshua McMichael, Elaine Mardis, Obi Griffith, Malachi Griffith, Alex Wagner

Volume 286, Pages S4-S5. Full text: https://www.sciencedirect.com/science/article/pii/S2210776224000528


Abstract

The Drug-Gene Interaction Database (DGIdb) aggregates interaction data from over 40 resources into one platform, with the primary goal of making the druggable genome accessible to clinicians and researchers. By providing a public, computationally accessible database, DGIdb enables therapeutic insights through broad aggregation of drug-gene interaction (DGI) data.

As part of our aggregation process, DGIdb preserves data on interaction types, directionality, and other attributes that enable filtering or biochemical insight. However, source data are often incomplete and may not contain the original physiological context of the interaction. Without this context, the therapeutic relevance of an interaction may be compromised or lost. In this report, we address these missing data by extracting therapeutic context from free-text sources. We apply existing large language models (LLMs) that have been fine-tuned on additional medical corpora to tag and extract indications, cancer types, and relevant pharmacogenomics from free-text FDA-approved labels. We then use our in-house normalization services to link the extracted data back to formally grouped concepts.
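The normalization step described above can be illustrated with a minimal sketch. It assumes that entities have already been tagged in a label and that normalization amounts to looking up a cleaned surface form in an alias table. The `Entity` class, the `normalize` function, the alias table, and every concept identifier are hypothetical placeholders; they do not represent the fine-tuned LLMs or the in-house normalization services used in the study.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical alias table standing in for a normalization service.
# Entity names and concept IDs are placeholders, not real identifiers.
ALIAS_TO_CONCEPT = {
    ("chemical", "drug a"): "concept:chem-0001",
    ("chemical", "compound a"): "concept:chem-0001",
    ("disease", "lung cancer"): "concept:dis-0001",
    ("gene", "gene x"): "concept:gene-0001",
}


@dataclass(frozen=True)
class Entity:
    """A span tagged in label text, with its entity type."""
    text: str
    entity_type: str  # "chemical", "disease", or "gene"


def normalize(entity: Entity) -> Optional[str]:
    """Return a concept ID for the entity, or None if no alias matches."""
    # Lowercase and collapse whitespace so trivial variants share one key.
    cleaned = " ".join(entity.text.lower().split())
    return ALIAS_TO_CONCEPT.get((entity.entity_type, cleaned))


tagged = [
    Entity("Drug  A", "chemical"),
    Entity("Compound A", "chemical"),
    Entity("Gene Y", "gene"),
]
for entity in tagged:
    print(entity.text, "->", normalize(entity))
```

Two aliases of the same chemical map to one concept, which is the grouping behaviour the normalization step aims for, while an entity with no known alias returns `None` and remains unnormalized.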

In a preliminary test set of 355 FDA labels, we were able to normalize 59.4%, 49.8%, and 49.1% of extracted chemical, disease, and genetic entities, respectively, back to harmonized concepts. Extracting these data allows us to supplement our existing interactions with context that may inform the therapeutic relevance of a particular interaction. Inclusion of these data will be particularly valuable for variant interpretation pipelines, where mutational status can lead to the identification of a lifesaving therapeutic and a positive patient outcome.
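A per-type normalization rate like those reported above could be computed as in the following sketch. It assumes the rate is the share of extracted entities of a given type that received a concept identifier; the abstract does not state whether entities were counted per mention or per unique string, so that choice is an assumption here. The function name and the input records are illustrative toy values, not the study's data.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def normalization_rates(
    results: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, float]:
    """Percentage of entities per type that mapped to a concept ID.

    Each record is (entity_type, concept_id), where concept_id is None
    when normalization failed.
    """
    total: Counter = Counter()
    matched: Counter = Counter()
    for entity_type, concept_id in results:
        total[entity_type] += 1
        if concept_id is not None:
            matched[entity_type] += 1
    return {t: 100.0 * matched[t] / total[t] for t in total}


# Toy records for illustration only.
toy_results = [
    ("chemical", "concept:chem-0001"),
    ("chemical", None),
    ("disease", "concept:dis-0001"),
    ("disease", "concept:dis-0002"),
    ("gene", None),
]
rates = normalization_rates(toy_results)
for entity_type, rate in sorted(rates.items()):
    print(f"{entity_type}: {rate:.1f}%")
```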

