12. Contextualizing clinical significance using FDA label supplemented DGI data
Matthew Cannon, James Stevenson, Kathryn Stahl, Rohit Basu, Adam Coffman, Susanna Kiwala, Joshua McMichael, Elaine Mardis, Obi Griffith, Malachi Griffith, Alex Wagner
Cancer Genetics, published 2024-08-01. DOI: 10.1016/j.cancergen.2024.08.014
https://www.sciencedirect.com/science/article/pii/S2210776224000528
Abstract
The drug-gene interaction database (DGIdb) aggregates interaction data from more than 40 sources into a single platform, with the primary goal of making the druggable genome accessible to clinicians and researchers. As a public, computationally accessible database, DGIdb enables therapeutic insights through broad aggregation of drug-gene interaction (DGI) data.
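The aggregation step can be pictured as merging per-source claims into one record per drug-gene pair while keeping provenance and interaction types. The following is a minimal Python sketch under assumed inputs: the `Interaction` schema, the claim keys, and the source names are hypothetical and do not reflect DGIdb's actual data model. Case-folded names also stand in for the concept-level grouping a real pipeline would use.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    """One aggregated drug-gene interaction (hypothetical schema)."""
    drug: str
    gene: str
    sources: set[str] = field(default_factory=set)
    interaction_types: set[str] = field(default_factory=set)


def aggregate(claims):
    """Merge per-source claims into one record per (drug, gene) pair.

    Each claim is a dict with hypothetical keys 'source', 'drug', 'gene',
    and optionally 'type' (e.g. 'inhibitor').
    """
    merged = {}
    for claim in claims:
        # Case-fold names so 'imatinib' and 'IMATINIB' collapse together;
        # this is a stand-in for grouping by normalized concept IDs.
        key = (claim["drug"].upper(), claim["gene"].upper())
        if key not in merged:
            merged[key] = Interaction(drug=key[0], gene=key[1])
        record = merged[key]
        # Keep every source that reported the pair (provenance).
        record.sources.add(claim["source"])
        # Keep interaction types only when the source supplied one.
        if claim.get("type"):
            record.interaction_types.add(claim["type"])
    return merged


# Usage with made-up source names:
claims = [
    {"source": "SourceA", "drug": "imatinib", "gene": "ABL1", "type": "inhibitor"},
    {"source": "SourceB", "drug": "IMATINIB", "gene": "abl1"},
    {"source": "SourceA", "drug": "erlotinib", "gene": "EGFR", "type": "inhibitor"},
]
merged = aggregate(claims)
```

Storing `sources` as a set means a pair reported by several resources is kept once with all of its provenance, which is what allows later filtering by source or interaction type.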
As part of the aggregation process, DGIdb preserves data on interaction types, directionality, and other attributes that support filtering or biochemical insight. However, source data are often incomplete and may lack the original physiological context of an interaction. Without this context, the therapeutic relevance of an interaction may be obscured or lost. In this report, we address these gaps by extracting therapeutic context from free-text sources. We apply existing large language models (LLMs), fine-tuned on additional medical corpora, to tag and extract indications, cancer types, and relevant pharmacogenomic information from the free text of FDA-approved drug labels. We then use our in-house normalization services to link the extracted entities back to formally grouped concepts.
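The normalization step links each tagged mention to a grouped concept so that synonyms (brand names, code names, abbreviations) collapse to a single identifier. This minimal sketch assumes an upstream LLM tagger has already produced `(text, entity_type)` pairs. The `SYNONYMS` table and its concept IDs are hypothetical placeholders, not the interface or contents of DGIdb's in-house normalization services.

```python
from typing import Optional

# Hypothetical synonym table: lower-cased surface strings mapped to
# illustrative concept IDs (not real identifiers from any service).
SYNONYMS = {
    "chemical": {"imatinib": "drug:0001", "gleevec": "drug:0001", "sti-571": "drug:0001"},
    "disease": {"chronic myeloid leukemia": "disease:0001", "cml": "disease:0001"},
    "gene": {"abl1": "gene:0001", "c-abl": "gene:0001"},
}


def normalize(mention: str, entity_type: str) -> Optional[str]:
    """Return the concept ID for a tagged mention, or None if unmatched."""
    table = SYNONYMS.get(entity_type, {})
    # Exact lookup after trimming and case-folding only; a fuller
    # normalizer might add fuzzy or ranked matching.
    return table.get(mention.strip().lower())


def normalize_mentions(mentions):
    """Attach concept IDs to (text, entity_type) pairs from a tagger."""
    return [(text, etype, normalize(text, etype)) for text, etype in mentions]


# Usage with made-up tagger output:
result = normalize_mentions([
    ("Gleevec", "chemical"),
    ("CML", "disease"),
    ("c-Abl", "gene"),
    ("tyrosine kinase X", "gene"),  # no synonym entry, stays unnormalized
])
```

Unmatched mentions return `None`; each such miss lowers the normalization rate for its entity type.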
In a preliminary test set of 355 FDA labels, we normalized 59.4%, 49.8%, and 49.1% of extracted chemical, disease, and genetic entities, respectively, to harmonized concepts. Extracting these data allows us to supplement existing interactions with context that may inform the therapeutic relevance of a particular interaction. These data will be particularly valuable for variant interpretation pipelines, where mutational status can lead to the identification of a lifesaving therapy and a positive patient outcome.
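Each reported percentage is the share of extracted entities in a category that were linked to a harmonized concept. The sketch below computes that ratio per entity type. The underlying counts for the 355-label test set are not given here, so the example uses toy data and does not reproduce the reported figures; the input format is an assumption.

```python
from collections import Counter


def normalization_rates(results):
    """Percentage of extracted entities that normalized, per entity type.

    `results` is an iterable of (entity_type, concept_id_or_None) pairs
    (hypothetical format).
    """
    extracted = Counter()
    normalized = Counter()
    for entity_type, concept_id in results:
        extracted[entity_type] += 1
        if concept_id is not None:
            normalized[entity_type] += 1
    # rate = normalized / extracted * 100, rounded to one decimal place
    return {
        etype: round(100 * normalized[etype] / total, 1)
        for etype, total in extracted.items()
    }


# Toy data only, not from the study:
toy = [
    ("chemical", "drug:0001"), ("chemical", None),
    ("disease", "disease:0001"), ("disease", None), ("disease", None),
    ("gene", "gene:0001"),
]
rates = normalization_rates(toy)  # chemical 50.0, disease 33.3, gene 100.0
```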
About the journal:
The aim of Cancer Genetics is to publish high quality scientific papers on the cellular, genetic and molecular aspects of cancer, including cancer predisposition and clinical diagnostic applications. Specific areas of interest include descriptions of new chromosomal, molecular or epigenetic alterations in benign and malignant diseases; novel laboratory approaches for identification and characterization of chromosomal rearrangements or genomic alterations in cancer cells; correlation of genetic changes with pathology and clinical presentation; and the molecular genetics of cancer predisposition. To reach a basic science and clinical multidisciplinary audience, we welcome original full-length articles, reviews, meeting summaries, brief reports, and letters to the editor.