OnSIDES database: Extracting adverse drug events from drug labels using natural language processing models
Yutaro Tanaka, Hsin Yi Chen, Pietro Belloni, Undina Gisladottir, Jenna Kefeli, Jason Patterson, Apoorva Srinivasan, Michael Zietz, Gaurav Sirdeshmukh, Jacob Berkowitz, Kathleen LaRow Brown, Nicholas P Tatonetti
Med. Published 2025-03-27. DOI: https://doi.org/10.1016/j.medj.2025.100642
Abstract
Background: Adverse drug events (ADEs) are the fourth leading cause of death in the US and cost billions of dollars annually in increased healthcare costs. However, few machine-readable databases of ADEs exist, limiting our capacity to study drug safety on a broader, systematic scale. Recent advances in natural language processing methods, such as BERT models, present an opportunity to accurately extract relevant information from unstructured biomedical text.
Methods: We fine-tune a PubMedBERT model to extract ADE terms from text in FDA Structured Product Labels for prescription drugs. Here, we present OnSIDES (on-label side effects resource), a compiled, machine-friendly database of drug-ADE pairs generated with this method. We further utilize this method to extract pediatric-specific ADEs, serious ADEs from labels' "Boxed Warnings" section, and ADEs from drug labels of other major nations (the UK, the European Union, and Japan) to build a complementary OnSIDES-INTL database. To present OnSIDES' potential applications, we leverage the database to predict novel drug targets and indications, analyze enrichment of ADEs across drug classes, and predict novel ADEs from chemical compound structures.
Findings: We achieve an F1 score of 0.90, AUROC of 0.92, and AUPR of 0.95 at extracting ADEs from the labels' "Adverse Reactions" section. OnSIDES contains over 3.6 million drug-ADE pairs for 3,233 unique drug ingredient combinations extracted from 47,211 labels.
Conclusions: OnSIDES can be used as a comprehensive resource to study and enhance drug safety.
Funding: R35GM131905 to N.P.T.; T32GM145440 to H.Y.C.; and T15LM007079 to U.G., M.Z., and K.L.B.
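Illustrative note: the F1 score reported in the Findings is the harmonic mean of precision and recall. The sketch below shows how such a score could be computed for ADE terms extracted from a single label. It is not the authors' evaluation code; the function name `precision_recall_f1` and the example term sets are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 for two collections of ADE terms.

    Both arguments are treated as sets, so duplicate terms count once.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: terms annotated on a label vs. terms a model extracted.
gold_terms = {"nausea", "headache", "rash", "dizziness"}
predicted_terms = {"nausea", "headache", "rash", "fatigue"}

precision, recall, f1 = precision_recall_f1(predicted_terms, gold_terms)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case three of the four predicted terms are correct and three of the four annotated terms are recovered, so precision, recall, and F1 all equal 0.75.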
About the journal:
Med is a flagship medical journal published monthly by Cell Press, the global publisher of trusted and authoritative science journals including Cell, Cancer Cell, and Cell Reports Medicine. Our mission is to advance clinical research and practice by providing a communication forum for the publication of clinical trial results, innovative observations from longitudinal cohorts, and pioneering discoveries about disease mechanisms. The journal also encourages thought-leadership discussions among biomedical researchers, physicians, and other health scientists and stakeholders. Our goal is to improve health worldwide sustainably and ethically.
Med publishes rigorously vetted original research and cutting-edge review and perspective articles on critical health issues globally and regionally. Our research section covers clinical case reports, first-in-human studies, large-scale clinical trials, population-based studies, as well as translational research work with the potential to change the course of medical research and improve clinical practice.