Massimiliano Russo, Sushama Kattinakere Sreedhara, Joshua Smith, Sharon E Davis, Judith C Maro, Thomas Deramus, Joyce Lii, Jie Yang, Rishi Desai, José J Hernández-Muñoz, Yong Ma, Youjin Wang, Jamal T Jones, Shirley V Wang
Electronic Health Record (EHR) Enhanced Signal Detection Using Tree-Based Scan Statistic Methods
American Journal of Epidemiology (impact factor 4.8; JCR Q1, Public, Environmental & Occupational Health). Published 2025-09-08. DOI: 10.1093/aje/kwaf199 (https://doi.org/10.1093/aje/kwaf199)
Citation count: 0
Abstract
Tree-based scan statistics (TBSS) are data mining methods that screen thousands of hierarchically related health outcomes to detect unsuspected adverse drug effects. TBSS traditionally analyze claims data with outcomes defined via diagnosis codes, and have not previously been applied to the rich clinical information in Electronic Health Records (EHR). We developed approaches for integrating EHR data in TBSS analyses, including outcomes derived from natural language processing (NLP) applied to clinical notes and outcomes derived from laboratory results, related via multipath hierarchical structures. We consider four settings that sequentially add sources of outcomes to the TBSS tree: 1) diagnosis codes, 2) NLP-derived outcomes, 3) binary outcomes from lab results, and 4) continuous lab results. In a comparative cohort study involving second-generation sulfonylureas (SUs) and dipeptidyl peptidase 4 (DPP-4) inhibitors among adults with type 2 diabetes, with an a priori expected signal of hypoglycemia, diagnosis code data showed no statistical alerts for inpatient or emergency department settings. Adding NLP-derived outcomes resulted in an alert for "Headaches" (p=0.047), a nonspecific symptom of hypoglycemia. Progressively adding binary and continuous lab results produced the same alert. Integrating EHR data in TBSS can be useful for the detection of safety signals for further investigation.
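To make the idea of scanning a multipath outcome hierarchy concrete, the following is a minimal sketch and not the authors' implementation. It assumes an unconditional Bernoulli scan statistic with a null probability of 0.5 (as would apply to 1:1 matched cohorts); the abstract does not state which probability model was used. The node names, the hierarchy in `PARENTS`, and all counts are hypothetical. Each leaf outcome may have more than one parent, and an event is counted once per ancestor node regardless of how many paths reach it.

```python
import math
import random

# Hypothetical multipath hierarchy: each node lists its parents.
# "headache_nlp" sits under two branches, so it has two paths to the root.
PARENTS = {
    "hypoglycemia_dx": ["hypoglycemia"],
    "low_glucose_lab": ["hypoglycemia"],
    "headache_nlp": ["hypoglycemia", "neuro_symptoms"],
    "dizziness_nlp": ["neuro_symptoms"],
    "hypoglycemia": ["root"],
    "neuro_symptoms": ["root"],
    "root": [],
}

def ancestors_and_self(node):
    """Return the node plus every ancestor reachable via any path."""
    seen, stack = set(), [node]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(PARENTS[cur])
    return seen

def node_counts(leaf_events):
    """Roll (exposed, comparator) leaf counts up to every ancestor node."""
    totals = {n: [0, 0] for n in PARENTS}
    for leaf, (c_exp, c_cmp) in leaf_events.items():
        for n in ancestors_and_self(leaf):  # a set, so no double counting
            totals[n][0] += c_exp
            totals[n][1] += c_cmp
    return totals

def bernoulli_llr(c, n, p=0.5):
    """Log-likelihood ratio for an excess of exposed events (c out of n)."""
    if n == 0 or c / n <= p:
        return 0.0  # one-sided: only excess risk in the exposed group scores
    q = c / n
    llr = c * math.log(q / p)
    if c < n:
        llr += (n - c) * math.log((1 - q) / (1 - p))
    return llr

def max_llr(leaf_events, p=0.5):
    """Return the node with the largest LLR and its score."""
    totals = node_counts(leaf_events)
    scores = {n: bernoulli_llr(c, c + d, p) for n, (c, d) in totals.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def monte_carlo_p(leaf_events, p=0.5, n_sim=999, seed=0):
    """Monte Carlo p-value for the maximum LLR under the null hypothesis."""
    rng = random.Random(seed)
    _, observed = max_llr(leaf_events, p)
    exceed = 0
    for _ in range(n_sim):
        sim = {}
        for leaf, (c_exp, c_cmp) in leaf_events.items():
            n = c_exp + c_cmp
            k = sum(rng.random() < p for _ in range(n))  # null reassignment
            sim[leaf] = (k, n - k)
        if max_llr(sim, p)[1] >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)

# Hypothetical (exposed, comparator) event counts per leaf outcome.
toy_events = {
    "hypoglycemia_dx": (6, 4),
    "low_glucose_lab": (5, 5),
    "headache_nlp": (30, 10),
    "dizziness_nlp": (8, 9),
}
best_node, best_score = max_llr(toy_events)
p_value = monte_carlo_p(toy_events)
```

With these invented counts the highest-scoring node is `headache_nlp`. Because the p-value is computed for the maximum over all nodes, it is adjusted for the multiple, overlapping outcomes scanned, which is the property that lets TBSS screen many related outcomes at once.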
About the journal:
The American Journal of Epidemiology is the oldest and one of the premier epidemiologic journals devoted to the publication of empirical research findings, opinion pieces, and methodological developments in the field of epidemiologic research.
It is a peer-reviewed journal aimed at both fellow epidemiologists and those who use epidemiologic data, including public health workers and clinicians.