Dimensionality Reduction with Random Indexing: An Application on Adverse Drug Event Detection Using Electronic Health Records

2014 IEEE 27th International Symposium on Computer-Based Medical Systems. Pub date: 2014-05-27. DOI: 10.1109/CBMS.2014.22

Isak Karlsson, J. Zhao


Citation count: 4

Abstract

Electronic health records (EHRs) have recently become an important data source for detecting drug safety signals, which are otherwise usually evaluated in clinical trials. However, the use of such data is often hindered by their high dimensionality and by limited computational resources. Several dimensionality reduction methods are currently being developed, used and evaluated within the medical domain. While these methods perform well, their computational cost tends to increase with growing dimensionality. An alternative is random indexing, a technique commonly employed in text classification to reduce the dimensionality of large, sparse documents. This study explores how dimensionality reduction through random indexing affects the predictive performance of random forest when predicting adverse drug events (ADEs). Data are extracted from EHRs, and the task is to predict whether or not a patient should be assigned an ADE-related diagnosis code. Four dimensionality settings are investigated, and their sensitivity, specificity and area under the ROC curve (AUC) are reported for 14 data sets. The results show that, for the investigated data sets, predictive performance is not negatively affected by dimensionality reduction, while the computational cost is significantly reduced. This study therefore concludes that applying random indexing to EHR data reduces the computational cost while retaining predictive performance.
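Random indexing, as commonly described in the text-classification literature, assigns each original feature a fixed, sparse random "index vector" and represents each record as the sum of the index vectors of its features. The following is a minimal, stdlib-only Python sketch of that general idea applied to a patient record stored as a feature-to-value mapping. It is not the authors' implementation: the feature names, the target dimensionality `dim`, the number of non-zero entries `nonzeros`, and the functions `make_index_vectors` and `reduce_record` are hypothetical choices for illustration, and the abstract does not state which four dimensionality settings were used.

```python
import random
from typing import Dict, List

# Hypothetical types for this sketch: a sparse index vector maps position -> sign.
IndexVector = Dict[int, int]


def make_index_vectors(features: List[str], dim: int, nonzeros: int,
                       seed: int = 0) -> Dict[str, IndexVector]:
    """Assign each original feature a sparse ternary index vector.

    Each vector has `nonzeros` distinct positions, half set to +1 and half
    to -1, with zeros everywhere else; it is stored sparsely as {position: sign}.
    """
    if nonzeros <= 0 or nonzeros > dim or nonzeros % 2:
        raise ValueError("nonzeros must be a positive even number <= dim")
    rng = random.Random(seed)  # fixed seed so the projection is reproducible
    vectors: Dict[str, IndexVector] = {}
    for feature in features:
        positions = rng.sample(range(dim), nonzeros)
        signs = [1] * (nonzeros // 2) + [-1] * (nonzeros // 2)
        rng.shuffle(signs)
        vectors[feature] = dict(zip(positions, signs))
    return vectors


def reduce_record(record: Dict[str, float], index_vectors: Dict[str, IndexVector],
                  dim: int) -> List[float]:
    """Project one record (feature -> value) into `dim` dimensions by summing
    the value-weighted index vectors of its features.

    Features without an index vector are ignored in this sketch.
    """
    reduced = [0.0] * dim
    for feature, value in record.items():
        for position, sign in index_vectors.get(feature, {}).items():
            reduced[position] += sign * value
    return reduced


if __name__ == "__main__":
    # Illustrative (hypothetical) feature names: diagnosis, drug and lab codes.
    features = ["diag:I10", "diag:J45", "drug:N02BE01", "lab:creatinine"]
    vectors = make_index_vectors(features, dim=8, nonzeros=2, seed=42)
    patient = {"diag:I10": 1.0, "drug:N02BE01": 3.0}
    reduced = reduce_record(patient, vectors, dim=8)
    print(len(reduced))  # 8
```

Because each index vector has squared norm `nonzeros` and distinct index vectors are uncorrelated in expectation, squared distances between reduced records approximate those between the original sparse records up to a constant scale factor. New features can also be given index vectors later without recomputing existing ones. In a setting like the one described above, the reduced vectors could then be passed to a classifier such as a random forest.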
