Title: Dimensionality Reduction with Random Indexing: An Application on Adverse Drug Event Detection Using Electronic Health Records
Authors: Isak Karlsson, J. Zhao
Published in: 2014 IEEE 27th International Symposium on Computer-Based Medical Systems
Publication date: 2014-05-27
DOI: 10.1109/CBMS.2014.22 (https://doi.org/10.1109/CBMS.2014.22)
Citations: 4
Abstract
Although electronic health records (EHRs) have recently become an important data source for detecting signals about drug safety, which is usually evaluated in clinical trials, the use of such data is often hindered by its high dimensionality and by limited computing resources. Several methods for reducing dimensionality have been developed, used and evaluated within the medical domain. While these methods perform well, their computational cost tends to increase with growing dimensionality. An alternative is random indexing, a technique commonly employed in text classification to reduce the dimensionality of large and sparse documents. This study explores how the predictive performance of random forest is affected when random indexing is used for dimensionality reduction in the prediction of adverse drug events (ADEs). Data are extracted from EHRs, and the task is to predict whether or not a patient should be assigned an ADE-related diagnosis code. Four different dimensionality settings are investigated, and their sensitivity, specificity and area under the ROC curve are reported for 14 data sets. The results show that, for the investigated data sets, predictive performance is not negatively affected by dimensionality reduction, while the computational cost is significantly reduced. The study therefore concludes that applying random indexing to EHR data reduces the computational cost while retaining predictive performance.
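The abstract does not describe how random indexing was configured, so the following is only a minimal sketch of the general technique, not the authors' implementation. Each original feature is given a sparse ternary index vector (a few +1 and -1 entries, zeros elsewhere), and a sparse record is reduced by summing the index vectors of its features, weighted by their counts. The function names, the reduced dimensionality of 64, the 4 non-zero entries per index vector, and the toy patient record are all assumptions chosen for illustration.

```python
import random


def make_index_vectors(n_features, dim, nonzeros, seed=0):
    """Assign every original feature a sparse ternary index vector.

    Each vector is stored as a list of (position, sign) pairs: `nonzeros`
    distinct positions in range(dim), half of them +1 and the rest -1.
    All other positions are implicitly zero.
    """
    rng = random.Random(seed)
    signs = [1] * (nonzeros // 2) + [-1] * (nonzeros - nonzeros // 2)
    vectors = []
    for _ in range(n_features):
        positions = rng.sample(range(dim), nonzeros)
        vectors.append(list(zip(positions, signs)))
    return vectors


def reduce_record(record, index_vectors, dim):
    """Project one sparse record {feature_id: count} into `dim` dimensions."""
    reduced = [0] * dim
    for feature_id, count in record.items():
        for position, sign in index_vectors[feature_id]:
            reduced[position] += sign * count
    return reduced


# Hypothetical setting: 10,000 original features reduced to 64 dimensions.
N_FEATURES, DIM = 10_000, 64
index_vectors = make_index_vectors(N_FEATURES, DIM, nonzeros=4)

# Hypothetical patient: feature ids (e.g. codes seen in the EHR) with counts.
patient = {12: 2, 4071: 1, 9998: 3}
reduced = reduce_record(patient, index_vectors, DIM)
print(len(reduced))
```

The reduced vectors have a fixed length regardless of how many original features exist, so they could then be passed to any classifier, such as a random forest, in place of the original sparse records. The cost of reducing a record scales with the number of non-zero features it contains rather than with the full original dimensionality.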