A Big-Data platform for Medical Knowledge Extraction from Electronic Health Records: Automatic Assignment of ICD-9 Codes
C. Sideris, Sakib Shaikh, H. Kalantarian, M. Sarrafzadeh
Proceedings of the 9th ACM International Conference on PErvasive Technologies Related to Assistive Environments
Published: 2016-06-29
DOI: 10.1145/2910674.2910685 (https://doi.org/10.1145/2910674.2910685)
Citations: 1
Abstract
In this paper, we present a big data platform for knowledge categorization in Electronic Health Records and examine its application to the automatic assignment of ICD-9 codes. Our platform relies on reusable, adaptable components that can perform knowledge extraction at a large scale. For the ICD-9 automatic assignment, we build and validate our approach using data from the MIMIC II Clinical Database, which contains over 20,000 discharge summaries. We show that our platform can achieve state-of-the-art performance on this dataset and that the classification results improve with more data. Overall, at the first level of the ICD-9 hierarchy, our algorithm achieves an average precision of 79.7% for an average recall of 70.2%.
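To make the reported metrics concrete, the sketch below shows one common way to compute average precision and recall when each discharge summary can carry several first-level ICD-9 categories. It is an illustration only: the abstract does not say how the averages were taken, so the per-category (macro) averaging, the function name `macro_precision_recall`, the convention of scoring a category as 0 when it has no predictions or no gold labels, and the toy data are all assumptions, not the authors' method.

```python
from typing import List, Set, Tuple


def macro_precision_recall(
    gold: List[Set[str]], predicted: List[Set[str]]
) -> Tuple[float, float]:
    """Average per-category precision and recall over every category seen.

    Assumption: macro averaging, with precision (or recall) set to 0.0
    for a category that has no predictions (or no gold labels).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Every category that appears in either the gold or predicted labels.
    categories = set().union(*gold, *predicted)
    if not categories:
        return 0.0, 0.0

    precisions, recalls = [], []
    for cat in categories:
        tp = sum(1 for g, p in zip(gold, predicted) if cat in g and cat in p)
        fp = sum(1 for g, p in zip(gold, predicted) if cat not in g and cat in p)
        fn = sum(1 for g, p in zip(gold, predicted) if cat in g and cat not in p)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)

    return sum(precisions) / len(categories), sum(recalls) / len(categories)


# Toy data (hypothetical): one label set per discharge summary, using
# first-level ICD-9 chapter ranges as category names.
gold_labels = [{"390-459", "240-279"}, {"390-459"}, {"460-519"}]
predicted_labels = [{"390-459"}, {"390-459", "460-519"}, {"460-519"}]

avg_precision, avg_recall = macro_precision_recall(gold_labels, predicted_labels)
print(f"precision={avg_precision:.3f} recall={avg_recall:.3f}")
```

Averaging per document (micro or example-based averaging) would give different numbers on the same predictions, so the choice of averaging scheme matters when comparing against the figures quoted in the abstract.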