meds_reader: A fast and efficient EHR processing library
Ethan Steinberg, Michael Wornow, Suhana Bedi, Jason Alan Fries, Matthew B. A. McDermott, Nigam H. Shah
arXiv - CS - Databases, 2024-09-12, arxiv-2409.09095
Abstract
The growing demand for machine learning in healthcare requires processing
increasingly large electronic health record (EHR) datasets, but existing
pipelines are not computationally efficient or scalable. In this paper, we
introduce meds_reader, an optimized Python package for efficient EHR data
processing that is designed to take advantage of many intrinsic properties of
EHR data for improved speed. We then demonstrate the benefits of meds_reader by
reimplementing key components of two major EHR processing pipelines, achieving
10-100x improvements in memory, speed, and disk usage. The code for meds_reader
can be found at https://github.com/som-shahlab/meds_reader.