An Extensive Data Processing Pipeline for MIMIC-IV.

Proceedings of Machine Learning Research. Pub Date: 2022-11-01

Mehak Gupta, Brennan Gallamoza, Nicolas Cutrona, Pranjal Dhakal, Raphael Poulain, Rahmatollah Beheshti

{"title":"An Extensive Data Processing Pipeline for MIMIC-IV.","authors":"Mehak Gupta, Brennan Gallamoza, Nicolas Cutrona, Pranjal Dhakal, Raphael Poulain, Rahmatollah Beheshti","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>An increasing amount of research is being devoted to applying machine learning methods to electronic health record (EHR) data for various clinical purposes. This growing area of research has exposed the challenges of the accessibility of EHRs. MIMIC is a popular, public, and free EHR dataset in a raw format that has been used in numerous studies. The absence of standardized preprocessing steps can be, however, a significant barrier to the wider adoption of this rare resource. Additionally, this absence can reduce the reproducibility of the developed tools and limit the ability to compare the results among similar studies. In this work, we provide a greatly customizable pipeline to extract, clean, and preprocess the data available in the fourth version of the MIMIC dataset (MIMIC-IV). The pipeline also presents an end-to-end wizard-like package supporting predictive model creations and evaluations. The pipeline covers a range of clinical prediction tasks which can be broadly classified into four categories - readmission, length of stay, mortality, and phenotype prediction. 
The tool is publicly available at https://github.com/healthylaife/MIMIC-IV-Data-Pipeline.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"193 ","pages":"311-325"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854277/pdf/nihms-1865425.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of machine learning research","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}


Abstract

A growing body of research applies machine learning methods to electronic health record (EHR) data for various clinical purposes. This area of research has exposed the challenges of accessing EHRs. MIMIC is a popular, public, and free EHR dataset, distributed in a raw format, that has been used in numerous studies. However, the absence of standardized preprocessing steps can be a significant barrier to the wider adoption of this rare resource. This absence can also reduce the reproducibility of the developed tools and limit the ability to compare results among similar studies. In this work, we provide a highly customizable pipeline to extract, clean, and preprocess the data available in the fourth version of the MIMIC dataset (MIMIC-IV). The pipeline also provides an end-to-end, wizard-like package that supports the creation and evaluation of predictive models. It covers a range of clinical prediction tasks that can be broadly classified into four categories: readmission, length of stay, mortality, and phenotype prediction. The tool is publicly available at https://github.com/healthylaife/MIMIC-IV-Data-Pipeline.

