EMR-LIP: A lightweight framework for standardizing the preprocessing of longitudinal irregular data in electronic medical records

IF 4.9 2区 医学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Jiawei Luo , Shixin Huang , Lan Lan , Shu Yang , Tingqian Cao , Jin Yin , Jiajun Qiu , Xiaoyan Yang , Yingqiang Guo , Xiaobo Zhou
{"title":"EMR-LIP: A lightweight framework for standardizing the preprocessing of longitudinal irregular data in electronic medical records","authors":"Jiawei Luo ,&nbsp;Shixin Huang ,&nbsp;Lan Lan ,&nbsp;Shu Yang ,&nbsp;Tingqian Cao ,&nbsp;Jin Yin ,&nbsp;Jiajun Qiu ,&nbsp;Xiaoyan Yang ,&nbsp;Yingqiang Guo ,&nbsp;Xiaobo Zhou","doi":"10.1016/j.cmpb.2024.108521","DOIUrl":null,"url":null,"abstract":"<div><h3>Objective</h3><div>Longitudinal data from Electronic Medical Records (EMRs) are increasingly utilized to construct predictive models for various clinical tasks, offering enhanced insights into patient health. However, significant discrepancies exist in preprocessing the irregular and intricate EMR data across studies due to the absence of universally accepted tools and standardization methods. This study introduces the <strong><u>E</u></strong>lectronic <strong><u>M</u></strong>edical <strong><u>R</u></strong>ecord <strong><u>L</u></strong>ongitudinal <strong><u>I</u></strong>rregular Data <strong><u>P</u></strong>reprocessing (EMR-LIP) framework, a lightweight approach for optimizing the preprocessing of longitudinal, irregular EMR data, aiming to enhance research efficiency, consistency, reproducibility, and comparability.</div></div><div><h3>Materials and Methods</h3><div>EMR-LIP modularizes the preprocessing of longitudinal irregular EMR data, offering tools with a low level of encapsulation. Compared to other pipelines, EMR-LIP categorizes variables in a more granular manner, designing specific preprocessing techniques for each type. To demonstrate its versatility, EMR-LIP was applied in an empirical study to two public EMR databases, MIMIC-IV and eICU-CRD. Data processed with EMR-LIP was then used to test several renowned deep learning models on a range of commonly used benchmark tasks.</div></div><div><h3>Results</h3><div>In both the MIMIC-IV and eICU-CRD databases, models based on EMR-LIP showed superior baseline performance compared to previous studies. Interestingly, using data preprocessed by EMR-LIP, traditional models such as LSTM and GRU outperformed more complex models, achieving an AUROC of up to 0.94 for in-hospital death prediction. Additionally, models based on EMR-LIP showed stable performance across various resampling intervals and exhibited better fairness in performance across different ethnic groups.</div></div><div><h3>Conclusion</h3><div>EMR-LIP streamlines the preprocessing of irregular longitudinal EMR data, offering an end-to-end solution for model-ready data creation, and has been open-sourced for collaborative refinement by the research community.</div></div>","PeriodicalId":10624,"journal":{"name":"Computer methods and programs in biomedicine","volume":"259 ","pages":"Article 108521"},"PeriodicalIF":4.9000,"publicationDate":"2024-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer methods and programs in biomedicine","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169260724005145","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

Abstract

Objective

Longitudinal data from Electronic Medical Records (EMRs) are increasingly utilized to construct predictive models for various clinical tasks, offering enhanced insights into patient health. However, significant discrepancies exist in preprocessing the irregular and intricate EMR data across studies due to the absence of universally accepted tools and standardization methods. This study introduces the Electronic Medical Record Longitudinal Irregular Data Preprocessing (EMR-LIP) framework, a lightweight approach for optimizing the preprocessing of longitudinal, irregular EMR data, aiming to enhance research efficiency, consistency, reproducibility, and comparability.

Materials and Methods

EMR-LIP modularizes the preprocessing of longitudinal irregular EMR data, offering tools with a low level of encapsulation. Compared to other pipelines, EMR-LIP categorizes variables in a more granular manner, designing specific preprocessing techniques for each type. To demonstrate its versatility, EMR-LIP was applied in an empirical study to two public EMR databases, MIMIC-IV and eICU-CRD. Data processed with EMR-LIP was then used to test several renowned deep learning models on a range of commonly used benchmark tasks.

Results

In both the MIMIC-IV and eICU-CRD databases, models based on EMR-LIP showed superior baseline performance compared to previous studies. Interestingly, using data preprocessed by EMR-LIP, traditional models such as LSTM and GRU outperformed more complex models, achieving an AUROC of up to 0.94 for in-hospital death prediction. Additionally, models based on EMR-LIP showed stable performance across various resampling intervals and exhibited better fairness in performance across different ethnic groups.

Conclusion

EMR-LIP streamlines the preprocessing of irregular longitudinal EMR data, offering an end-to-end solution for model-ready data creation, and has been open-sourced for collaborative refinement by the research community.

Abstract Image

求助全文
约1分钟内获得全文 求助全文
来源期刊
Computer methods and programs in biomedicine
Computer methods and programs in biomedicine 工程技术-工程:生物医学
CiteScore
12.30
自引率
6.60%
发文量
601
审稿时长
135 days
期刊介绍: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信