EHRchitect: An open-source software tool for medical event sequences data extraction from Electronic Health Records.

IF 2.1 Q3 MEDICINE, RESEARCH & EXPERIMENTAL
Journal of Clinical and Translational Science. Pub Date: 2025-03-26. eCollection Date: 2025-01-01. DOI: 10.1017/cts.2025.55
Kostiantyn Botnar, Justin T Nguen, Madison G Farnsworth, George Golovko, Kamil Khanipov
Volume 9, issue 1, e79. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12086738/pdf/

Abstract

Background: Electronic Health Records (EHR) analysis is pivotal in advancing medical research. Numerous real-world EHR data providers offer data access through exported datasets. While enabling profound research possibilities, exported EHR data require quality control and restructuring for meaningful analysis. Challenges arise in the sequence analysis of medical events (e.g., diagnoses or procedures), which provides critical insights into the progression of conditions, treatments, and outcomes. Identifying causal relationships, patterns, and trends requires a more complex approach to data mining and preparation.

Methods: This paper introduces EHRchitect, an application written in Python that addresses the quality control challenges by automating dataset transformation, facilitating the creation of a clean, formatted, and optimized MySQL database (DB), and extracting sequential data according to the user's configuration.

Results: The tool creates a clean, formatted, and optimized DB, enabling medical event sequence data extraction according to users' study configuration. Event sequences encompass patients' medical events in specified orders and time intervals. The extracted data are presented as distributed Parquet files, incorporating events, event transitions, patient metadata, and event metadata. The concurrent approach allows effortless scaling on multi-processor systems.

Conclusion: EHRchitect streamlines the processing of large EHR datasets for research purposes. It facilitates extracting sequential event-based data, offering a highly flexible framework for configuring event and timeline parameters. The tool delivers temporal characteristics, patient demographics, and event metadata to support comprehensive analysis. The developed tool significantly reduces the time required for dataset acquisition and preparation by automating data quality control and simplifying event extraction.
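
The idea of an event sequence, that is, one medical event followed by another within a configured time interval, can be illustrated with a minimal Python sketch. This is not EHRchitect's code or API: the names `Event`, `Transition`, and `extract_transitions`, the day-based interval parameters, and the sample records are all hypothetical, and the sketch works on an in-memory list, whereas the tool itself reads from a MySQL database and writes Parquet files. The codes E11 and I10 serve only as illustrative labels.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Event(NamedTuple):
    patient_id: str
    code: str   # illustrative event label, e.g. a diagnosis or procedure code
    day: date


class Transition(NamedTuple):
    patient_id: str
    first_code: str
    second_code: str
    gap_days: int


def extract_transitions(events, first_code, second_code, min_days=0, max_days=365):
    """For each patient, pair the earliest first_code event with the earliest
    second_code event whose gap in days lies within [min_days, max_days]."""
    by_patient = defaultdict(list)
    for ev in events:
        by_patient[ev.patient_id].append(ev)

    transitions = []
    for patient_id, patient_events in by_patient.items():
        patient_events.sort(key=lambda ev: ev.day)
        # Index event: the patient's earliest occurrence of first_code.
        start = next((ev for ev in patient_events if ev.code == first_code), None)
        if start is None:
            continue
        for ev in patient_events:
            gap = (ev.day - start.day).days
            if ev.code == second_code and min_days <= gap <= max_days:
                transitions.append(Transition(patient_id, first_code, second_code, gap))
                break
    return transitions


# Hypothetical sample records, invented for illustration only.
events = [
    Event("p1", "E11", date(2020, 1, 10)),
    Event("p1", "I10", date(2020, 3, 10)),
    Event("p2", "I10", date(2019, 5, 1)),   # precedes E11, so no E11 -> I10 transition
    Event("p2", "E11", date(2019, 6, 1)),
    Event("p3", "E11", date(2018, 1, 1)),
    Event("p3", "I10", date(2020, 1, 1)),   # outside the 365-day window
]

result = extract_transitions(events, "E11", "I10", min_days=0, max_days=365)
print(result)
```

Because each patient's events are processed independently of every other patient's, a loop of this shape could in principle be split across worker processes; how EHRchitect itself organizes its concurrency is not specified in the abstract.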
