Repeatable process for extracting health data from HL7 CDA documents

Impact factor: 4.0 · Zone 2 (Medicine) · JCR Q2 · Computer Science, Interdisciplinary Applications
Harry-Anton Talvik, Marek Oja, Sirli Tamm, Kerli Mooses, Dage Särg, Marcus Lõo, Õie Renata Siimon, Hendrik Šuvalov, Raivo Kolde, Jaak Vilo, Sulev Reisberg, Sven Laur
Journal of Biomedical Informatics, volume 161, Article 104765. Published 2025-01-01. DOI: 10.1016/j.jbi.2024.104765
Source: https://www.sciencedirect.com/science/article/pii/S1532046424001837
Citations: 0

Abstract

Objective

This study aims to address the gap in the literature on converting real-world Clinical Document Architecture (CDA) data into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), focusing on the initial steps preceding the mapping phase. We highlight the importance of a repeatable Extract-Transform-Load (ETL) pipeline for health data extraction from HL7 CDA documents in Estonia for research purposes.

Methods

We developed a repeatable ETL pipeline to facilitate the extraction, cleaning, and restructuring of health data from CDA documents to OMOP CDM, ensuring a high-quality and structured data format. This pipeline was designed to adapt to continuously updated data exchange format changes and handle various CDA document subsets for different scientific studies.
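
The extraction and cleaning steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the sample document, the function name `extract_diagnoses`, and the choice of fields are assumptions made for illustration. Only the HL7 v3 namespace `urn:hl7-org:v3` and the OMOP column names `person_source_value`, `condition_source_value`, and `condition_start_date` are taken from the respective standards. Mapping source codes to OMOP concept IDs is deliberately left out, since the abstract focuses on the steps preceding the mapping phase.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# HL7 v3 namespace used by CDA documents.
NS = {"hl7": "urn:hl7-org:v3"}

# Hypothetical, heavily simplified CDA document used only for this sketch.
SAMPLE_CDA = """<ClinicalDocument xmlns="urn:hl7-org:v3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <recordTarget><patientRole><id extension="P001"/></patientRole></recordTarget>
  <component><structuredBody><component><section>
    <entry><observation classCode="OBS" moodCode="EVN">
      <effectiveTime value="20200115"/>
      <value xsi:type="CD" code="E11.9" codeSystemName="ICD-10"/>
    </observation></entry>
    <entry><observation classCode="OBS" moodCode="EVN">
      <effectiveTime value="20200301"/>
      <value xsi:type="CD" code=" i10 " codeSystemName="ICD-10"/>
    </observation></entry>
  </section></component></structuredBody></component>
</ClinicalDocument>"""


def extract_diagnoses(cda_xml: str) -> list[dict]:
    """Extract coded observations and return rows shaped like OMOP
    condition_occurrence records (before vocabulary mapping)."""
    root = ET.fromstring(cda_xml)

    # Extract: locate the patient identifier.
    patient = root.find("hl7:recordTarget/hl7:patientRole/hl7:id", NS)
    person = patient.get("extension") if patient is not None else None

    rows = []
    for obs in root.iterfind(".//hl7:observation", NS):
        value = obs.find("hl7:value", NS)
        time = obs.find("hl7:effectiveTime", NS)
        if value is None or not value.get("code"):
            continue  # skip observations without a coded value

        # Clean: normalise whitespace and letter case of the source code.
        code = value.get("code").strip().upper()

        # Restructure: parse the HL7 timestamp (YYYYMMDD prefix) into a date.
        start = None
        if time is not None and time.get("value"):
            start = datetime.strptime(time.get("value")[:8], "%Y%m%d").date()

        rows.append({
            "person_source_value": person,
            "condition_source_value": code,
            "condition_start_date": start,
        })
    return rows


rows = extract_diagnoses(SAMPLE_CDA)
for row in rows:
    print(row)
```

Because the function is a pure transformation from document text to rows, it can be rerun on any CDA subset, which is the property the abstract calls repeatability.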

Results

We demonstrated via selected use cases that our pipeline successfully transformed a significant portion of diagnosis codes, body weight and eGFR measurements, and PAP test results from CDA documents into OMOP CDM, showing the ease of extracting structured data. However, challenges such as harmonising diverse coding systems and extracting lab results from free-text sections were encountered. The iterative development of the pipeline facilitated swift error detection and correction, enhancing the process’s efficiency.
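
To give a sense of why free-text sections are harder than structured entries, the following sketch pulls body-weight values out of narrative text with a regular expression. The pattern, the function name `extract_weights_kg`, the plausibility bounds, and the sample sentence are all assumptions for illustration. The abstract does not describe the authors' actual free-text method, and real clinical notes (in Estonian, with varied abbreviations) would need far more robust rules.

```python
import re

# Hypothetical pattern: "weight" or "body weight", optional ":" or "=",
# a number with "." or "," as decimal separator, then the unit "kg".
WEIGHT_RE = re.compile(
    r"\b(?:body\s+)?weight\s*[:=]?\s*(\d{1,3}(?:[.,]\d+)?)\s*kg\b",
    re.IGNORECASE,
)


def extract_weights_kg(text: str, min_kg: float = 1.0, max_kg: float = 400.0) -> list[float]:
    """Return plausible body-weight values (in kg) found in free text."""
    values = []
    for match in WEIGHT_RE.finditer(text):
        # Normalise a decimal comma to a decimal point before conversion.
        value = float(match.group(1).replace(",", "."))
        # Discard values outside the assumed plausibility range.
        if min_kg <= value <= max_kg:
            values.append(value)
    return values


note = "Body weight: 82,5 kg. Later weight 79.0 kg; weight 900 kg (typo)."
weights = extract_weights_kg(note)
print(weights)
```

The range check stands in for the cleaning stage: values such as the 900 kg typo are dropped rather than loaded into the target model.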

Conclusion

After a decade of focused work, our research has led to the development of an ETL pipeline that effectively transforms HL7 CDA documents into OMOP CDM in Estonia, addressing key data extraction and transformation challenges. The pipeline’s repeatability and adaptability to various data subsets make it a valuable resource for researchers dealing with health data. While tested on Estonian data, the principles outlined are broadly applicable, potentially aiding in handling health data standards that vary by country. Despite newer health data standards emerging, the relevance of CDA for retrospective health studies ensures the continuing importance of this work.
Source journal
Journal of Biomedical Informatics (Medicine; Computer Science: Interdisciplinary Applications)
CiteScore: 8.90
Self-citation rate: 6.70%
Articles published: 243
Review time: 32 days
About the journal: The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.