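Integrating Health Care Data in an Informatics for Integrating Biology & the Bedside (i2b2) Model Persisted Through Elasticsearch: Design, Implementation, and Evaluation in a French University Hospital.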

IF 3.1 | Zone 3 (Medicine) | Q2 MEDICAL INFORMATICS
Romain Griffier, Fleur Mougin, Vianney Jouhet
JMIR Medical Informatics, volume 13, e65753. Published 2025-04-24. Journal Article.
DOI: 10.2196/65753 (https://doi.org/10.2196/65753)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12062766/pdf/
Citations: 0

Abstract

Integrating Health Care Data in an Informatics for Integrating Biology & the Bedside (i2b2) Model Persisted Through Elasticsearch: Design, Implementation, and Evaluation in a French University Hospital.

Background: The volume of digital data in health care is continually growing. In addition to its use in health care, the health data collected can also serve secondary purposes, such as research. In this context, clinical data warehouses (CDWs) provide the infrastructure and organization necessary to enhance the secondary use of health data. Various data models have been proposed for structuring data in a CDW, including the Informatics for Integrating Biology & the Bedside (i2b2) model, which relies on a relational database. However, this persistence approach can lead to performance issues when executing queries on massive data sets.

Objective: This study aims to describe the necessary transformations and their implementation to enable i2b2's search engine to perform the phenotyping task using data persistence in a NoSQL Elasticsearch database.

Methods: This study compares data persistence in a standard relational database with a NoSQL Elasticsearch database in terms of query response and execution performance (focusing on counting queries based on structured data, numerical data, and free text, including temporal filtering) as well as material resource requirements. Additionally, the data loading and updating processes are described.
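A counting query of the kind described above (structured code, numerical value, free text, temporal filter) can be sketched as an Elasticsearch Query DSL body. This is a minimal illustration, not the authors' implementation: the field names `concept_cd`, `nval_num`, `start_date`, `observation_blob`, and `patient_num` are borrowed from i2b2's relational `observation_fact` table and are assumed here to exist as fields of a hypothetical observation index. The helper `build_count_query` is likewise hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_count_query(concept_code: str,
                      min_value: Optional[float] = None,
                      text: Optional[str] = None,
                      start_date: Optional[str] = None,
                      end_date: Optional[str] = None) -> Dict[str, Any]:
    """Build a request body that counts distinct patients matching the criteria.

    All field names are assumptions made for illustration only.
    """
    # Structured criterion: exact match on a concept code.
    filters: List[Dict[str, Any]] = [{"term": {"concept_cd": concept_code}}]

    # Numerical criterion: lower bound on the numeric value.
    if min_value is not None:
        filters.append({"range": {"nval_num": {"gte": min_value}}})

    # Temporal criterion: optional bounds on the observation date.
    date_range: Dict[str, str] = {}
    if start_date is not None:
        date_range["gte"] = start_date
    if end_date is not None:
        date_range["lte"] = end_date
    if date_range:
        filters.append({"range": {"start_date": date_range}})

    bool_query: Dict[str, Any] = {"filter": filters}

    # Free-text criterion: full-text match on the note content.
    if text is not None:
        bool_query["must"] = [{"match": {"observation_blob": text}}]

    return {
        "size": 0,  # no hits are returned, only the aggregation result
        "query": {"bool": bool_query},
        "aggs": {
            "patient_count": {"cardinality": {"field": "patient_num"}}
        },
    }
```

Note that the `cardinality` aggregation returns an approximate distinct count, so an exact patient count would need a different strategy. The abstract does not say how the authors compute counts.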

Results: We propose adaptations to the i2b2 model to accommodate the specific features of Elasticsearch, particularly its inability to perform joins between different indexes. The implementation was tested and evaluated within the CDW of Bordeaux University Hospital, which contains data on 2.5 million patients and over 3 billion observations. Overall, Elasticsearch achieves shorter query execution times compared with a relational database, with particularly significant performance gains for free-text searches. Additionally, compared with an indexed relational database (including a full-text index), Elasticsearch requires less disk space for storage.
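Because Elasticsearch cannot join across indexes, one common workaround is to denormalize relational rows into self-contained documents before indexing. The abstract does not specify which adaptations the authors made, so the following is only a sketch of that general technique. The function `denormalize` and the document layout (one document per patient with an embedded `observations` array) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def denormalize(patients: Iterable[Dict[str, Any]],
                observations: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Perform the join once, at load time, instead of at query time.

    Each returned document carries the patient attributes plus all of that
    patient's observations, so a single-index query can filter on both.
    """
    # Group observation rows by patient, dropping the now-redundant key.
    by_patient: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for obs in observations:
        embedded = {k: v for k, v in obs.items() if k != "patient_num"}
        by_patient[obs["patient_num"]].append(embedded)

    docs: List[Dict[str, Any]] = []
    for patient in patients:
        doc = dict(patient)  # copy so the input rows are not mutated
        doc["observations"] = by_patient.get(patient["patient_num"], [])
        docs.append(doc)
    return docs
```

In Elasticsearch, an embedded array like `observations` would typically be mapped with the `nested` field type so that several conditions on the same observation are evaluated together. The trade-off is that updating one observation means reindexing the enclosing document.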

Conclusions: We demonstrate that implementing i2b2 with Elasticsearch is feasible and significantly improves query performance while reducing disk space usage. This implementation is currently in production at Bordeaux University Hospital.

Source journal
JMIR Medical Informatics (subject area: Medicine, Health Informatics)
CiteScore: 7.90
Self-citation rate: 3.10%
Articles published: 173
Review time: 12 weeks
Journal description: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal which focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, ehealth infrastructures and implementation. It has a focus on applied, translational research, with a broad readership including clinicians, CIOs, engineers, industry and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers which are more technical or more formative than what would be published in the Journal of Medical Internet Research.