I-ETL: an interoperability-aware health (meta)data pipeline to enable federated analyses.

IF 3.8, Zone 3 (Medicine), Q2 Medical Informatics

BMC Medical Informatics and Decision Making Pub Date : 2025-10-13 DOI:10.1186/s12911-025-03188-0

Nelly Barret, Anna Bernasconi, Boris Bikbov, Pietro Pinoli

BMC Medical Informatics and Decision Making 25(1), 375 (2025). Publication type: Journal Article.

Citations: 0

Abstract

Background: Clinicians seeking to better understand complex diseases, such as cancer or rare diseases, need to produce and exchange data in order to pool their sources and join forces. A natural way to do so while preserving privacy is to use a decentralized architecture together with Federated Learning algorithms. This ensures that data stays within the organization that collected it, but it requires the data to be collected in similar settings and represented with similar models. In practice, this is often not the case: healthcare institutions work independently, with different representations and raw data; they lack the means to normalize their own data, let alone to do so across centers. For instance, clinicians may have phenotypic, clinical, imaging and genomic data at hand (each collected separately) and want to better understand certain diseases by analyzing them together. This example highlights the needs and challenges involved in the cooperative use of this wealth of information.

Methods: We designed and implemented a framework, named I-ETL, for integrating highly heterogeneous hospital healthcare datasets into interoperable databases. Our proposal is twofold: (i) we devise two general and extensible conceptual models for modeling both data and metadata, and (ii) we propose an Extract-Transform-Load (ETL) pipeline that ensures and assesses interoperability from the start.
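
As a rough illustration of what an interoperability-aware ETL step could look like, the sketch below extracts rows from two hypothetical hospital exports, transforms local column names and units into one shared record shape, loads the result into an in-memory store, and reports the share of input columns that could be mapped. Every name in it (`FIELD_MAPPINGS`, `UnifiedRecord`, `interoperability_score`, the column names, the score definition) is an assumption made for illustration; none of it is taken from I-ETL, whose models and metrics are not described at this level of detail in the abstract.

```python
from dataclasses import dataclass, field

# Hypothetical per-hospital mappings from local column names to shared
# feature names, each with a converter that normalizes values and units.
FIELD_MAPPINGS = {
    "hospital_a": {
        "pid": ("patient_id", str),
        "wt_kg": ("weight_kg", float),
        "sex": ("sex", lambda v: v.strip().lower()),
    },
    "hospital_b": {
        "PatientID": ("patient_id", str),
        "weight_lb": ("weight_kg", lambda v: round(float(v) * 0.45359237, 2)),
    },
}


@dataclass
class UnifiedRecord:
    """Shared record shape: data plus metadata about where it came from."""
    source: str
    features: dict = field(default_factory=dict)
    unmapped: list = field(default_factory=list)


def transform(source: str, raw: dict) -> UnifiedRecord:
    """Map one raw row to the shared shape, tracking unmapped columns."""
    record = UnifiedRecord(source=source)
    mapping = FIELD_MAPPINGS[source]
    for column, value in raw.items():
        if column in mapping:
            name, convert = mapping[column]
            record.features[name] = convert(value)
        else:
            record.unmapped.append(column)
    return record


def interoperability_score(records: list) -> float:
    """Fraction of input columns that were mapped to a shared feature."""
    mapped = sum(len(r.features) for r in records)
    total = mapped + sum(len(r.unmapped) for r in records)
    return mapped / total if total else 0.0


def load(records: list, database: dict) -> None:
    """Load step: group unified records by patient in an in-memory store."""
    for r in records:
        database.setdefault(r.features.get("patient_id"), []).append(r)


# Extract step, simulated here with two in-memory rows.
rows = [
    ("hospital_a", {"pid": "A1", "wt_kg": "70.5", "sex": " F "}),
    ("hospital_b", {"PatientID": "B7", "weight_lb": "150", "ward": "3N"}),
]
records = [transform(src, raw) for src, raw in rows]
database = {}
load(records, database)
score = interoperability_score(records)
```

In this toy setup the `ward` column has no shared counterpart, so it lowers the score. Computing such a measure during the transform step, rather than after loading, is one way to read "assessing interoperability from the start".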

Results: Through experiments on open-source datasets, we show that I-ETL succeeds in representing various health datasets in a unified way thanks to our two general conceptual models. We then demonstrate the importance of treating interoperability as a first-class citizen in integration pipelines, which makes collaboration between different centers possible.

Conclusion: As a framework, I-ETL helps to integrate the data of healthcare institutions and to improve interoperability between them. When used in a decentralized federated platform, it eases the federated analysis of the different hospital databases and helps clinicians gain insights and knowledge about medical conditions of interest.
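
To make the idea of federated analysis concrete, here is a minimal sketch of a federated mean: each hospital computes a count and a sum on its own database, and only those aggregates are sent to a coordinator. This is a generic illustration, not the platform or the algorithms used with I-ETL; `LocalSummary`, `local_summary` and `federated_mean` are hypothetical names, and the values are invented.

```python
from dataclasses import dataclass


@dataclass
class LocalSummary:
    """Aggregate that a site is willing to share; no patient rows included."""
    count: int
    total: float


def local_summary(values: list) -> LocalSummary:
    """Runs inside one hospital, on its own database."""
    return LocalSummary(count=len(values), total=sum(values))


def federated_mean(summaries: list) -> float:
    """Runs at the coordinator, which only ever sees the summaries."""
    count = sum(s.count for s in summaries)
    if count == 0:
        raise ValueError("no observations reported by any site")
    return sum(s.total for s in summaries) / count


# Invented weights in kilograms, held separately by two sites.
site_a = local_summary([70.0, 82.0, 64.0])
site_b = local_summary([68.0])
mean_weight = federated_mean([site_a, site_b])
```

The result is only meaningful if both sites record the same feature in the same unit, which is the kind of agreement an interoperable representation is meant to provide.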


Source journal

BMC Medical Informatics and Decision Making (Medicine: Medical Informatics)

CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month

About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.