New implementation of data standards for AI in oncology: Experience from the EuCanImage project.

IF 11.8 · JCR Q1 · CAS Zone 2 (Biology) · MULTIDISCIPLINARY SCIENCES
Teresa García-Lezana, Maciej Bobowicz, Santiago Frid, Michael Rutherford, Mikel Recuero, Katrine Riklund, Aldar Cabrelles, Marlena Rygusik, Lauren Fromont, Roberto Francischello, Emanuele Neri, Salvador Capella, Arcadi Navarro, Fred Prior, Jonathan Bona, Pilar Nicolas, Martijn P A Starmans, Karim Lekadir, Jordi Rambla
{"title":"肿瘤学人工智能数据标准的新实施:来自EuCanImage项目的经验。","authors":"Teresa García-Lezana, Maciej Bobowicz, Santiago Frid, Michael Rutherford, Mikel Recuero, Katrine Riklund, Aldar Cabrelles, Marlena Rygusik, Lauren Fromont, Roberto Francischello, Emanuele Neri, Salvador Capella, Arcadi Navarro, Fred Prior, Jonathan Bona, Pilar Nicolas, Martijn P A Starmans, Karim Lekadir, Jordi Rambla","doi":"10.1093/gigascience/giae101","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>An unprecedented amount of personal health data, with the potential to revolutionize precision medicine, is generated at health care institutions worldwide. The exploitation of such data using artificial intelligence (AI) relies on the ability to combine heterogeneous, multicentric, multimodal, and multiparametric data, as well as thoughtful representation of knowledge and data availability. Despite these possibilities, significant methodological challenges and ethicolegal constraints still impede the real-world implementation of data models.</p><p><strong>Technical details: </strong>The EuCanImage is an international consortium aimed at developing AI algorithms for precision medicine in oncology and enabling secondary use of the data based on necessary ethical approvals. The use of well-defined clinical data standards to allow interoperability was a central element within the initiative. The consortium is focused on 3 different cancer types and addresses 7 unmet clinical needs. We have conceived and implemented an innovative process to capture clinical data from hospitals, transform it into the newly developed EuCanImage data models, and then store the standardized data in permanent repositories. 
This new workflow combines recognized software (REDCap for data capture), data standards (FHIR for data structuring), and an existing repository (EGA for permanent data storage and sharing), with newly developed custom tools for data transformation and quality control purposes (ETL pipeline, QC scripts) to complement the gaps.</p><p><strong>Conclusion: </strong>This article synthesizes our experience and procedures for health care data interoperability, standardization, and reproducibility.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"14 ","pages":""},"PeriodicalIF":11.8000,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12071370/pdf/","citationCount":"0","resultStr":"{\"title\":\"New implementation of data standards for AI in oncology: Experience from the EuCanImage project.\",\"authors\":\"Teresa García-Lezana, Maciej Bobowicz, Santiago Frid, Michael Rutherford, Mikel Recuero, Katrine Riklund, Aldar Cabrelles, Marlena Rygusik, Lauren Fromont, Roberto Francischello, Emanuele Neri, Salvador Capella, Arcadi Navarro, Fred Prior, Jonathan Bona, Pilar Nicolas, Martijn P A Starmans, Karim Lekadir, Jordi Rambla\",\"doi\":\"10.1093/gigascience/giae101\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>An unprecedented amount of personal health data, with the potential to revolutionize precision medicine, is generated at health care institutions worldwide. The exploitation of such data using artificial intelligence (AI) relies on the ability to combine heterogeneous, multicentric, multimodal, and multiparametric data, as well as thoughtful representation of knowledge and data availability. 
Despite these possibilities, significant methodological challenges and ethicolegal constraints still impede the real-world implementation of data models.</p><p><strong>Technical details: </strong>The EuCanImage is an international consortium aimed at developing AI algorithms for precision medicine in oncology and enabling secondary use of the data based on necessary ethical approvals. The use of well-defined clinical data standards to allow interoperability was a central element within the initiative. The consortium is focused on 3 different cancer types and addresses 7 unmet clinical needs. We have conceived and implemented an innovative process to capture clinical data from hospitals, transform it into the newly developed EuCanImage data models, and then store the standardized data in permanent repositories. This new workflow combines recognized software (REDCap for data capture), data standards (FHIR for data structuring), and an existing repository (EGA for permanent data storage and sharing), with newly developed custom tools for data transformation and quality control purposes (ETL pipeline, QC scripts) to complement the gaps.</p><p><strong>Conclusion: </strong>This article synthesizes our experience and procedures for health care data interoperability, standardization, and reproducibility.</p>\",\"PeriodicalId\":12581,\"journal\":{\"name\":\"GigaScience\",\"volume\":\"14 \",\"pages\":\"\"},\"PeriodicalIF\":11.8000,\"publicationDate\":\"2025-01-06\",\"publicationTypes\":\"Journal 
Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12071370/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"GigaScience\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/gigascience/giae101\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"GigaScience","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/gigascience/giae101","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Citations: 0

Abstract


Background: An unprecedented amount of personal health data, with the potential to revolutionize precision medicine, is generated at health care institutions worldwide. The exploitation of such data using artificial intelligence (AI) relies on the ability to combine heterogeneous, multicentric, multimodal, and multiparametric data, as well as thoughtful representation of knowledge and data availability. Despite these possibilities, significant methodological challenges and ethicolegal constraints still impede the real-world implementation of data models.

Technical details: EuCanImage is an international consortium aimed at developing AI algorithms for precision medicine in oncology and enabling secondary use of the data, subject to the necessary ethical approvals. The use of well-defined clinical data standards to ensure interoperability was a central element of the initiative. The consortium focuses on 3 different cancer types and addresses 7 unmet clinical needs. We have conceived and implemented an innovative process to capture clinical data from hospitals, transform it into the newly developed EuCanImage data models, and store the standardized data in permanent repositories. This new workflow combines established software (REDCap for data capture), data standards (FHIR for data structuring), and an existing repository (EGA for permanent data storage and sharing) with newly developed custom tools for data transformation and quality control (an ETL pipeline and QC scripts) to fill the gaps.
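The transformation and QC steps of such a workflow can be sketched in a few lines. The following is a minimal, hypothetical illustration only: the field names, the flat record layout, and the SNOMED code are invented for the example and are not the actual EuCanImage data model or ETL pipeline; it shows the general pattern of mapping a captured record (as REDCap might export it) into a FHIR R4 Condition resource and then validating it with a QC script.

```python
# Hypothetical sketch of the transform + QC pattern; field names, record
# layout, and codes are illustrative, not the EuCanImage data model.

def redcap_to_fhir_condition(record: dict) -> dict:
    """Map a flat REDCap-style export record to a minimal FHIR R4 Condition."""
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{record['patient_id']}"},
        "code": {
            "coding": [{
                "system": "http://snomed.info/sct",
                "code": record["diagnosis_code"],      # illustrative code
                "display": record["diagnosis_label"],
            }]
        },
        "onsetDateTime": record["diagnosis_date"],
    }

def qc_check(resource: dict) -> list:
    """Return a list of QC issues; an empty list means the resource passed."""
    issues = []
    if resource.get("resourceType") != "Condition":
        issues.append("unexpected resourceType")
    coding = resource.get("code", {}).get("coding", [])
    if not coding or not coding[0].get("code"):
        issues.append("missing diagnosis coding")
    if "onsetDateTime" not in resource:
        issues.append("missing onset date")
    return issues

record = {
    "patient_id": "P001",
    "diagnosis_code": "363406003",   # illustrative SNOMED CT code
    "diagnosis_label": "Malignant tumor of colon",
    "diagnosis_date": "2024-03-15",
}
condition = redcap_to_fhir_condition(record)
print(qc_check(condition))  # → [] (record passes the checks)
```

In a real pipeline the QC step would run against the full data model before submission to the permanent repository, and failed records would be returned to the capturing site for correction.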

Conclusion: This article synthesizes our experience and procedures for health care data interoperability, standardization, and reproducibility.

Source journal: GigaScience
CiteScore: 15.50
Self-citation rate: 1.10%
Articles per year: 119
Review time: 1 week
Journal description: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.