Public data homogenization for AI model development in breast cancer

Impact factor 3.7 · JCR Q1 · Radiology, Nuclear Medicine & Medical Imaging
Vassilis Kilintzis, Varvara Kalokyri, Haridimos Kondylakis, Smriti Joshi, Katerina Nikiforaki, Oliver Díaz, Karim Lekadir, Manolis Tsiknakis, Kostas Marias
European Radiology Experimental. Published 2024-04-09. DOI: 10.1186/s41747-024-00442-4 (https://doi.org/10.1186/s41747-024-00442-4)
Citations: 0

Abstract

Background

Developing trustworthy artificial intelligence (AI) models for clinical applications requires access to clinical and imaging data cohorts. Reusing publicly available datasets has the potential to fill this gap. In the domain of breast cancer specifically, a large archive of publicly accessible medical images, along with the corresponding clinical data, is available at The Cancer Imaging Archive (TCIA). However, the existing datasets cannot be used directly: they are heterogeneous and cannot be filtered effectively to select the specific image types required to develop AI models. This work focuses on developing a homogenized breast cancer dataset that includes both clinical and imaging data.

Methods

Five datasets were acquired from TCIA and harmonized. To harmonize the clinical data, a common data model was developed, and a repeatable, documented “extract-transform-load” process was defined and executed. Further, Digital Imaging and Communications in Medicine (DICOM) information was extracted from the magnetic resonance imaging (MRI) data and made accessible and searchable.
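As an illustration of what an “extract-transform-load” step onto a common data model can look like, the following is a minimal sketch and not the authors' pipeline. The field names (`subject_id`, `age_years`, `er_status`), the two source layouts (`dataset_a`, `dataset_b`), and their column names are all hypothetical; the abstract does not describe the actual common data model.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical target fields of a common data model (assumption, not from the paper).
COMMON_FIELDS = ("subject_id", "age_years", "er_status")


def normalize_status(value: str) -> Optional[str]:
    """Map heterogeneous receptor-status encodings to one vocabulary."""
    v = value.strip().lower()
    if v in {"positive", "pos", "1", "+"}:
        return "positive"
    if v in {"negative", "neg", "0", "-"}:
        return "negative"
    return None  # unknown or missing value


# One transform per target field, per source dataset (both layouts are hypothetical).
MAPPINGS: Dict[str, Dict[str, Callable[[dict], object]]] = {
    "dataset_a": {
        "subject_id": lambda r: r["PatientID"],
        "age_years": lambda r: int(r["Age"]),
        "er_status": lambda r: normalize_status(r["ER"]),
    },
    "dataset_b": {
        "subject_id": lambda r: r["case"],
        # This source is assumed to record age in days.
        "age_years": lambda r: round(float(r["age_days"]) / 365.25),
        "er_status": lambda r: normalize_status(r["er_pos"]),
    },
}


def harmonize(source: str, rows: List[dict]) -> List[dict]:
    """Transform rows of one source dataset into common-model records."""
    mapping = MAPPINGS[source]
    records = []
    for row in rows:
        record = {field: mapping[field](row) for field in COMMON_FIELDS}
        record["source_dataset"] = source  # keep provenance for each record
        records.append(record)
    return records
```

Keeping the per-dataset mappings as data, separate from the `harmonize` loop, is one way to make the process repeatable and easy to document: adding a sixth dataset would mean adding one mapping entry rather than new control flow.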

Results

The resulting harmonized dataset includes information about 2,035 subjects with breast cancer. Further, a platform named RV-Cherry-Picker enables search over both the clinical and the diagnostic imaging datasets. It provides unified access, facilitates the download of all study images that match specific series characteristics (e.g., dynamic contrast-enhanced series), and reduces the burden of acquiring the appropriate set of images for a given AI model scenario.
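The kind of series-level selection described here can be sketched as a filter over already-extracted DICOM attributes. This is a minimal sketch under stated assumptions, not RV-Cherry-Picker's implementation: it assumes each series is represented as a dictionary keyed by standard DICOM attribute keywords (`Modality`, `SeriesDescription`, `StudyInstanceUID`, `SeriesInstanceUID`), and it uses a simple keyword match on the series description as a stand-in for whatever criteria the platform actually applies.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed description keywords hinting at a dynamic contrast-enhanced series.
DCE_KEYWORDS = ("dce", "dyn")


def select_dce_series(series_index: Iterable[dict]) -> Dict[str, List[str]]:
    """Return {StudyInstanceUID: [SeriesInstanceUID, ...]} for MR series
    whose SeriesDescription contains one of the assumed DCE keywords."""
    selected: Dict[str, List[str]] = defaultdict(list)
    for series in series_index:
        if series.get("Modality") != "MR":
            continue  # only MRI series are considered
        description = series.get("SeriesDescription", "").lower()
        if any(keyword in description for keyword in DCE_KEYWORDS):
            selected[series["StudyInstanceUID"]].append(series["SeriesInstanceUID"])
    return dict(selected)
```

Grouping the matches by study yields a download list per study. In practice, free-text series descriptions vary across sites and scanners, so a keyword match like this would likely need to be combined with other attributes.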

Conclusions

RV-Cherry-Picker provides access to the largest publicly available, homogenized imaging/clinical dataset for breast cancer, on which AI models can be developed.

Relevance statement

We present a solution for creating merged public datasets that support AI model development, using breast cancer and magnetic resonance imaging as an example.

Key points

• The proposed platform allows unified access to the largest, homogenized public imaging dataset for breast cancer.

• A methodology for the semantically enriched homogenization of public clinical data is presented.

• The platform enables detailed selection of breast MRI data for the development of AI models.


Source journal
European Radiology Experimental (Medicine: Radiology, Nuclear Medicine and Imaging)
CiteScore: 6.70
Self-citation rate: 2.60%
Articles published: 56
Review time: 18 weeks