{"title":"从代码共享到实施共享:通过联合测试推进医学影像领域可重现的人工智能开发","authors":"","doi":"10.1016/j.jmir.2024.101745","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><p>The reproducibility crisis in AI research remains a significant concern. While code sharing has been acknowledged as a step toward addressing this issue, our focus extends beyond this paradigm. In this work, we explore “federated testing” as an avenue for advancing reproducible AI research and development especially in medical imaging. Unlike federated learning, where a model is developed and refined on data from different centers, federated testing involves models developed by one team being deployed and evaluated by others, addressing reproducibility across various implementations.</p></div><div><h3>Methods</h3><p>Our study follows an exploratory design aimed at systematically evaluating the sources of discrepancies in shared model execution for medical imaging and outputs on the same input data, independent of generalizability analysis. We distributed the same model code to multiple independent centers, monitoring execution in different runtime environments while considering various real-world scenarios for pre- and post-processing steps. We analyzed deployment infrastructure by comparing the impact of different computational resources (GPU vs. CPU) on model performance. To assess federated testing in AI models for medical imaging, we performed a comparative evaluation across different centers, each with distinct pre- and post-processing steps and deployment environments, specifically targeting AI-driven positron emission tomography (PET) imaging segmentation. More specifically, we studied federated testing for an AI-based model for surrogate total metabolic tumor volume (sTMTV) segmentation in PET imaging: the AI algorithm, trained on maximum intensity projection (MIP) data, segments lymphoma regions and estimates sTMTV.</p></div><div><h3>Results</h3><p>Our study reveals that relying solely on open-source code sharing does not guarantee reproducible results due to variations in code execution, runtime environments, and incomplete input specifications. Deploying the segmentation model on local and virtual GPUs compared to using Docker containers showed no effect on reproducibility. However, significant sources of variability were found in data preparation and pre-/post- processing techniques for PET imaging. These findings underscore the limitations of code sharing alone in achieving consistent and accurate results in federated testing.</p></div><div><h3>Conclusion</h3><p>Achieving consistently precise results in federated testing requires more than just sharing models through open-source code. Comprehensive pipeline sharing, including pre- and post-processing steps, is essential. Cloud-based platforms that automate these processes can streamline AI model testing across diverse locations. 
Standardizing protocols and sharing complete pipelines can significantly enhance the robustness and reproducibility of AI models.</p></div>","PeriodicalId":46420,"journal":{"name":"Journal of Medical Imaging and Radiation Sciences","volume":null,"pages":null},"PeriodicalIF":1.3000,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"From code sharing to sharing of implementations: Advancing reproducible AI development for medical imaging through federated testing\",\"authors\":\"\",\"doi\":\"10.1016/j.jmir.2024.101745\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><p>The reproducibility crisis in AI research remains a significant concern. While code sharing has been acknowledged as a step toward addressing this issue, our focus extends beyond this paradigm. In this work, we explore “federated testing” as an avenue for advancing reproducible AI research and development especially in medical imaging. Unlike federated learning, where a model is developed and refined on data from different centers, federated testing involves models developed by one team being deployed and evaluated by others, addressing reproducibility across various implementations.</p></div><div><h3>Methods</h3><p>Our study follows an exploratory design aimed at systematically evaluating the sources of discrepancies in shared model execution for medical imaging and outputs on the same input data, independent of generalizability analysis. We distributed the same model code to multiple independent centers, monitoring execution in different runtime environments while considering various real-world scenarios for pre- and post-processing steps. We analyzed deployment infrastructure by comparing the impact of different computational resources (GPU vs. CPU) on model performance. To assess federated testing in AI models for medical imaging, we performed a comparative evaluation across different centers, each with distinct pre- and post-processing steps and deployment environments, specifically targeting AI-driven positron emission tomography (PET) imaging segmentation. More specifically, we studied federated testing for an AI-based model for surrogate total metabolic tumor volume (sTMTV) segmentation in PET imaging: the AI algorithm, trained on maximum intensity projection (MIP) data, segments lymphoma regions and estimates sTMTV.</p></div><div><h3>Results</h3><p>Our study reveals that relying solely on open-source code sharing does not guarantee reproducible results due to variations in code execution, runtime environments, and incomplete input specifications. Deploying the segmentation model on local and virtual GPUs compared to using Docker containers showed no effect on reproducibility. However, significant sources of variability were found in data preparation and pre-/post- processing techniques for PET imaging. These findings underscore the limitations of code sharing alone in achieving consistent and accurate results in federated testing.</p></div><div><h3>Conclusion</h3><p>Achieving consistently precise results in federated testing requires more than just sharing models through open-source code. Comprehensive pipeline sharing, including pre- and post-processing steps, is essential. Cloud-based platforms that automate these processes can streamline AI model testing across diverse locations. 
Standardizing protocols and sharing complete pipelines can significantly enhance the robustness and reproducibility of AI models.</p></div>\",\"PeriodicalId\":46420,\"journal\":{\"name\":\"Journal of Medical Imaging and Radiation Sciences\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.3000,\"publicationDate\":\"2024-08-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Medical Imaging and Radiation Sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1939865424004764\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Imaging and Radiation Sciences","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1939865424004764","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
From code sharing to sharing of implementations: Advancing reproducible AI development for medical imaging through federated testing
Background
The reproducibility crisis in AI research remains a significant concern. While code sharing is widely acknowledged as a step toward addressing it, our focus extends beyond this paradigm. In this work, we explore “federated testing” as an avenue for advancing reproducible AI research and development, especially in medical imaging. Unlike federated learning, in which a model is developed and refined on data from different centers, federated testing deploys a model developed by one team at other centers, where it is evaluated independently, thereby addressing reproducibility across implementations.
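To make the distinction concrete, the sketch below (hypothetical Python, not the authors' implementation) shows the core of a federated-testing check: the developing team ships a trained model together with a reference input and its recorded output, and each receiving site verifies that local execution reproduces that output within tolerance.

    # Minimal federated-testing sketch (hypothetical API, not the authors' code).
    import hashlib
    import numpy as np

    def run_site_test(model, reference_input: np.ndarray,
                      reference_output: np.ndarray, tol: float = 1e-5) -> bool:
        """Run the shared model on the shared input and compare against the
        developer's recorded output; any drift points to an implementation or
        environment discrepancy rather than a generalizability issue."""
        local_output = model(reference_input)  # site-local execution
        max_dev = float(np.max(np.abs(local_output - reference_output)))
        print(f"max deviation from reference: {max_dev:.2e}")
        return max_dev <= tol

    def input_fingerprint(volume: np.ndarray) -> str:
        """Hash the exact bytes fed to the model, so sites can confirm they
        start from identical pre-processed data before comparing outputs."""
        return hashlib.sha256(volume.tobytes()).hexdigest()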
Methods
Our study follows an exploratory design aimed at systematically identifying the sources of discrepancy that arise when a shared medical imaging model is executed at different sites and produces different outputs on the same input data, independent of any generalizability analysis. We distributed the same model code to multiple independent centers, monitoring execution across different runtime environments and considering a range of real-world pre- and post-processing scenarios. We analyzed deployment infrastructure by comparing the impact of different computational resources (GPU vs. CPU) on model performance. To assess federated testing of AI models for medical imaging, we performed a comparative evaluation across centers, each with distinct pre- and post-processing steps and deployment environments, targeting AI-driven positron emission tomography (PET) image segmentation. Specifically, we studied federated testing of an AI-based model for surrogate total metabolic tumor volume (sTMTV) segmentation in PET imaging: the algorithm, trained on maximum intensity projection (MIP) data, segments lymphoma regions and estimates sTMTV.
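As an illustration of the MIP step named above, the following Python sketch collapses a 3-D PET volume into 2-D projections by taking the maximum along a spatial axis; the axis ordering and array shapes are assumptions for illustration, not the exact pipeline used in the study.

    # MIP sketch: assumes the PET volume is a 3-D NumPy array ordered (z, y, x).
    import numpy as np

    def coronal_sagittal_mips(pet_volume: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Collapse the 3-D volume into 2-D projections by taking the maximum
        along one spatial axis; MIP-based models such as the sTMTV network are
        trained on images like these rather than on the full volume."""
        coronal = pet_volume.max(axis=1)   # project across the anterior-posterior axis
        sagittal = pet_volume.max(axis=2)  # project across the left-right axis
        return coronal, sagittal

    # Example: a synthetic volume stands in for a real PET scan.
    volume = np.random.rand(256, 192, 192).astype(np.float32)
    cor, sag = coronal_sagittal_mips(volume)
    print(cor.shape, sag.shape)  # (256, 192) and (256, 192)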
Results
Our study reveals that relying solely on open-source code sharing does not guarantee reproducible results, owing to variations in code execution, runtime environments, and incomplete input specifications. Whether the segmentation model was deployed on local GPUs, virtual GPUs, or in Docker containers had no effect on reproducibility. The significant sources of variability were instead found in data preparation and in pre-/post-processing techniques for PET imaging. These findings underscore the limitations of code sharing alone for achieving consistent and accurate results in federated testing.
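The following Python sketch illustrates the kind of pre-processing variability reported here: two sites that resample the same volume with different interpolation orders (a plausible, hypothetical difference, not one taken from the study) no longer feed the model identical inputs, even though both nominally resample to the same grid.

    # Demonstration of pre-processing divergence between two hypothetical sites.
    import numpy as np
    from scipy.ndimage import zoom

    rng = np.random.default_rng(0)
    volume = rng.random((64, 64, 64)).astype(np.float32)

    site_a = zoom(volume, 2.0, order=1)  # trilinear resampling at site A
    site_b = zoom(volume, 2.0, order=3)  # cubic-spline resampling at site B

    diff = np.abs(site_a - site_b)
    print(f"shapes match: {site_a.shape == site_b.shape}, "
          f"max voxel difference: {diff.max():.4f}")
    # The grids agree but the voxel values do not, so a segmentation model
    # may produce different masks, and hence different sTMTV estimates.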
Conclusion
Achieving consistently precise results in federated testing requires more than just sharing models through open-source code. Comprehensive pipeline sharing, including pre- and post-processing steps, is essential. Cloud-based platforms that automate these processes can streamline AI model testing across diverse locations. Standardizing protocols and sharing complete pipelines can significantly enhance the robustness and reproducibility of AI models.
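A minimal Python sketch of what sharing an implementation, rather than just code, might look like: the entire pipeline is exposed as a single versioned entry point so that receiving sites cannot diverge on intermediate steps. All names and parameter values here are hypothetical, not the authors' released software.

    # Sketch of a shared end-to-end pipeline (hypothetical design).
    import numpy as np

    class SharedPipeline:
        """Bundles pre-processing, inference, and post-processing so every
        site executes exactly the same steps with the same parameters."""

        def __init__(self, model, clip_max: float = 20.0, threshold: float = 0.5):
            self.model = model
            self.clip_max = clip_max      # fixed, versioned normalization bound
            self.threshold = threshold    # fixed binarization threshold

        def preprocess(self, pet_volume: np.ndarray) -> np.ndarray:
            """Fixed pre-processing: MIP followed by intensity clipping."""
            mip = pet_volume.max(axis=1)
            return np.clip(mip, 0.0, self.clip_max) / self.clip_max

        def postprocess(self, mask: np.ndarray) -> np.ndarray:
            """Fixed post-processing instead of a site-chosen threshold."""
            return (mask >= self.threshold).astype(np.uint8)

        def __call__(self, pet_volume: np.ndarray) -> np.ndarray:
            return self.postprocess(self.model(self.preprocess(pet_volume)))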
About the Journal
Journal of Medical Imaging and Radiation Sciences is the official peer-reviewed journal of the Canadian Association of Medical Radiation Technologists. The journal is published four times a year and is circulated to approximately 11,000 medical radiation technologists, libraries, and radiology departments throughout Canada, the United States, and overseas. It publishes articles on recent research, new technology and techniques, professional practices, and technologists' viewpoints, as well as relevant book reviews.