Enhancing Learning-Based Cross-Modality Prediction for Lossless Medical Imaging Compression

IF 2.7 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE open journal of signal processing Pub Date : 2025-04-28 DOI:10.1109/OJSP.2025.3564830

Daniel S. Nicolau;Lucas A. Thomaz;Luis M. N. Tavora;Sergio M. M. Faria

{"title":"Enhancing Learning-Based Cross-Modality Prediction for Lossless Medical Imaging Compression","authors":"Daniel S. Nicolau;Lucas A. Thomaz;Luis M. N. Tavora;Sergio M. M. Faria","doi":"10.1109/OJSP.2025.3564830","DOIUrl":null,"url":null,"abstract":"Multimodal medical imaging, which involves the simultaneous acquisition of different modalities, enhances diagnostic accuracy and provides comprehensive visualization of anatomy and physiology. However, this significantly increases data size, posing storage and transmission challenges. Standard image codecs fail to properly exploit cross-modality redundancies, limiting coding efficiency. In this paper, a novel approach is proposed to enhance the compression gain and to reduce the computational complexity of a lossless cross-modality coding scheme for multimodal image pairs. The scheme uses a deep learning-based approach with Image-to-Image translation based on a Generative Adversarial Network architecture to generate an estimated image of one modality from its cross-modal pair. Two different approaches for inter-modal prediction are considered: one using the original and the estimated images for the inter-prediction scheme and another considering a weighted sum of both images. Subsequently, a decider based on a Convolutional Neural Network is employed to estimate the best coding approach to be selected among the two alternatives, before the coding step. A novel loss function that considers the decision accuracy and the compression gain of the chosen prediction approach is applied to improve the decision-making task. The experimental results on PET-CT and PET-MRI datasets demonstrate that the proposed approach improves by 11.76% and 4.61% the compression efficiency when compared with the single modality intra-coding of the Versatile Video Coding. Additionally, this approach allows to reduce the computational complexity by almost half in comparison to selecting the most compression-efficient after testing both schemes.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"489-497"},"PeriodicalIF":2.7000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10978054","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE open journal of signal processing","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10978054/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

Abstract

Multimodal medical imaging, which involves the simultaneous acquisition of different modalities, enhances diagnostic accuracy and provides comprehensive visualization of anatomy and physiology. However, this significantly increases data size, posing storage and transmission challenges. Standard image codecs fail to properly exploit cross-modality redundancies, limiting coding efficiency. In this paper, a novel approach is proposed to enhance the compression gain and to reduce the computational complexity of a lossless cross-modality coding scheme for multimodal image pairs. The scheme uses a deep learning-based approach with Image-to-Image translation based on a Generative Adversarial Network architecture to generate an estimated image of one modality from its cross-modal pair. Two different approaches for inter-modal prediction are considered: one using the original and the estimated images for the inter-prediction scheme and another considering a weighted sum of both images. Subsequently, a decider based on a Convolutional Neural Network is employed to estimate the best coding approach to be selected among the two alternatives, before the coding step. A novel loss function that considers the decision accuracy and the compression gain of the chosen prediction approach is applied to improve the decision-making task. The experimental results on PET-CT and PET-MRI datasets demonstrate that the proposed approach improves by 11.76% and 4.61% the compression efficiency when compared with the single modality intra-coding of the Versatile Video Coding. Additionally, this approach allows to reduce the computational complexity by almost half in comparison to selecting the most compression-efficient after testing both schemes.

查看原文本刊更多论文

增强基于学习的医学图像无损压缩交叉模态预测

多模态医学成像，包括同时获取不同的模态，提高了诊断的准确性，并提供了解剖学和生理学的全面可视化。然而，这大大增加了数据大小，带来了存储和传输方面的挑战。标准的图像编解码器不能很好地利用跨模态冗余，限制了编码效率。本文提出了一种新的方法来提高多模态图像对的无损交叉模态编码的压缩增益并降低其计算复杂度。该方案使用基于生成对抗网络架构的基于深度学习的图像到图像转换方法，从其跨模态对中生成一种模态的估计图像。考虑了两种不同的模式间预测方法：一种是使用原始图像和估计图像进行模式间预测，另一种是考虑两个图像的加权和。然后，在编码步骤之前，使用基于卷积神经网络的决策器来估计要在两个备选方案中选择的最佳编码方法。提出了一种新的损失函数，考虑了所选预测方法的决策精度和压缩增益，以改善决策任务。在PET-CT和PET-MRI数据集上的实验结果表明，与通用视频编码的单模态内编码相比，该方法的压缩效率分别提高了11.76%和4.61%。此外，与在测试两种方案后选择压缩效率最高的方案相比，这种方法可以将计算复杂度降低近一半。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊