TM-GAN: A Transformer-Based Multi-Modal Generative Adversarial Network for Guided Depth Image Super-Resolution

IF 3.7 2区工程技术 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-04-29 DOI:10.1109/JETCAS.2024.3394495

Jiang Zhu;Van Kwan Zhi Koh;Zhiping Lin;Bihan Wen

{"title":"TM-GAN: A Transformer-Based Multi-Modal Generative Adversarial Network for Guided Depth Image Super-Resolution","authors":"Jiang Zhu;Van Kwan Zhi Koh;Zhiping Lin;Bihan Wen","doi":"10.1109/JETCAS.2024.3394495","DOIUrl":null,"url":null,"abstract":"Despite significant strides in deep single image super-resolution (SISR), the development of robust guided depth image super-resolution (GDSR) techniques presents a notable challenge. Effective GDSR methods must not only exploit the properties of the target image but also integrate complementary information from the guidance image. The state-of-the-art in guided image super-resolution has been dominated by convolutional neural network (CNN) based methods, which leverage CNN as their architecture. However, CNN has limitations in capturing global information effectively, and their traditional regression training techniques can sometimes lead to challenges in the precise generating of high-frequency details, unlike transformers that have shown remarkable success in deep learning through the self-attention mechanism. Drawing inspiration from the transformative impact of transformers in both language and vision applications, we propose a Transformer-based Multi-modal Generative Adversarial Network dubbed TM-GAN. TM-GAN is designed to effectively process and integrate multi-modal data, leveraging the global contextual understanding and detailed feature extraction capabilities of transformers within a GAN architecture for GDSR, aiming to effectively integrate and utilize multi-modal data sources. Experimental evaluations of TM-GAN on a variety of RGB-D datasets demonstrate its superiority over the state-of-the-art methods, showcasing its effectiveness in leveraging transformer-based techniques for GDSR.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 2","pages":"261-274"},"PeriodicalIF":3.7000,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10509697/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

Abstract

Despite significant strides in deep single image super-resolution (SISR), the development of robust guided depth image super-resolution (GDSR) techniques presents a notable challenge. Effective GDSR methods must not only exploit the properties of the target image but also integrate complementary information from the guidance image. The state-of-the-art in guided image super-resolution has been dominated by convolutional neural network (CNN) based methods, which leverage CNN as their architecture. However, CNN has limitations in capturing global information effectively, and their traditional regression training techniques can sometimes lead to challenges in the precise generating of high-frequency details, unlike transformers that have shown remarkable success in deep learning through the self-attention mechanism. Drawing inspiration from the transformative impact of transformers in both language and vision applications, we propose a Transformer-based Multi-modal Generative Adversarial Network dubbed TM-GAN. TM-GAN is designed to effectively process and integrate multi-modal data, leveraging the global contextual understanding and detailed feature extraction capabilities of transformers within a GAN architecture for GDSR, aiming to effectively integrate and utilize multi-modal data sources. Experimental evaluations of TM-GAN on a variety of RGB-D datasets demonstrate its superiority over the state-of-the-art methods, showcasing its effectiveness in leveraging transformer-based techniques for GDSR.

查看原文本刊更多论文

TM-GAN：用于深度图像超分辨率的基于变换器的多模态生成对抗网络

尽管在深度单图像超分辨率（SISR）方面取得了长足进步，但开发稳健的引导深度图像超分辨率（GDSR）技术仍是一项重大挑战。有效的 GDSR 方法不仅要利用目标图像的特性，还要整合引导图像的补充信息。在引导图像超分辨率领域，基于卷积神经网络（CNN）的方法一直处于领先地位，这些方法利用 CNN 作为其架构。然而，CNN 在有效捕捉全局信息方面存在局限性，其传统的回归训练技术有时会导致在精确生成高频细节方面遇到挑战，而变换器则不同，它通过自我注意机制在深度学习方面取得了显著的成功。从变换器在语言和视觉应用中的变革性影响中汲取灵感，我们提出了一种基于变换器的多模态生成对抗网络（TM-GAN）。TM-GAN 设计用于有效处理和整合多模态数据，在 GAN 架构内利用变换器的全局上下文理解和详细特征提取能力来实现 GDSR，旨在有效整合和利用多模态数据源。TM-GAN 在各种 RGB-D 数据集上的实验评估表明，它优于最先进的方法，展示了它在利用基于变换器的技术进行 GDSR 方面的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Journal on Emerging and Selected Topics in Circuits and Systems ENGINEERING, ELECTRICAL & ELECTRONIC-

CiteScore

8.50

自引率

2.20%

发文量

期刊介绍： The IEEE Journal on Emerging and Selected Topics in Circuits and Systems is published quarterly and solicits, with particular emphasis on emerging areas, special issues on topics that cover the entire scope of the IEEE Circuits and Systems (CAS) Society, namely the theory, analysis, design, tools, and implementation of circuits and systems, spanning their theoretical foundations, applications, and architectures for signal and information processing.