Dual stage semantic information based generative adversarial network for image super-resolution

IF 4.3 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Vision and Image Understanding Pub Date : 2024-11-11 DOI:10.1016/j.cviu.2024.104226

Shailza Sharma , Abhinav Dhall , Shikhar Johri , Vinay Kumar , Vivek Singh

{"title":"Dual stage semantic information based generative adversarial network for image super-resolution","authors":"Shailza Sharma , Abhinav Dhall , Shikhar Johri , Vinay Kumar , Vivek Singh","doi":"10.1016/j.cviu.2024.104226","DOIUrl":null,"url":null,"abstract":"<div><div>Deep learning has revolutionized image super-resolution, yet challenges persist in preserving intricate details and avoiding overly smooth reconstructions. In this work, we introduce a novel architecture, the Residue and Semantic Feature-based Dual Subpixel Generative Adversarial Network (RSF-DSGAN), which emphasizes the critical role of semantic information in addressing these issues. The proposed generator architecture is designed with two sequential stages: the Premier Residual Stage and the Deuxième Residual Stage. These stages are concatenated to form a dual-stage upsampling process, substantially augmenting the model’s capacity for feature learning. A central innovation of our approach is the integration of semantic information directly into the generator. Specifically, feature maps derived from a pre-trained network are fused with the primary feature maps of the first stage, enriching the generator with high-level contextual cues. This semantic infusion enhances the fidelity and sharpness of reconstructed images, particularly in preserving object details and textures. Inter- and intra-residual connections are employed within these stages to maintain high-frequency details and fine textures. Additionally, spectral normalization is introduced in the discriminator to stabilize training. Comprehensive evaluations, including visual perception and mean opinion scores, demonstrate that RSF-DSGAN, with its emphasis on semantic information, outperforms current state-of-the-art super-resolution methods.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"250 ","pages":"Article 104226"},"PeriodicalIF":4.3000,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314224003072","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Deep learning has revolutionized image super-resolution, yet challenges persist in preserving intricate details and avoiding overly smooth reconstructions. In this work, we introduce a novel architecture, the Residue and Semantic Feature-based Dual Subpixel Generative Adversarial Network (RSF-DSGAN), which emphasizes the critical role of semantic information in addressing these issues. The proposed generator architecture is designed with two sequential stages: the Premier Residual Stage and the Deuxième Residual Stage. These stages are concatenated to form a dual-stage upsampling process, substantially augmenting the model’s capacity for feature learning. A central innovation of our approach is the integration of semantic information directly into the generator. Specifically, feature maps derived from a pre-trained network are fused with the primary feature maps of the first stage, enriching the generator with high-level contextual cues. This semantic infusion enhances the fidelity and sharpness of reconstructed images, particularly in preserving object details and textures. Inter- and intra-residual connections are employed within these stages to maintain high-frequency details and fine textures. Additionally, spectral normalization is introduced in the discriminator to stabilize training. Comprehensive evaluations, including visual perception and mean opinion scores, demonstrate that RSF-DSGAN, with its emphasis on semantic information, outperforms current state-of-the-art super-resolution methods.

查看原文本刊更多论文

基于生成式对抗网络的双阶段语义信息图像超分辨率

深度学习为图像超分辨率带来了革命性的变化，但在保留复杂细节和避免过于平滑的重建方面仍然存在挑战。在这项工作中，我们介绍了一种新颖的架构--基于残差和语义特征的双子像素生成对抗网络（RSF-DSGAN），它强调了语义信息在解决这些问题中的关键作用。拟议的生成器架构设计有两个连续的阶段：第一残差阶段和第二残差阶段。这两个阶段连接起来形成一个双阶段的上采样过程，大大增强了模型的特征学习能力。我们方法的核心创新点是将语义信息直接整合到生成器中。具体来说，从预训练网络中提取的特征图与第一阶段的主要特征图融合在一起，通过高级上下文线索丰富生成器。这种语义注入可提高重建图像的保真度和清晰度，尤其是在保留物体细节和纹理方面。在这些阶段中，采用了残基间和残基内的连接，以保持高频细节和精细纹理。此外，还在判别器中引入了光谱归一化以稳定训练。包括视觉感知和平均意见分数在内的综合评估结果表明，RSF-DSGAN 重视语义信息，其效果优于目前最先进的超分辨率方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Computer Vision and Image Understanding 工程技术-工程：电子与电气

CiteScore

7.80

自引率

4.40%

发文量

112

审稿时长

79 days

期刊介绍： The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research Areas Include: • Theory • Early vision • Data structures and representations • Shape • Range • Motion • Matching and recognition • Architecture and languages • Vision systems