DII-FRSA: Diverse image inpainting with multi-scale feature representation and separable attention

IF 3.1 | CAS Zone 4, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation | Pub Date: 2025-05-17 | DOI: 10.1016/j.jvcir.2025.104472

Jixiang Cheng, Yuan Wu, Zhidan Li, Yiluo Zhang

Volume 110, Article 104472. Publication type: Journal Article. URL: https://www.sciencedirect.com/science/article/pii/S1047320325000860

Citations: 0

Abstract

Diverse image inpainting generates multiple visually realistic completions for a single masked image. Previous methods have been successful but still have limitations. First, one-stage approaches must trade off diversity against consistency. Second, two-stage approaches avoid this trade-off but rely on autoregressive models to estimate the probability distribution of the structural priors, which significantly slows inference. This paper introduces DII-FRSA, a diverse image inpainting method that uses multi-scale feature representation and separable attention. In the first stage, we build a Gaussian distribution from the dataset and sample from it to obtain multiple coarse results. To strengthen the modeling capability of the Variational Auto-Encoder, we propose a multi-scale feature representation module for the encoder and decoder. In the second stage, a refinement network based on the proposed separable attention improves the quality of the coarse results while keeping the appearance of the visible and masked regions consistent. We evaluated the method on three well-established datasets (Places2, CelebA-HQ, and Paris Street View), where it outperformed recent techniques. Our network improves both the diversity and the visual realism of the completed results.
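The first-stage idea of drawing several coarse results from a Gaussian can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no architectural details, so `sample_latents`, `toy_decoder`, and `composite` are hypothetical names, the sigmoid "decoder" is a stand-in for a trained VAE decoder, and pasting the visible pixels back over the output is only an assumed, common way to keep known regions unchanged. The separable-attention refinement network is not modeled here.

```python
import math
import random


def sample_latents(mu, sigma, k, seed=0):
    """Draw k latent vectors z = mu + sigma * eps, with eps ~ N(0, 1)."""
    rng = random.Random(seed)
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        for _ in range(k)
    ]


def toy_decoder(z):
    """Hypothetical stand-in for a trained decoder: squash each value to (0, 1)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def composite(visible, generated, mask):
    """Keep known pixels where mask == 0; use generated pixels where mask == 1."""
    return [g if m == 1 else v for v, g, m in zip(visible, generated, mask)]


# A 4-pixel "image": the first two pixels are visible, the last two are masked.
visible = [0.2, 0.4, 0.0, 0.0]
mask = [0, 0, 1, 1]

# Assumed Gaussian parameters; in practice these would be learned from data.
mu = [0.0, 0.0, 0.0, 0.0]
sigma = [1.0, 1.0, 1.0, 1.0]

results = [
    composite(visible, toy_decoder(z), mask)
    for z in sample_latents(mu, sigma, k=3)
]

for r in results:
    print([round(p, 3) for p in r])
```

Each sampled latent yields a different fill for the masked pixels while the visible pixels stay identical across all results, which is the diversity-with-consistency behavior the abstract describes at a high level.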


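Source journal: Journal of Visual Communication and Image Representation (Engineering & Technology; Computer Science: Software Engineering)

CiteScore: 5.40
Self-citation rate: 11.50%
Articles published: 188
Review time: 9.9 months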

Journal description: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.