Discrete codebook collaborating with transformer for thangka image inpainting

IF 3.5; Zone 3, Computer Science; JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Systems Pub Date : 2024-08-07 DOI:10.1007/s00530-024-01439-0

Jinxian Bai, Yao Fan, Zhiwei Zhao


Citations: 0

Abstract

Thangka, as a precious heritage of painting art, holds irreplaceable research value due to its richness in Tibetan history, religious beliefs, and folk culture. However, it is susceptible to partial damage and form distortion due to natural erosion or inadequate conservation measures. Given the complexity of textures and rich semantics in thangka images, existing image inpainting methods struggle to recover their original artistic style and intricate details. In this paper, we propose a novel approach combining discrete codebook learning with a transformer for image inpainting, tailored specifically for thangka images. In the codebook learning stage, we design an improved network framework based on vector quantization (VQ) codebooks to discretely encode intermediate features of input images, yielding a context-rich discrete codebook. The second phase introduces a parallel transformer module based on a cross-shaped window, which efficiently predicts the index combinations for missing regions under limited computational cost. Furthermore, we devise a multi-scale feature guidance module that progressively fuses features from intact areas with textural features from the codebook, thereby enhancing the preservation of local details in non-damaged regions. We validate the efficacy of our method through qualitative and quantitative experiments on datasets including CelebA-HQ, Places2, and a custom thangka dataset. Experimental results demonstrate that compared to previous methods, our approach successfully reconstructs images with more complete structural information and clearer textural details.
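The codebook stage described in the abstract rests on vector quantization: each encoder feature is replaced by the index of its nearest codebook entry, and the second stage then only has to predict indices for the missing regions. The following is a minimal sketch of that nearest-entry lookup, not the authors' implementation. The function names (`quantize`, `lookup`), the toy 2-D codebook, and the feature values are all assumptions chosen for illustration; the paper's actual network, codebook size, and training losses are not given in the abstract.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def lookup(indices: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Replace each index with its codebook vector (the quantized feature)."""
    return [codebook[i] for i in indices]


# Hypothetical codebook with three 2-D entries (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
# Hypothetical encoder features for four spatial positions.
features = [(0.1, -0.2), (0.9, 1.2), (-0.8, 1.7), (0.7, 0.9)]

indices = quantize(features, codebook)   # discrete representation of the image
quantized = lookup(indices, codebook)    # features handed to a decoder
print(indices)
```

In an inpainting setting, the indices at damaged positions would be unknown; a sequence model such as the transformer mentioned in the abstract would predict them, after which `lookup` recovers the corresponding codebook features for decoding.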


Source journal

Multimedia Systems (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 5.40

Self-citation rate: 7.70%

Articles published: 148

Review time: 4.5 months

Journal description: This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.