Nested Deformable Multi-head Attention for Facial Image Inpainting

2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Pub Date: 2023-01-01, DOI: 10.1109/WACV56688.2023.00602

Shruti S. Phutke, S. Murala


Citations: 1

Abstract

Extracting adequate contextual information is an important aspect of any image inpainting method. To this end, many image inpainting methods focus on obtaining large receptive fields. Recent advances in deep learning, in particular the introduction of transformers for image inpainting, have paved the way toward plausible results. However, stacking multiple transformer blocks in a single layer makes the architecture computationally expensive. In this context, we propose a novel lightweight architecture with a nested deformable attention-based transformer layer for feature fusion. The nested attention helps the network focus on long-range dependencies across encoder and decoder features. In addition, we propose a multi-head attention built on deformable convolution to exploit diverse receptive fields. Combining nested and deformable attention yields a lightweight architecture for facial image inpainting. Comparisons on the Celeb HQ [25] dataset with known (NVIDIA) and unknown (QD-IMD) masks and on the Places2 [57] dataset with NVIDIA masks, together with an extensive ablation study, demonstrate the superiority of the proposed approach for image inpainting. The code is available at: https://github.com/shrutiphutke/NDMA_Facial_Inpainting.
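The abstract names two ingredients, deformable convolution inside multi-head attention and attention-based fusion of encoder and decoder features, without spelling out how they are wired together. Below is a minimal, dependency-free Python sketch of each ingredient on its own, under stated assumptions. It is not the authors' NDMA implementation (see the linked repository for that). `deform_conv3x3` samples a single-channel feature map at 3x3 kernel positions shifted by per-pixel offsets, using bilinear interpolation as in deformable convolution. In a real network the offsets are predicted by another convolution, whereas here they are simply passed in. `multi_head_cross_attention` assumes, as a hypothetical simplification, that fusion is plain cross-attention with decoder features as queries and encoder features as keys and values, with the learned projections omitted. All function and type names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical types: a single-channel 2D map indexed fmap[row][col],
# and a list of token vectors (one per spatial position).
FeatureMap = List[List[float]]
Tokens = List[List[float]]
# offsets[i][j][k] = (d_row, d_col) shift for kernel tap k at output pixel (i, j).
Offsets = List[List[List[Tuple[float, float]]]]

# Taps of a 3x3 kernel in row-major order; index 4 is the centre tap.
TAPS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def bilinear(fmap: FeatureMap, r: float, c: float) -> float:
    """Bilinearly interpolate fmap at fractional (r, c); samples outside the map count as 0."""
    h, w = len(fmap), len(fmap[0])
    r0, c0 = math.floor(r), math.floor(c)
    total = 0.0
    for rr in (r0, r0 + 1):
        for cc in (c0, c0 + 1):
            if 0 <= rr < h and 0 <= cc < w:
                total += (1 - abs(r - rr)) * (1 - abs(c - cc)) * fmap[rr][cc]
    return total


def deform_conv3x3(fmap: FeatureMap, weights: List[float], offsets: Offsets) -> FeatureMap:
    """Single-channel 3x3 deformable convolution (stride 1, zero padding, no bias).

    Tap k at output (i, j) reads fmap at (i + dr_k + d_row, j + dc_k + d_col).
    Offsets are usually fractional, hence the bilinear sampling. With all offsets
    zero this reduces to an ordinary 3x3 convolution (cross-correlation, as in
    deep-learning frameworks).
    """
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for k, (dr, dc) in enumerate(TAPS):
                d_row, d_col = offsets[i][j][k]
                acc += weights[k] * bilinear(fmap, i + dr + d_row, j + dc + d_col)
            out[i][j] = acc
    return out


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_cross_attention(dec: Tokens, enc: Tokens, num_heads: int) -> Tokens:
    """Fuse encoder tokens into decoder tokens with multi-head cross-attention.

    Queries come from dec, keys and values from enc. Assumption: the learned
    Q/K/V/output projections are omitted (treated as identity) to keep this short.
    """
    d = len(dec[0])
    assert d % num_heads == 0 and len(enc[0]) == d, "width must split evenly across heads"
    hd = d // num_heads  # channels per head
    out = []
    for q in dec:
        fused: List[float] = []
        for h in range(num_heads):
            lo, hi = h * hd, (h + 1) * hd
            # Scaled dot-product score of this query against every encoder token.
            scores = [sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) / math.sqrt(hd) for k in enc]
            weights = softmax(scores)
            # Attention-weighted average of encoder values on this head's channels.
            for c in range(lo, hi):
                fused.append(sum(wt * v[c] for wt, v in zip(weights, enc)))
        out.append(fused)
    return out
```

For example, `multi_head_cross_attention([[1.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]], num_heads=1)` returns `[[1.0, 1.0]]`: both encoder tokens receive the same score, so the query gets their average. Splitting channels across heads lets each head attend to different encoder positions. For a trainable model, a tensor library would be the practical choice; PyTorch, for instance, ships deformable convolution as `torchvision.ops.deform_conv2d`.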

