Latest Articles: IEEE Transactions on Pattern Analysis and Machine Intelligence

TCFormer: Visual Recognition via Token Clustering Transformer.
IEEE transactions on pattern analysis and machine intelligence Pub Date: 2024-07-11 DOI: 10.1109/TPAMI.2024.3425768
Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
{"title":"TCFormer: Visual Recognition via Token Clustering Transformer.","authors":"Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang","doi":"10.1109/TPAMI.2024.3425768","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3425768","url":null,"abstract":"<p><p>Transformers are widely used in computer vision areas and have achieved remarkable success. Most state-of-the-art approaches split images into regular grids and represent each grid region with a vision token. However, fixed token distribution disregards the semantic meaning of different image regions, resulting in sub-optimal performance. To address this issue, we propose the Token Clustering Transformer (TCFormer), which generates dynamic vision tokens based on semantic meaning. Our dynamic tokens possess two crucial characteristics: (1) Representing image regions with similar semantic meanings using the same vision token, even if those regions are not adjacent, and (2) concentrating on regions with valuable details and represent them using fine tokens. Through extensive experimentation across various applications, including image classification, human pose estimation, semantic segmentation, and object detection, we demonstrate the effectiveness of our TCFormer. The code and models for this work are available at https://github.com/zengwang430521/TCFormer.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141592376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
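A rough illustration of the token-merging idea behind this entry: grid tokens with similar features, adjacent or not, collapse into a single dynamic token. TCFormer itself uses a learned density-peak clustering inside the network; the sketch below stands in with plain k-means, and all shapes and names are illustrative.

```python
# Illustrative only: merge grid tokens into cluster tokens by feature
# similarity. TCFormer uses a learned, density-peak-based clustering;
# k-means here just demonstrates the general idea.
import numpy as np
from sklearn.cluster import KMeans

def cluster_tokens(tokens: np.ndarray, n_clusters: int):
    """tokens: (N, C) patch features -> (n_clusters, C) merged tokens.

    Non-adjacent patches with similar features land in the same cluster,
    so one vision token can represent a scattered semantic region.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(tokens)
    merged = np.stack([tokens[km.labels_ == k].mean(axis=0)
                       for k in range(n_clusters)])
    return merged, km.labels_

# 196 grid tokens (a 14x14 patch grid, 64-dim features) -> 49 dynamic tokens
rng = np.random.default_rng(0)
grid_tokens = rng.normal(size=(196, 64)).astype(np.float32)
dyn_tokens, assignment = cluster_tokens(grid_tokens, n_clusters=49)
print(dyn_tokens.shape)  # (49, 64)
```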
The effects of experiment duration and supertrial analysis on EEG classification methods.
IEEE transactions on pattern analysis and machine intelligence Pub Date: 2024-07-10 DOI: 10.1109/TPAMI.2024.3426296
Simone Palazzo, Concetto Spampinato, Isaak Kavasidis, Daniela Giordano, Joseph Schmidt, Mubarak Shah
{"title":"The effects of experiment duration and supertrial analysis on EEG classification methods.","authors":"Simone Palazzo, Concetto Spampinato, Isaak Kavasidis, Daniela Giordano, Joseph Schmidt, Mubarak Shah","doi":"10.1109/TPAMI.2024.3426296","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3426296","url":null,"abstract":"<p><p>Bharadwaj et al. [1] present a comments paper evaluating the classification accuracy of several state-of-the-art methods using EEG data averaged over random class samples. According to the results, some of the methods achieve above-chance accuracy, while the method proposed in [2], that is the target of their analysis, does not. In this rebuttal, we address these claims and explain why they are not grounded in the cognitive neuroscience literature, and why the evaluation procedure is ineffective and unfair.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141581864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference.
IEEE transactions on pattern analysis and machine intelligence Pub Date: 2024-07-09 DOI: 10.1109/TPAMI.2024.3425222
Kai Shen, Lingfei Wu, Siliang Tang, Fangli Xu, Bo Long, Yueting Zhuang, Jian Pei
{"title":"Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference.","authors":"Kai Shen, Lingfei Wu, Siliang Tang, Fangli Xu, Bo Long, Yueting Zhuang, Jian Pei","doi":"10.1109/TPAMI.2024.3425222","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3425222","url":null,"abstract":"<p><p>The visual question generation (VQG) task aims to generate human-like questions from an image and potentially other side information (e.g. answer type). Previous works on VQG fall in two aspects: i) They suffer from one image to many questions mapping problem, which leads to the failure of generating referential and meaningful questions from an image. ii) They fail to model complex implicit relations among the visual objects in an image and also overlook potential interactions between the side information and image. To address these limitations, we first propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference. Concretely, we aim to ask the right visual questions with Double Hints - textual answers and visual regions of interests, which could effectively mitigate the existing one-to-many mapping issue. Particularly, we develop a simple methodology to self-learn the visual hints without introducing any additional human annotations. Furthermore, to capture these sophisticated relationships, we propose a new double-hints guided Graph-to-Sequence learning framework, which first models them as a dynamic graph and learns the implicit topology end-to-end, and then utilizes a graph-to-sequence model to generate the questions with double hints. Experimental results demonstrate the priority of our proposed method.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141565414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
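One way to picture the "dynamic graph with implicit topology" step from the abstract above: a soft adjacency over object features, conditioned on the answer hint, followed by one round of message passing. This is a hypothetical sketch, not the paper's model; the layer names, fusion scheme, and sizes are all assumptions.

```python
# Hypothetical sketch of a hint-conditioned dynamic graph over visual
# objects. The paper's full graph-to-sequence framework is much larger;
# this only shows a learned soft adjacency plus message passing.
import torch
import torch.nn as nn

class DynamicGraph(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, objects, answer_hint):
        # objects: (N, dim) region features; answer_hint: (dim,) text feature
        hint = answer_hint.expand(objects.size(0), -1)
        nodes = self.fuse(torch.cat([objects, hint], dim=-1))  # hint-aware nodes
        scores = self.q(nodes) @ self.k(nodes).t() / nodes.size(-1) ** 0.5
        adj = scores.softmax(dim=-1)   # learned implicit topology (soft edges)
        return adj @ nodes             # one round of message passing

graph = DynamicGraph(dim=256)
out = graph(torch.randn(36, 256), torch.randn(256))
print(out.shape)  # torch.Size([36, 256])
```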
High-Fidelity and Efficient Pluralistic Image Completion with Transformers.
IEEE transactions on pattern analysis and machine intelligence Pub Date: 2024-07-09 DOI: 10.1109/TPAMI.2024.3424835
Ziyu Wan, Jingbo Zhang, Dongdong Chen, Jing Liao
{"title":"High-Fidelity and Efficient Pluralistic Image Completion with Transformers.","authors":"Ziyu Wan, Jingbo Zhang, Dongdong Chen, Jing Liao","doi":"10.1109/TPAMI.2024.3424835","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3424835","url":null,"abstract":"<p><p>Image completion has made tremendous progress with convolutional neural networks (CNNs), because of their powerful texture modeling capacity. However, due to some inherent properties (e.g., local inductive prior, spatial-invariant kernels), CNNs do not perform well in understanding global structures or naturally support pluralistic completion. Recently, transformers demonstrate their power in modeling the long-term relationship and generating diverse results, but their computation complexity is quadratic to input length, thus hampering the application in processing high-resolution images. This paper brings the best of both worlds to pluralistic image completion: appearance prior reconstruction with transformer and texture replenishment with CNN. The former transformer recovers pluralistic coherent structures together with some coarse textures, while the latter CNN enhances the local texture details of coarse priors guided by the high-resolution masked images. To decode diversified outputs from transformers, auto-regressive sampling is the most common method, but with extremely low efficiency. We further overcome this issue by proposing a new decoding strategy, temperature annealing probabilistic sampling (TAPS), which firstly achieves more than 70× speedup of inference at most, meanwhile maintaining the high quality and diversity of the sampled global structures. Moreover, we find the full CNN architecture will lead to suboptimal solutions for guided upsampling. To render more realistic and coherent contents, we design a novel module, named texture-aware guided attention, to concurrently consider the procedures of texture copy and generation, meanwhile raising several important modifications to solve the boundary artifacts. Through dense experiments, we found the proposed method vastly outperforms state-of-the-art methods in terms of four aspects: 1) large performance boost on image fidelity even compared to deterministic completion methods; 2) better diversity and higher fidelity for pluralistic completion; 3) exceptional generalization ability on large masks and generic dataset, like ImageNet. 4) Much higher decoding efficiency over previous auto-regressive based methods.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141565415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
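The core of the TAPS idea named above is sampling token distributions under a temperature that anneals as decoding proceeds, so early choices stay diverse and later ones turn greedy. A minimal sketch, assuming a linear schedule and single-token sampling; the paper's actual procedure decodes the transformer's masked tokens jointly.

```python
# Minimal temperature-annealed sampling sketch. The schedule (linear,
# 1.0 -> 0.1) and per-token usage are assumptions, not the paper's exact
# TAPS procedure.
import numpy as np

def taps_step(logits, step, total_steps, t_start=1.0, t_end=0.1, rng=None):
    """Sample one token id with a temperature that anneals over steps."""
    rng = rng or np.random.default_rng()
    t = t_start + (t_end - t_start) * step / max(total_steps - 1, 1)
    z = logits / t
    p = np.exp(z - z.max())   # stable softmax at temperature t
    p /= p.sum()
    return rng.choice(len(logits), p=p)

rng = np.random.default_rng(0)
logits = rng.normal(size=512)
ids = [taps_step(logits, s, 8, rng=rng) for s in range(8)]
print(ids)  # later steps get greedier as the temperature drops
```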
Non-serial Quantization-aware Deep Optics for Snapshot Hyperspectral Imaging.
IEEE transactions on pattern analysis and machine intelligence Pub Date: 2024-07-09 DOI: 10.1109/TPAMI.2024.3425512
Lizhi Wang, Lingen Li, Weitao Song, Lei Zhang, Zhiwei Xiong, Hua Huang
{"title":"Non-serial Quantization-aware Deep Optics for Snapshot Hyperspectral Imaging.","authors":"Lizhi Wang, Lingen Li, Weitao Song, Lei Zhang, Zhiwei Xiong, Hua Huang","doi":"10.1109/TPAMI.2024.3425512","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3425512","url":null,"abstract":"<p><p>Deep optics has been endeavoring to capture hyperspectral images of dynamic scenes, where the optical encoder plays an essential role in deciding the imaging performance. Our key insight is that the optical encoder of a deep optics system is expected to keep fabrication-friendliness and decoder-friendliness, to be faithfully realized in the implementation phase and fully interacted with the decoder in the design phase, respectively. In this paper, we propose the non-serial quantization-aware deep optics (NSQDO), which consists of the fabrication-friendly quantization-aware model (QAM) and the decoder-friendly non-serial manner (NSM). The QAM integrates the quantization process into the optimization and adaptively adjusts the physical height of each quantization level, reducing the deviation of the physical encoder from the numerical simulation through the awareness of and adaptation to the quantization operation of the DOE physical structure. The NSM bridges the encoder and the decoder with full interaction through bidirectional hint connections and flexibilize the connections with a gating mechanism, boosting the power of joint optimization in deep optics. The proposed NSQDO improves the fabrication-friendliness and decoder-friendliness of the encoder and develops the deep optics framework to be more practical and powerful. Extensive synthetic simulation and real hardware experiments demonstrate the superior performance of the proposed method.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141565416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
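A common way to make quantization differentiable, as the QAM above requires, is a straight-through estimator: the forward pass snaps DOE heights to the nearest of a few learnable level values, while gradients still reach both the continuous height map and the levels. A minimal sketch under that assumption; the paper's QAM is more involved and all names here are illustrative.

```python
# Hedged sketch: quantization-aware DOE heights with learnable level
# values and a straight-through estimator. Not the paper's exact QAM.
import torch

class QuantizedHeights(torch.nn.Module):
    def __init__(self, shape, n_levels=8, max_height=1.0):
        super().__init__()
        self.h = torch.nn.Parameter(torch.rand(shape) * max_height)
        # learnable physical height of each quantization level
        self.levels = torch.nn.Parameter(torch.linspace(0.0, max_height, n_levels))

    def forward(self):
        dist = (self.h.unsqueeze(-1) - self.levels).abs()  # distance to levels
        q = self.levels[dist.argmin(dim=-1)]               # nearest level
        # forward value equals q; gradient flows to levels (via q) and to h
        # (via the identity term), the straight-through trick
        return q + (self.h - self.h.detach())

doe = QuantizedHeights((64, 64))
heights = doe()
heights.sum().backward()
print(doe.h.grad is not None, doe.levels.grad is not None)  # True True
```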
3D Reconstruction from a Single Sketch via View-dependent Depth Sampling.
IEEE transactions on pattern analysis and machine intelligence Pub Date: 2024-07-08 DOI: 10.1109/TPAMI.2024.3424404
Chenjian Gao, Xilin Wang, Qian Yu, Lu Sheng, Jing Zhang, Xiaoguang Han, Yi-Zhe Song, Dong Xu
{"title":"3D Reconstruction from a Single Sketch via View-dependent Depth Sampling.","authors":"Chenjian Gao, Xilin Wang, Qian Yu, Lu Sheng, Jing Zhang, Xiaoguang Han, Yi-Zhe Song, Dong Xu","doi":"10.1109/TPAMI.2024.3424404","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3424404","url":null,"abstract":"<p><p>Reconstructing a 3D shape based on a single sketch image is challenging due to the inherent sparsity and ambiguity present in sketches. Existing methods lose fine details when extracting features to predict 3D objects from sketches. Upon analyzing the 3D-to-2D projection process, we observe that the density map, characterizing the distribution of 2D point clouds, can serve as a proxy to facilitate the reconstruction process. In this work, we propose a novel sketch-based 3D reconstruction model named SketchSampler. It initiates the process by translating a sketch through an image translation network into a more informative 2D representation, which is then used to generate a density map. Subsequently, a two-stage probabilistic sampling process is employed to reconstruct a 3D point cloud: firstly, recovering the 2D points (i.e., the x and y coordinates) by sampling the density map; and secondly, predicting the depth (i.e., the z coordinate) by sampling the depth values along the ray determined by each 2D point. Additionally, we convert the reconstructed point cloud into a 3D mesh for wider applications. To reduce ambiguity, we incorporate hidden lines in sketches. Experimental results demonstrate that our proposed approach significantly outperforms other baseline methods.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141560602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
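The two-stage sampling described in this abstract maps directly onto a small amount of code: draw (x, y) positions in proportion to the density map, then draw a depth per position. A minimal sketch with a made-up density map and a dummy depth sampler standing in for the learned networks.

```python
# Sketch of density-map-then-depth sampling. The density map and depth
# sampler are fakes; in SketchSampler both come from learned networks.
import numpy as np

rng = np.random.default_rng(0)

def sample_point_cloud(density, n_points, depth_fn):
    """density: (H, W) nonnegative map; depth_fn maps (x, y) -> z sample."""
    h, w = density.shape
    p = density.ravel() / density.sum()
    idx = rng.choice(h * w, size=n_points, p=p)              # stage 1: 2D points
    ys, xs = np.unravel_index(idx, (h, w))
    zs = np.array([depth_fn(x, y) for x, y in zip(xs, ys)])  # stage 2: depth
    return np.stack([xs, ys, zs], axis=1).astype(np.float32)

density = rng.random((64, 64)) ** 4                # peaky fake density map
cloud = sample_point_cloud(density, 1024,
                           lambda x, y: rng.normal(0.5, 0.05))
print(cloud.shape)  # (1024, 3)
```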
Instance Consistency Regularization for Semi-Supervised 3D Instance Segmentation.
IEEE transactions on pattern analysis and machine intelligence Pub Date: 2024-07-05 DOI: 10.1109/TPAMI.2024.3424243
Yizheng Wu, Zhiyu Pan, Kewei Wang, Xingyi Li, Jiahao Cui, Liwen Xiao, Guosheng Lin, Zhiguo Cao
{"title":"Instance Consistency Regularization for Semi-Supervised 3D Instance Segmentation.","authors":"Yizheng Wu, Zhiyu Pan, Kewei Wang, Xingyi Li, Jiahao Cui, Liwen Xiao, Guosheng Lin, Zhiguo Cao","doi":"10.1109/TPAMI.2024.3424243","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3424243","url":null,"abstract":"<p><p>Large-scale datasets with point-wise semantic and instance labels are crucial to 3D instance segmentation but also expensive. To leverage unlabeled data, previous semi-supervised 3D instance segmentation approaches have explored self-training frameworks, which rely on high-quality pseudo labels for consistency regularization. They intuitively utilize both instance and semantic pseudo labels in a joint learning manner. However, semantic pseudo labels contain numerous noise derived from the imbalanced category distribution and natural confusion of similar but distinct categories, which leads to severe collapses in self-training. Motivated by the observation that 3D instances are non-overlapping and spatially separable, we ask whether we can solely rely on instance consistency regularization for improved semi-supervised segmentation. To this end, we propose a novel self-training network InsTeacher3D to explore and exploit pure instance knowledge from unlabeled data. We first build a parallel base 3D instance segmentation model DKNet, which distinguishes each instance from the others via discriminative instance kernels without reliance on semantic segmentation. Based on DKNet, we further design a novel instance consistency regularization framework to generate and leverage high-quality instance pseudo labels. Experimental results on multiple large-scale datasets show that the InsTeacher3D significantly outperforms prior state-of-the-art semi-supervised approaches.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141539128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
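To make the "instance consistency regularization" above concrete: in a generic teacher-student setup, the student's soft instance masks are pulled toward binarized teacher pseudo masks. A toy sketch with a Dice loss; the paper's instance matching, discriminative kernels, and thresholds are omitted, and the one-to-one pairing is assumed given.

```python
# Toy instance-consistency loss: student soft masks vs. binarized teacher
# pseudo masks, Dice-style. Matching and filtering from the paper omitted.
import torch

def dice_consistency(student_masks, teacher_masks, eps=1e-6):
    """(I, N) soft masks for I instances over N points; teacher is detached."""
    t = (teacher_masks.detach() > 0.5).float()   # binarized pseudo labels
    inter = (student_masks * t).sum(dim=1)
    denom = student_masks.sum(dim=1) + t.sum(dim=1)
    return (1 - (2 * inter + eps) / (denom + eps)).mean()

student = torch.rand(5, 2048, requires_grad=True)   # 5 instances, 2048 points
teacher = torch.rand(5, 2048)
loss = dice_consistency(student, teacher)
loss.backward()
print(float(loss))
```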
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models.
IEEE transactions on pattern analysis and machine intelligence Pub Date: 2024-07-04 DOI: 10.1109/TPAMI.2024.3423382
Xingyu Xie, Pan Zhou, Huan Li, Zhouchen Lin, Shuicheng Yan
{"title":"Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models.","authors":"Xingyu Xie, Pan Zhou, Huan Li, Zhouchen Lin, Shuicheng Yan","doi":"10.1109/TPAMI.2024.3423382","DOIUrl":"10.1109/TPAMI.2024.3423382","url":null,"abstract":"<p><p>In deep learning, different kinds of deep networks typically need different optimizers, which have to be chosen after multiple trials, making the training process inefficient. To relieve this issue and consistently improve the model training speed across deep networks, we propose the ADAptive Nesterov momentum algorithm, Adan for short. Adan first reformulates the vanilla Nesterov acceleration to develop a new Nesterov momentum estimation (NME) method, which avoids the extra overhead of computing gradient at the extrapolation point. Then Adan adopts NME to estimate the gradient's first- and second-order moments in adaptive gradient algorithms for convergence acceleration. Besides, we prove that Adan finds an ϵ-approximate first-order stationary point within O(ϵ<sup>-3.5</sup>) stochastic gradient complexity on the non-convex stochastic problems (e.g.deep learning problems), matching the best-known lower bound. Extensive experimental results show that Adan consistently surpasses the corresponding SoTA optimizers on vision, language, and RL tasks and sets new SoTAs for many popular networks and frameworks, eg ResNet, ConvNext, ViT, Swin, MAE, DETR, GPT-2, Transformer-XL, and BERT. More surprisingly, Adan can use half of the training cost (epochs) of SoTA optimizers to achieve higher or comparable performance on ViT, GPT-2, MAE, etc, and also shows great tolerance to a large range of minibatch size, e.g.from 1k to 32k. Code is released at https://github.com/sail-sg/Adan, and has been used in multiple popular deep learning frameworks or projects.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141536217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
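A simplified single-tensor sketch of the Adan update as described in the paper: a first moment of gradients, a moment of gradient differences (the NME term that replaces the extrapolation-point gradient), and a second moment built from their combination. Weight decay, bias correction, and restarts are omitted; for the real thing use the official implementation at https://github.com/sail-sg/Adan.

```python
# Simplified Adan step (no weight decay / bias correction). Default betas
# follow the paper's convention m_k = (1 - b1) m_{k-1} + b1 g_k.
import numpy as np

def adan_step(theta, g, state, lr=1e-3, b1=0.02, b2=0.08, b3=0.01, eps=1e-8):
    m, v, n, g_prev = state
    diff = g - g_prev
    m = (1 - b1) * m + b1 * g                     # first moment of gradients
    v = (1 - b2) * v + b2 * diff                  # moment of gradient differences
    n = (1 - b3) * n + b3 * (g + (1 - b2) * diff) ** 2  # second moment (NME)
    eta = lr / (np.sqrt(n) + eps)
    theta = theta - eta * (m + (1 - b2) * v)
    return theta, (m, v, n, g)

theta = np.zeros(4)
state = tuple(np.zeros(4) for _ in range(4))
for _ in range(3):
    g = 2 * theta - 1.0                           # gradient of a toy quadratic
    theta, state = adan_step(theta, g, state)
print(theta)
```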
3D-PSSIM: Projective Structural Similarity for 3D Mesh Quality Assessment Robust to Topological Irregularities.
IEEE transactions on pattern analysis and machine intelligence Pub Date: 2024-07-03 DOI: 10.1109/TPAMI.2024.3422490
Seongmin Lee, Jiwoo Kang, Sanghoon Lee, Weisi Lin, Alan Conrad Bovik
{"title":"3D-PSSIM: Projective Structural Similarity for 3D Mesh Quality Assessment Robust to Topological Irregularities.","authors":"Seongmin Lee, Jiwoo Kang, Sanghoon Lee, Weisi Lin, Alan Conrad Bovik","doi":"10.1109/TPAMI.2024.3422490","DOIUrl":"https://doi.org/10.1109/TPAMI.2024.3422490","url":null,"abstract":"<p><p>Despite acceleration in the use of 3D meshes, it is difficult to find effective mesh quality assessment algorithms that can produce predictions highly correlated with human subjective opinions. Defining mesh quality features is challenging due to the irregular topology of meshes, which are defined on vertices and triangles. To address this, we propose a novel 3D projective structural similarity index ( 3D- PSSIM) for meshes that is robust to differences in mesh topology. We address topological differences between meshes by introducing multi-view and multi-layer projections that can densely represent the mesh textures and geometrical shapes irrespective of mesh topology. It also addresses occlusion problems that occur during projection. We propose visual sensitivity weights that capture the perceptual sensitivity to the degree of mesh surface curvature. 3D- PSSIM computes perceptual quality predictions by aggregating quality-aware features that are computed in multiple projective spaces onto the mesh domain, rather than on 2D spaces. This allows 3D- PSSIM to determine which parts of a mesh surface are distorted by geometric or color impairments. Experimental results show that 3D- PSSIM can predict mesh quality with high correlation against human subjective judgments, across the presence of noise, even when there are large topological differences, outperforming existing mesh quality assessment models.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141499993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
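The projective idea above, comparing meshes through renderings rather than on their incompatible vertex graphs, can be pictured with a few lines of code. A sketch under heavy assumptions: rendering is faked with random images, and plain per-view SSIM pooled by a mean stands in for the paper's quality-aware features and visual-sensitivity weighting.

```python
# Sketch of multi-view projective quality pooling. Fake renders; plain
# SSIM replaces the paper's features and weighting.
import numpy as np
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(0)
ref_views = [rng.random((128, 128)) for _ in range(6)]           # fake renders
dist_views = [v + rng.normal(0, 0.05, v.shape) for v in ref_views]

scores = [ssim(r, np.clip(d, 0, 1), data_range=1.0)
          for r, d in zip(ref_views, dist_views)]
print(np.mean(scores))   # pooled multi-view quality score
```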
Spatial Steerability of GANs via Self-Supervision from Discriminator.
IEEE transactions on pattern analysis and machine intelligence Pub Date: 2024-07-03 DOI: 10.1109/TPAMI.2024.3422820
Jianyuan Wang, Lalit Bhagat, Ceyuan Yang, Yinghao Xu, Yujun Shen, Hongdong Li, Bolei Zhou
{"title":"Spatial Steerability of GANs via Self-Supervision from Discriminator.","authors":"Jianyuan Wang, Lalit Bhagat, Ceyuan Yang, Yinghao Xu, Yujun Shen, Hongdong Li, Bolei Zhou","doi":"10.1109/TPAMI.2024.3422820","DOIUrl":"10.1109/TPAMI.2024.3422820","url":null,"abstract":"<p><p>Generative models make huge progress to the photorealistic image synthesis in recent years. To enable human to steer the image generation process and customize the output, many works explore the interpretable dimensions of the latent space in GANs. Existing methods edit the attributes of the output image such as orientation or color scheme by varying the latent code along certain directions. However, these methods usually require additional human annotations for each pretrained model, and they mostly focus on editing global attributes. In this work, we propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space or requiring extra annotations. Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias. Along with training the GAN model from scratch, these heatmaps are being aligned with the emerging attention of the GAN's discriminator in a self-supervised learning manner. During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects. Moreover, we incorporate DragGAN into our framework, which facilitates fine-grained manipulation within a reasonable time and supports a coarse-to-fine editing process. Extensive experiments show that the proposed method not only enables spatial editing over human faces, animal faces, outdoor scenes, and complicated multi-object indoor scenes but also brings improvement in synthesis quality. Code, models, and demo video are available at https://genforce.github.io/SpatialGAN/.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141499994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
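The "randomly sampled Gaussian heatmaps" that this abstract injects as spatial inductive bias are easy to generate; a minimal sketch follows, with the center and width distributions assumed (how the maps are encoded into the GAN's intermediate layers is not shown).

```python
# Minimal sketch: random Gaussian heatmaps as spatial inductive bias.
# Center/width ranges are assumptions, not the paper's settings.
import torch

def random_gaussian_heatmaps(n, size, sigma_range=(0.05, 0.2)):
    """Return (n, size, size) heatmaps with random centers and widths."""
    ys, xs = torch.meshgrid(torch.linspace(0, 1, size),
                            torch.linspace(0, 1, size), indexing="ij")
    cx, cy = torch.rand(n, 1, 1), torch.rand(n, 1, 1)
    sigma = torch.empty(n, 1, 1).uniform_(*sigma_range)
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

maps = random_gaussian_heatmaps(4, 64)
print(maps.shape, float(maps.max()))  # torch.Size([4, 64, 64]), close to 1.0
```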