IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society (latest articles)

Universal Fine-Grained Visual Categorization by Concept Guided Learning
Qi Bi;Beichen Zhou;Wei Ji;Gui-Song Xia
{"title":"Universal Fine-Grained Visual Categorization by Concept Guided Learning","authors":"Qi Bi;Beichen Zhou;Wei Ji;Gui-Song Xia","doi":"10.1109/TIP.2024.3523802","DOIUrl":"10.1109/TIP.2024.3523802","url":null,"abstract":"Existing fine-grained visual categorization (FGVC) methods assume that the fine-grained semantics rest in the informative parts of an image. This assumption works well on favorable front-view object-centric images, but can face great challenges in many real-world scenarios, such as scene-centric images (e.g., street view) and adverse viewpoints (e.g., object re-identification, remote sensing). In such scenarios, the mis-/over- feature activation is likely to confuse the part selection and degrade the fine-grained representation. In this paper, we are motivated to design a universal FGVC framework for real-world scenarios. More precisely, we propose concept guided learning (CGL), which models concepts of a certain fine-grained category as a combination of inherited concepts from its subordinate coarse-grained category and discriminative concepts from its own. The discriminative concepts are utilized to guide the fine-grained representation learning. Specifically, three key steps are designed, namely, concept mining, concept fusion, and concept constraint. On the other hand, to bridge the FGVC dataset gap under scene-centric and adverse viewpoint scenarios, a Fine-grained Land-cover Categorization Dataset (FGLCD) with 59,994 fine-grained samples is proposed. Extensive experiments show the proposed CGL: 1) has competitive performance on conventional FGVC; 2) achieves state-of-the-art performance on fine-grained aerial scenes & scene-centric street scenes; 3) generalizes well to object re-identification and fine-grained aerial object detection. 
The dataset and source code will be available at <uri>https://github.com/BiQiWHU/CGL</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"394-409"},"PeriodicalIF":0.0,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142934652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Constrained Visual Representation Learning With Bisimulation Metrics for Safe Reinforcement Learning
Rongrong Wang;Yuhu Cheng;Xuesong Wang
{"title":"Constrained Visual Representation Learning With Bisimulation Metrics for Safe Reinforcement Learning","authors":"Rongrong Wang;Yuhu Cheng;Xuesong Wang","doi":"10.1109/TIP.2024.3523798","DOIUrl":"10.1109/TIP.2024.3523798","url":null,"abstract":"Safe reinforcement learning aims to ensure the optimal performance while minimizing potential risks. In real-world applications, especially in scenarios that rely on visual inputs, a key challenge lies in the extraction of essential features for safe decision-making while maintaining the sample efficiency. To address this issue, we propose the constrained visual representation learning with bisimulation metrics for safe reinforcement learning (CVRL-BM). CVRL-BM constructs a sequential conditional variational inference model to compress high-dimensional visual observations into low-dimensional state representations. Additionally, safety bisimulation metrics are introduced to quantify the behavioral similarity between states, and our objective is to make the distance between any two latent state representations as close as possible to the safety bisimulation metric between their corresponding states. By integrating these two components, CVRL-BM is able to learn compact and information-rich visual state representations while satisfying predefined safety constraints. Experiments on Safety Gym show that CVRL-BM outperforms existing vision-based safe reinforcement learning methods in safety and efficacy. Particularly, CVRL-BM surpasses the state-of-the-art Safe SLAC method by achieving a 19.748% higher reward return, a 41.772% lower cost return, and a 5.027% decrease in cost regret. 
These results highlight the effectiveness of our proposed CVRL-BM.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"379-393"},"PeriodicalIF":0.0,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142934771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Exploiting Latent Properties to Optimize Neural Codecs
Muhammet Balcilar;Bharath Bhushan Damodaran;Karam Naser;Franck Galpin;Pierre Hellier
{"title":"Exploiting Latent Properties to Optimize Neural Codecs","authors":"Muhammet Balcilar;Bharath Bhushan Damodaran;Karam Naser;Franck Galpin;Pierre Hellier","doi":"10.1109/TIP.2024.3522813","DOIUrl":"10.1109/TIP.2024.3522813","url":null,"abstract":"End-to-end image and video codecs are becoming increasingly competitive, compared to traditional compression techniques that have been developed through decades of manual engineering efforts. These trainable codecs have many advantages over traditional techniques, such as their straightforward adaptation to perceptual distortion metrics and high performance in specific fields thanks to their learning ability. However, current state-of-the-art neural codecs do not fully exploit the benefits of vector quantization and the existence of the entropy gradient in decoding devices. In this paper, we propose to leverage these two properties (vector quantization and entropy gradient) to improve the performance of off-the-shelf codecs. Firstly, we demonstrate that using non-uniform scalar quantization cannot improve performance over uniform quantization. We thus suggest using predefined optimal uniform vector quantization to improve performance. Secondly, we show that the entropy gradient, available at the decoder, is correlated with the reconstruction error gradient, which is not available at the decoder. We therefore use the former as a proxy to enhance compression performance. Our experimental results show that these approaches save between 1 to 3% of the rate for the same quality across various pre-trained methods. 
In addition, the entropy gradient based solution improves traditional codec performance significantly as well.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"306-319"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142911981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Refining Pseudo Labeling via Multi-Granularity Confidence Alignment for Unsupervised Cross Domain Object Detection
Jiangming Chen;Li Liu;Wanxia Deng;Zhen Liu;Yu Liu;Yingmei Wei;Yongxiang Liu
{"title":"Refining Pseudo Labeling via Multi-Granularity Confidence Alignment for Unsupervised Cross Domain Object Detection","authors":"Jiangming Chen;Li Liu;Wanxia Deng;Zhen Liu;Yu Liu;Yingmei Wei;Yongxiang Liu","doi":"10.1109/TIP.2024.3522807","DOIUrl":"10.1109/TIP.2024.3522807","url":null,"abstract":"Most state-of-the-art object detection methods suffer from poor generalization due to the domain shift between the training and testing datasets. To resolve this challenge, unsupervised cross domain object detection is proposed to learn an object detector for an unlabeled target domain by transferring knowledge from an annotated source domain. Promising results have been achieved via Mean Teacher, however, pseudo labeling which is the bottleneck of mutual learning remains to be further explored. In this study, we find that confidence misalignment of the predictions, including category-level overconfidence, instance-level task confidence inconsistency, and image-level confidence misfocusing, leading to the injection of noisy pseudo labels in the training process, will bring suboptimal performance. Considering the above issue, we present a novel general framework termed Multi-Granularity Confidence Alignment Mean Teacher (MGCAMT) for unsupervised cross domain object detection, which alleviates confidence misalignment across category-, instance-, and image-levels simultaneously to refine pseudo labeling for better teacher-student learning. Specifically, to align confidence with accuracy at category level, we propose Classification Confidence Alignment (CCA) to model category uncertainty based on Evidential Deep Learning (EDL) and filter out the category incorrect labels via an uncertainty-aware selection strategy. Furthermore, we design Task Confidence Alignment (TCA) to mitigate the instance-level misalignment between classification and localization by enabling each classification feature to adaptively identify the optimal feature for regression. 
Finally, we develop imagery Focusing Confidence Alignment (FCA) adopting another way of pseudo label learning, i.e., we use the original outputs from the Mean Teacher network for supervised learning without label assignment to achieve a balanced perception of the image’s spatial layout. When these three procedures are integrated into a single framework, they mutually benefit to improve the final performance from a cooperative learning perspective. Extensive experiments across multiple scenarios demonstrate that our method outperforms large foundational models, and surpasses other state-of-the-art approaches by a large margin.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"279-294"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142911980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reviewer Summary for Transactions on Image Processing
{"title":"Reviewer Summary for Transactions on Image Processing","authors":"","doi":"10.1109/TIP.2024.3513592","DOIUrl":"10.1109/TIP.2024.3513592","url":null,"abstract":"","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6905-6925"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10819972","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142911979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Momentum Contrastive Teacher for Semi-Supervised Skeleton Action Recognition
Mingqi Lu;Xiaobo Lu;Jun Liu
{"title":"Momentum Contrastive Teacher for Semi-Supervised Skeleton Action Recognition","authors":"Mingqi Lu;Xiaobo Lu;Jun Liu","doi":"10.1109/TIP.2024.3522818","DOIUrl":"10.1109/TIP.2024.3522818","url":null,"abstract":"In the field of semi-supervised skeleton action recognition, existing work primarily follows the paradigm of self-supervised training followed by supervised fine-tuning. However, self-supervised learning focuses on exploring data representation rather than label classification. Inspired by Mean Teacher, we explore a novel pseudo-label-based model called SkeleMoCLR. Specifically, we use MoCo v2 as the foundation and extend it into a teacher-student network through a momentum encoder. The generation of high-confidence pseudo-labels requires a well-pretrained model as a prerequisite. In cases where large-scale skeleton data is lacking, we propose leveraging contrastive learning to transfer discriminative action features from large vision-text models to the skeleton encoder. Following the contrastive pre-training, the key encoder branch from MoCo v2 serves as the teacher to generate pseudo-labels for training the query encoder branch. Furthermore, we introduce pseudo-labels into the memory queues, sampling negative samples from different pseudo-label classes to maximize the representation differentiation between different categories. We jointly optimize the classification loss for both labeled and pseudo-labeled data and the contrastive loss for unlabeled data to update model parameters, fully harnessing the potential of pseudo-label semi-supervised learning and self-supervised learning. 
Extensive experiments conducted on the NTU-60, NTU-120, PKU-MMD, and NW-UCLA datasets demonstrate that our SkeleMoCLR outperforms existing competitive methods in the semi-supervised skeleton action recognition task.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"295-305"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142911628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
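The abstract above describes a teacher-student network built on a momentum encoder, with the key encoder acting as the teacher that produces high-confidence pseudo-labels. The following is a minimal sketch of a MoCo-style momentum (exponential moving average) update and a confidence-thresholded pseudo-label rule. It assumes parameters are plain lists of floats; the function names, the momentum value, and the 0.9 threshold are illustrative assumptions and are not taken from the paper.

```python
def momentum_update(teacher_params, student_params, momentum=0.999):
    """Move each teacher (key encoder) parameter slowly toward the student (query encoder)."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


def pseudo_label(teacher_probs, threshold=0.9):
    """Return the predicted class index if the teacher is confident enough, else None."""
    best = max(range(len(teacher_probs)), key=lambda i: teacher_probs[i])
    return best if teacher_probs[best] >= threshold else None


# Hypothetical values, for illustration only.
teacher = momentum_update([1.0, 0.0], [0.0, 1.0], momentum=0.9)
confident = pseudo_label([0.05, 0.92, 0.03])   # teacher is confident in class 1
uncertain = pseudo_label([0.40, 0.35, 0.25])   # no class reaches the threshold
```

A momentum close to 1 keeps the teacher stable between steps, which is why such a teacher is usually considered a steadier source of pseudo-labels than the student itself.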
Deformable Convolution-Enhanced Hierarchical Transformer With Spectral-Spatial Cluster Attention for Hyperspectral Image Classification
Yu Fang;Le Sun;Yuhui Zheng;Zebin Wu
{"title":"Deformable Convolution-Enhanced Hierarchical Transformer With Spectral-Spatial Cluster Attention for Hyperspectral Image Classification","authors":"Yu Fang;Le Sun;Yuhui Zheng;Zebin Wu","doi":"10.1109/TIP.2024.3522809","DOIUrl":"10.1109/TIP.2024.3522809","url":null,"abstract":"Vision Transformer (ViT), known for capturing non-local features, is an effective tool for hyperspectral image classification (HSIC). However, ViT’s multi-head self-attention (MHSA) mechanism often struggles to balance local details and long-range relationships for complex high-dimensional data, leading to a loss in spectral-spatial information representation. To address this issue, we propose a deformable convolution-enhanced hierarchical Transformer with spectral-spatial cluster attention (SClusterFormer) for HSIC. The model incorporates a unique cluster attention mechanism that utilizes spectral angle similarity and Euclidean distance metrics to enhance the representation of fine-grained homogenous local details and improve discrimination of non-local structures in 3D HSI and 2D morphological data, respectively. Additionally, a dual-branch multiscale deformable convolution framework augmented with frequency-based spectral attention is designed to capture both the discrepancy patterns in high-frequency and overall trend of the spectral profile in low-frequency. Finally, we utilize a cross-feature pixel-level fusion module for collaborative cross-learning and fusion of the results from the dual-branch framework. Comprehensive experiments conducted on multiple HSIC datasets validate the superiority of our proposed SClusterFormer model, which outperforms existing methods. 
The source code of SClusterFormer is available at <uri>https://github.com/Fang666666/HSIC_SClusterFormer</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"701-716"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142911978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
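The cluster attention in the abstract above is said to use spectral angle similarity and Euclidean distance. As a rough illustration of those two standard metrics only, the sketch below computes them for a pair of spectra given as lists of floats. How the paper combines them inside the attention mechanism is not stated in the abstract, so nothing here should be read as the authors' implementation; the function names and sample spectra are assumptions.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra; 0 means identical spectral shape."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical spectra: the second is the first scaled by 2 (same shape, brighter).
same_shape = spectral_angle([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = spectral_angle([1.0, 0.0], [0.0, 1.0])
distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

The spectral angle ignores overall brightness (a scaled spectrum has an angle near zero), while the Euclidean distance does not, which is one reason the two metrics are often used for different kinds of features.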
Linearly Transformed Color Guide for Low-Bitrate Diffusion-Based Image Compression
Tom Bordin;Thomas Maugey
{"title":"Linearly Transformed Color Guide for Low-Bitrate Diffusion-Based Image Compression","authors":"Tom Bordin;Thomas Maugey","doi":"10.1109/TIP.2024.3521301","DOIUrl":"10.1109/TIP.2024.3521301","url":null,"abstract":"This study addresses the challenge of controlling the global color aspect of images generated by a diffusion model without training or fine-tuning. We rewrite the guidance equations to ensure that the outputs are closer to a known color map, without compromising the quality of the generation. Our method results in new guidance equations. In the context of color guidance, we show that the scaling of the guidance should not decrease but rather increase throughout the diffusion process. In a second contribution, our guidance is applied in a compression framework, where we combine both semantic and general color information of the image to decode at very low cost. We show that our method is effective in improving the fidelity and realism of compressed images at extremely low bit rates (<inline-formula> <tex-math>$10^{-2}$ </tex-math></inline-formula>bpp), performing better on these criteria when compared to other classical or more semantically oriented approaches. The implementation of our method is available on gitlab at <uri>https://gitlab.inria.fr/tbordin/color-guidance</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"468-482"},"PeriodicalIF":0.0,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142905141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Normalizing Batch Normalization for Long-Tailed Recognition
Yuxiang Bao;Guoliang Kang;Linlin Yang;Xiaoyue Duan;Bo Zhao;Baochang Zhang
{"title":"Normalizing Batch Normalization for Long-Tailed Recognition","authors":"Yuxiang Bao;Guoliang Kang;Linlin Yang;Xiaoyue Duan;Bo Zhao;Baochang Zhang","doi":"10.1109/TIP.2024.3518099","DOIUrl":"10.1109/TIP.2024.3518099","url":null,"abstract":"In real-world scenarios, the number of training samples across classes usually follows a long-tailed distribution. The conventionally trained network may achieve unexpectedly inferior performance on the rare class compared to the frequent class. Most previous works attempt to rectify the network bias at the data level or at the classifier level. Differently, in this paper, we identify that the bias towards the frequent class may be encoded into features, i.e., the rare-specific features which play a key role in discriminating the rare class are much weaker than the frequent-specific features. Based on such an observation, we introduce a simple yet effective approach, normalizing the parameters of the Batch Normalization (BN) layer to explicitly rectify the feature bias. To achieve this end, we represent the Weight/Bias parameters of a BN layer as a vector, normalize it to a unit vector and multiply the unit vector by a scalar learnable parameter. By decoupling the direction and magnitude of the BN layer's parameters during learning, the Weight/Bias exhibits a more balanced distribution and thus the strength of features becomes more even. 
Extensive experiments on various long-tailed recognition benchmarks (i.e., CIFAR-10/100-LT, ImageNet-LT and iNaturalist 2018) show that our method outperforms previous state-of-the-arts remarkably.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"209-220"},"PeriodicalIF":0.0,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142888340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
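The abstract above describes treating a BN layer's Weight (or Bias) parameters as a vector, normalizing that vector to unit length, and multiplying it by a single learnable scalar. A minimal sketch of that reparameterization follows, using plain Python lists. The function name, the epsilon term, and the sample values are assumptions for illustration; the paper's training procedure is not reproduced here.

```python
import math


def normalize_bn_param(direction, magnitude, eps=1e-12):
    """Return magnitude * direction / ||direction||: a unit vector rescaled by one scalar."""
    norm = math.sqrt(sum(x * x for x in direction))
    return [magnitude * x / (norm + eps) for x in direction]


# Hypothetical per-channel BN weights and a hypothetical learnable scalar.
raw_weight = [3.0, 4.0, 0.0]
weight = normalize_bn_param(raw_weight, magnitude=2.0)

# The overall length of the effective weight vector is set by the scalar alone.
effective_norm = math.sqrt(sum(w * w for w in weight))
```

Because the vector's length is fixed by one scalar, no single channel can grow without the others shrinking in relative terms, which matches the abstract's stated goal of a more balanced parameter distribution.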
Regularization by Denoising: Bayesian Model and Langevin-Within-Split Gibbs Sampling
Elhadji C. Faye;Mame Diarra Fall;Nicolas Dobigeon
{"title":"Regularization by Denoising: Bayesian Model and Langevin-Within-Split Gibbs Sampling","authors":"Elhadji C. Faye;Mame Diarra Fall;Nicolas Dobigeon","doi":"10.1109/TIP.2024.3520012","DOIUrl":"10.1109/TIP.2024.3520012","url":null,"abstract":"This paper introduces a Bayesian framework for image inversion by deriving a probabilistic counterpart to the regularization-by-denoising (RED) paradigm. It additionally implements a Monte Carlo algorithm specifically tailored for sampling from the resulting posterior distribution, based on an asymptotically exact data augmentation (AXDA). The proposed algorithm is an approximate instance of split Gibbs sampling (SGS) which embeds one Langevin Monte Carlo step. The proposed method is applied to common imaging tasks such as deblurring, inpainting and super-resolution, demonstrating its efficacy through extensive numerical experiments. These contributions advance Bayesian inference in imaging by leveraging data-driven regularization strategies within a probabilistic framework.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"221-234"},"PeriodicalIF":0.0,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142888338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
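The sampler in the abstract above embeds one Langevin Monte Carlo step inside split Gibbs sampling. For orientation, the sketch below shows a generic unadjusted Langevin step on a one-dimensional toy target. The standard normal target, the step size, and the function names are assumptions chosen for illustration; the paper's posterior, its denoiser-based regularization, and the surrounding Gibbs updates are not modeled here.

```python
import math
import random


def langevin_step(x, grad_log_density, step_size, noise):
    """One unadjusted Langevin step: drift along the score plus scaled Gaussian noise."""
    return x + step_size * grad_log_density(x) + math.sqrt(2.0 * step_size) * noise


def standard_normal_score(x):
    """Gradient of the log-density of N(0, 1); a toy stand-in for a real posterior."""
    return -x


# Run a short chain from a poor starting point with a fixed seed.
rng = random.Random(0)
x = 3.0
for _ in range(1000):
    x = langevin_step(x, standard_normal_score, 0.05, rng.gauss(0.0, 1.0))
```

Passing the noise in explicitly keeps the step function deterministic, which makes the drift term easy to check in isolation.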