IEEE Transactions on Circuits and Systems for Video Technology: Latest Articles

Efficient Non-Blind Image Deblurring With Discriminative Shrinkage Deep Networks
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-24 DOI: 10.1109/TCSVT.2025.3553846
Pin-Hung Kuo;Jinshan Pan;Shao-Yi Chien;Ming-Hsuan Yang
{"title":"Efficient Non-Blind Image Deblurring With Discriminative Shrinkage Deep Networks","authors":"Pin-Hung Kuo;Jinshan Pan;Shao-Yi Chien;Ming-Hsuan Yang","doi":"10.1109/TCSVT.2025.3553846","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3553846","url":null,"abstract":"Most existing non-blind deblurring methods formulate the problem into a maximum-a-posteriori framework and address it by manually designing a variety of regularization terms and data terms of the latent clear images. However, explicitly designing these two terms is quite challenging, which usually leads to complex optimization problems. In this paper, we propose a Discriminative Shrinkage Deep Network for fast and accurate deblurring. Most existing methods use deep convolutional neural networks (CNNs), or radial basis functions only to learn the regularization term. In contrast, we formulate both the data and regularization terms while splitting the deconvolution model into data-related and regularization-related sub-problems. We explore the properties of the Maxout function and develop a deep CNN model with Maxout layers to learn discriminative shrinkage functions, which directly approximate the solutions of these two sub-problems. Moreover, we develop a U-Net according to Krylov subspace method to restore the latent clear images effectively and efficiently, which plays a role but is better than the conventional fast-Fourier-transform-based or conjugate gradient method. Experimental results show that the proposed method performs favorably against the state-of-the-art methods regarding efficiency and accuracy.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8545-8558"},"PeriodicalIF":11.1,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
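As a rough illustration of the shrinkage-learning idea described in the abstract above, the sketch below shows how a Maxout layer can act as a learnable, piecewise-linear shrinkage function over feature channels. This is a minimal PyTorch sketch based only on the abstract; the module name, the number of pieces, and the 1x1-convolution parameterization are our assumptions, not the authors' implementation.

```python
# A minimal sketch (not the paper's exact architecture) of a Maxout layer used
# as a learnable shrinkage function: each output channel takes the maximum over
# k affine "pieces", which can approximate the piecewise-linear shrinkage curves
# that solve the data- and regularization-related sub-problems.
import torch
import torch.nn as nn

class MaxoutShrinkage(nn.Module):
    def __init__(self, channels: int, pieces: int = 4):
        super().__init__()
        # One 1x1 convolution produces `pieces` candidate responses per channel.
        self.proj = nn.Conv2d(channels, channels * pieces, kernel_size=1)
        self.channels = channels
        self.pieces = pieces

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        y = self.proj(x).view(b, self.channels, self.pieces, h, w)
        # Maxout: take the maximum over the candidate pieces.
        return y.max(dim=2).values

if __name__ == "__main__":
    feats = torch.randn(1, 32, 64, 64)
    shrink = MaxoutShrinkage(32)
    print(shrink(feats).shape)  # torch.Size([1, 32, 64, 64])
```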
Concept-Level Semantic Transfer and Context-Level Distribution Modeling for Few-Shot Segmentation
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-24 DOI: 10.1109/TCSVT.2025.3554013
Yuxuan Luo;Jinpeng Chen;Runmin Cong;Horace Ho Shing Ip;Sam Kwong
{"title":"Concept-Level Semantic Transfer and Context-Level Distribution Modeling for Few-Shot Segmentation","authors":"Yuxuan Luo;Jinpeng Chen;Runmin Cong;Horace Ho Shing Ip;Sam Kwong","doi":"10.1109/TCSVT.2025.3554013","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3554013","url":null,"abstract":"Few-shot segmentation (FSS) methods aim to segment objects using only a few pixel-level annotated samples. Current approaches either derive a generalized class representation from support samples to guide the segmentation of query samples, which often discards crucial spatial contextual information, or rely heavily on spatial affinity between support and query samples, without adequately summarizing and utilizing the core information of the target class. Consequently, the former struggles with fine detail accuracy, while the latter tends to produce errors in overall localization. To address these issues, we propose a novel FSS framework, CCFormer, which balances the transmission of core semantic concepts with the modeling of spatial context, improving both macro and micro-level segmentation accuracy. Our approach introduces three key modules: 1) the Concept Perception Generation (CPG) module, which leverages pre-trained category perception capabilities to capture high-quality core representations of the target class; 2) the Concept-Feature Integration (CFI) module, which injects the core class information into both support and query features during feature extraction; and 3) the Contextual Distribution Mining (CDM) module, which utilizes a Brownian Distance Covariance matrix to model the spatial-channel distribution between support and query samples, preserving the fine-grained integrity of the target. Experimental results on the PASCAL-<inline-formula> <tex-math>$5^{i}$ </tex-math></inline-formula> and COCO-<inline-formula> <tex-math>$20^{i}$ </tex-math></inline-formula> datasets demonstrate that CCFormer achieves state-of-the-art performance, with visualizations further validating its effectiveness. Our code is available at github.com/lourise/ccformer.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9190-9204"},"PeriodicalIF":11.1,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
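To make the Brownian Distance Covariance idea mentioned in the CDM module concrete, the sketch below computes a double-centered pairwise distance matrix for a feature set and uses the inner product of two such matrices as a dependence score. This is our own simplified illustration of the statistic, not the authors' CDM module; the function names and the use of flattened 7x7 feature maps are assumptions.

```python
# A rough sketch (our simplification, not the paper's exact module) of a
# Brownian Distance Covariance (BDC) style statistic: build a pairwise
# Euclidean distance matrix over feature vectors, double-center it, and use
# the inner product of two such centered matrices as a dependence measure
# between support and query feature distributions.
import torch

def bdc_matrix(feats: torch.Tensor) -> torch.Tensor:
    """feats: (n, d) feature vectors -> (n, n) double-centered distance matrix."""
    dist = torch.cdist(feats, feats)                 # pairwise Euclidean distances
    row_mean = dist.mean(dim=1, keepdim=True)
    col_mean = dist.mean(dim=0, keepdim=True)
    grand_mean = dist.mean()
    return dist - row_mean - col_mean + grand_mean   # double centering

def bdc_similarity(support: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    """Scalar dependence score between two (n, d) feature sets with equal n."""
    a, b = bdc_matrix(support), bdc_matrix(query)
    return (a * b).mean()

if __name__ == "__main__":
    s = torch.randn(49, 256)   # e.g. a 7x7 support feature map, flattened
    q = torch.randn(49, 256)
    print(bdc_similarity(s, q).item())
```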
TPE for JPEG Images With Dynamic M-Ary Decomposition and Adaptive Threshold Constraints
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-24 DOI: 10.1109/TCSVT.2025.3553962
Yakun Ma;Xiuli Chai;Guoqiang Long;Zhihua Gan;Yushu Zhang
{"title":"TPE for JPEG Images With Dynamic M-Ary Decomposition and Adaptive Threshold Constraints","authors":"Yakun Ma;Xiuli Chai;Guoqiang Long;Zhihua Gan;Yushu Zhang","doi":"10.1109/TCSVT.2025.3553962","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3553962","url":null,"abstract":"Traditional JPEG image encryption that prioritizes solely confidentiality fails to account for the pressing usability requirements of cloud-based environments, thus boosting the boom in thumbnail-preserving encryption (TPE) to balance image privacy and usability. However, existing TPE schemes for JPEG images face numerous challenges, such as insufficient security, inability to achieve lossless decryption, and high file extension. To address these challenges, we propose a TPE scheme based on dynamic M-ary decomposition and adaptive threshold constraints (TPE-MDTC). First, the valid ranges of quantized DC coefficients for JPEG images are determined. Then, a sum-preserving encryption method for quantized DC coefficients with compliance threshold constraints is designed using the bit-plane permutation to preserve thumbnails with high accuracy. Next, the introduction of dynamic M-ary decomposition effectively changes bit statistical characteristics preserved by bit-plane permutation, enhancing the ciphertext security. Finally, a quantized AC encryption method with RV (Run/Value) pair global permutation is proposed, effectively modifying the unit block features, thereby significantly improving the security and attack resistance of encrypted images. Experimental results show that the proposed TPE-MDTC scheme can reconstruct the original JPEG images without loss, and the generated ciphertext images exhibit significant advantages over previous schemes regarding file extension and security.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8864-8879"},"PeriodicalIF":11.1,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
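The sketch below illustrates only the sum-preservation property that underlies thumbnail preservation: permuting the quantized DC coefficients inside one thumbnail block leaves the block sum (and hence the block mean shown in the thumbnail) unchanged. It is a toy example; the actual TPE-MDTC scheme uses bit-plane permutation, compliance threshold constraints, and dynamic M-ary decomposition, and all names and parameters here are illustrative.

```python
# Toy sketch of sum-preserving encryption for quantized DC coefficients:
# a keyed permutation within each thumbnail block preserves the block sum.
import random

def encrypt_block(dc_block: list[int], key: int, block_id: int) -> list[int]:
    rng = random.Random(key * 1_000_003 + block_id)   # keyed, block-specific PRNG
    perm = list(range(len(dc_block)))
    rng.shuffle(perm)
    return [dc_block[i] for i in perm]

def decrypt_block(cipher_block: list[int], key: int, block_id: int) -> list[int]:
    rng = random.Random(key * 1_000_003 + block_id)
    perm = list(range(len(cipher_block)))
    rng.shuffle(perm)
    plain = [0] * len(cipher_block)
    for out_pos, src in enumerate(perm):
        plain[src] = cipher_block[out_pos]
    return plain

if __name__ == "__main__":
    block = [12, -3, 7, 7, 0, 25, -8, 4, 4]   # quantized DC values of one block
    enc = encrypt_block(block, key=0xC0FFEE, block_id=0)
    assert sum(enc) == sum(block)             # thumbnail (block mean) preserved
    assert decrypt_block(enc, key=0xC0FFEE, block_id=0) == block
    print(enc)
```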
M3CS: Multi-Target Masked Point Modeling With Learnable Codebook and Siamese Decoders
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-21 DOI: 10.1109/TCSVT.2025.3553525
Qibo Qiu;Honghui Yang;Jian Jiang;Shun Zhang;Haochao Ying;Haiming Gao;Wenxiao Wang;Xiaofei He
{"title":"M3CS: Multi-Target Masked Point Modeling With Learnable Codebook and Siamese Decoders","authors":"Qibo Qiu;Honghui Yang;Jian Jiang;Shun Zhang;Haochao Ying;Haiming Gao;Wenxiao Wang;Xiaofei He","doi":"10.1109/TCSVT.2025.3553525","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3553525","url":null,"abstract":"Masked point modeling has become a promising scheme of self-supervised pre-training for point clouds. Existing methods reconstruct either the masked points or related features as the objective of pre-training. However, considering the diversity of downstream tasks, it is necessary for the model to have both low- and high-level representation modeling capabilities during pre-training. It enables the capture of both geometric details and semantic contexts. To this end, M<sup>3</sup>CS is proposed to endow the model with the above abilities. Specifically, with the masked point cloud as input, M<sup>3</sup>CS introduces two decoders to reconstruct masked representations and the masked points simultaneously. While an extra decoder doubles parameters for the decoding process and may lead to overfitting, we propose siamese decoders to keep the number of learnable parameters unchanged. Further, we propose an online codebook projecting continuous tokens into discrete ones before reconstructing masked points. In such a way, we can compel the decoder to take effect through the combinations of tokens rather than remembering each token. Comprehensive experiments show that M<sup>3</sup>CS achieves superior performance across both classification and segmentation tasks, outperforming existing methods that are also single-modality and single-scale.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8807-8818"},"PeriodicalIF":11.1,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
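The following sketch shows one standard way an online codebook can project continuous tokens onto discrete codewords: nearest-codeword assignment with a straight-through estimator and a commitment loss. It is a generic vector-quantization sketch motivated by the abstract, not the authors' module; the codebook size, dimension, and loss weighting are assumptions.

```python
# Minimal sketch of an online codebook that discretizes continuous tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OnlineCodebook(nn.Module):
    def __init__(self, num_codes: int = 512, dim: int = 256):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_codes, dim) * 0.02)

    def forward(self, tokens: torch.Tensor):
        # tokens: (n, dim) continuous token features
        dists = torch.cdist(tokens, self.codes)      # (n, num_codes)
        ids = dists.argmin(dim=1)                    # discrete token ids
        quantized = self.codes[ids]                  # (n, dim) snapped tokens
        # Straight-through: forward uses quantized codes, backward passes
        # gradients to the continuous tokens unchanged.
        st = tokens + (quantized - tokens).detach()
        commit_loss = F.mse_loss(tokens, quantized.detach()) \
                    + F.mse_loss(quantized, tokens.detach())
        return st, ids, commit_loss

if __name__ == "__main__":
    cb = OnlineCodebook()
    toks = torch.randn(64, 256)
    q, ids, loss = cb(toks)
    print(q.shape, ids.shape, loss.item())
```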
MAT: Multi-Range Attention Transformer for Efficient Image Super-Resolution
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-21 DOI: 10.1109/TCSVT.2025.3553135
Chengxing Xie;Xiaoming Zhang;Linze Li;Yuqian Fu;Biao Gong;Tianrui Li;Kai Zhang
{"title":"MAT: Multi-Range Attention Transformer for Efficient Image Super-Resolution","authors":"Chengxing Xie;Xiaoming Zhang;Linze Li;Yuqian Fu;Biao Gong;Tianrui Li;Kai Zhang","doi":"10.1109/TCSVT.2025.3553135","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3553135","url":null,"abstract":"Image super-resolution (SR) has significantly advanced through the adoption of Transformer architectures. However, conventional techniques aimed at enlarging the self-attention window to capture broader contexts come with inherent drawbacks, especially the significantly increased computational demands. Moreover, the feature perception within a fixed-size window of existing models restricts the effective receptive field (ERF) and the intermediate feature diversity. We demonstrate that a flexible integration of attention across diverse spatial extents can yield significant performance enhancements. In line with this insight, we introduce Multi-Range Attention Transformer (MAT) for SR tasks. MAT leverages the computational advantages inherent in dilation operation, in conjunction with self-attention mechanism, to facilitate both multi-range attention (MA) and sparse multi-range attention (SMA), enabling efficient capture of both regional and sparse global features. Combined with local feature extraction, MAT adeptly capture dependencies across various spatial ranges, improving the diversity and efficacy of its feature representations. We also introduce the MSConvStar module, which augments the model’s ability for multi-range representation learning. Comprehensive experiments show that our MAT exhibits superior performance to existing state-of-the-art SR models with remarkable efficiency (<inline-formula> <tex-math>$sim 3.3times $ </tex-math></inline-formula> faster than SRFormer-light). The codes are available at <uri>https://github.com/stella-von/MAT</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8945-8957"},"PeriodicalIF":11.1,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
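The sketch below conveys the general flavor of attending over multiple spatial ranges by letting dense queries attend to key/value sets subsampled at different dilation rates. It is our assumption of the overall idea, not the released MAT code; the module structure, dilation rates, and head count are illustrative, and it requires PyTorch 2.x for scaled_dot_product_attention.

```python
# Simplified multi-range attention: dense queries, dilated (sparse) keys/values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiRangeAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4, dilations=(1, 2, 4)):
        super().__init__()
        self.heads, self.dilations = heads, dilations
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.proj = nn.Conv2d(dim * len(dilations), dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        qs = q.flatten(2).transpose(1, 2)                      # dense queries (b, hw, c)
        outs = []
        for d in self.dilations:
            ks = k[:, :, ::d, ::d].flatten(2).transpose(1, 2)  # sparse keys
            vs = v[:, :, ::d, ::d].flatten(2).transpose(1, 2)  # sparse values
            out = F.scaled_dot_product_attention(
                qs.view(b, -1, self.heads, c // self.heads).transpose(1, 2),
                ks.view(b, -1, self.heads, c // self.heads).transpose(1, 2),
                vs.view(b, -1, self.heads, c // self.heads).transpose(1, 2),
            )
            out = out.transpose(1, 2).reshape(b, h * w, c).transpose(1, 2).reshape(b, c, h, w)
            outs.append(out)
        return self.proj(torch.cat(outs, dim=1))

if __name__ == "__main__":
    mra = MultiRangeAttention()
    print(mra(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```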
PPIDM: Privacy-Preserving Inference for Diffusion Model in the Cloud
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-21 DOI: 10.1109/TCSVT.2025.3553514
Zhangdong Wang;Zhihuang Liu;Yuanjing Luo;Tongqing Zhou;Jiaohua Qin;Zhiping Cai
{"title":"PPIDM: Privacy-Preserving Inference for Diffusion Model in the Cloud","authors":"Zhangdong Wang;Zhihuang Liu;Yuanjing Luo;Tongqing Zhou;Jiaohua Qin;Zhiping Cai","doi":"10.1109/TCSVT.2025.3553514","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3553514","url":null,"abstract":"Cloud environments enhance diffusion model efficiency but introduce privacy risks, including intellectual property theft and data breaches. As AI-generated images gain recognition as copyright-protected works, ensuring their security and intellectual property protection in cloud environments has become a pressing challenge. This paper addresses privacy protection in diffusion model inference under cloud environments, identifying two key characteristics—denoising-encryption antagonism and stepwise generative nature—that create challenges such as incompatibility with traditional encryption, incomplete input parameter representation, and inseparability of the generative process. We propose PPIDM (<bold>P</b>rivacy-<bold>P</b>reserving <bold>I</b>nference for <bold>D</b>iffusion <bold>M</b>odels), a framework that balances efficiency and privacy by retaining lightweight text encoding and image decoding on the client while offloading computationally intensive U-Net layers to multiple non-colluding cloud servers. Client-side aggregation reduces computational overhead and enhances security. Experiments show PPIDM offloads 67% of Stable Diffusion computations to the cloud, reduces image leakage by 75%, and maintains high output quality (PSNR = 36.9, FID = 4.56), comparable to standard outputs. PPIDM offers a secure and efficient solution for cloud-based diffusion model inference.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8849-8863"},"PeriodicalIF":11.1,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
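To illustrate the client-side aggregation intuition behind offloading to non-colluding servers, the toy sketch below additively secret-shares an activation between two servers that each apply the same linear operation, after which the client sums the results. Real diffusion U-Nets contain nonlinearities, so this shows only the aggregation idea, not PPIDM's full protocol; all names and sizes are illustrative.

```python
# Toy additive secret sharing across two non-colluding servers.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))      # a linear layer "offloaded" to the servers
x = rng.standard_normal(16)           # client-side activation to protect

# Client: split x into two additive shares (each share alone reveals nothing about x).
share1 = rng.standard_normal(16)
share2 = x - share1

# Servers: each applies the linear layer to its own share only.
y1 = W @ share1
y2 = W @ share2

# Client: aggregate the partial results.
y = y1 + y2
print("aggregated output matches plain computation:", np.allclose(y, W @ x))
```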
Dual Geometry Learning and Adaptive Sparse Attention for Point Cloud Analysis
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-21 DOI: 10.1109/TCSVT.2025.3553537
Ce Zhou;Qiang Ling
{"title":"Dual Geometry Learning and Adaptive Sparse Attention for Point Cloud Analysis","authors":"Ce Zhou;Qiang Ling","doi":"10.1109/TCSVT.2025.3553537","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3553537","url":null,"abstract":"Point cloud analysis is essential in accurately perceiving and analyzing real-world scenarios. Recently, transformer-based models have demonstrated great performance superiority in diverse domains. Nonetheless, directly applying transformers to point clouds is still challenging, primarily due to the computational intensity of transformers, which may significantly compromise their efficacy. Moreover, most methods typically rely on the relative 3D coordinates of point pairs to generate geometric information without fully exploiting the inherent local geometric properties. To tackle these challenges, we propose DGAS-Net, a novel architecture to enhance point cloud analysis. Specifically, we propose a Dual Geometry Learning (DGL) module to generate explicit geometric descriptors from triangular representations. These descriptors capture the local shape and geometric details of each point, serving as the foundation for deriving informative geometric features. Subsequently, we introduce a Dual Geometry Context Aggregation (DGCA) module to efficiently merge local geometric and semantic information. Furthermore, we design an Adaptive Sparse Attention (ASA) module to capture long-range information and expand the effective receptive field. ASA adaptively selects globally representative points and employs a novel vector attention mechanism for efficient global information fusion, thereby significantly reducing the computational complexity. Extensive experiments on four datasets demonstrate the superiority of DGAS-Net for various point cloud analysis tasks. The codes of DGAS-Net are available at <uri>https://github.com/zcustc-10/DGAS-Net</uri>","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9075-9089"},"PeriodicalIF":11.1,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
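As one way to picture explicit triangle-based local descriptors, the sketch below forms a triangle from each point and its two nearest neighbors and records the three edge lengths plus the apex angle. This is our own guess at the general idea of "triangular representations", not the authors' DGL module; the neighbor count and descriptor layout are assumptions.

```python
# Toy triangle-based local geometric descriptors for a point cloud.
import torch

def triangle_descriptors(points: torch.Tensor) -> torch.Tensor:
    """points: (n, 3) -> (n, 4) descriptors [|e1|, |e2|, |e3|, apex angle]."""
    dists = torch.cdist(points, points)
    dists.fill_diagonal_(float("inf"))
    nn_idx = dists.topk(k=2, largest=False).indices     # two nearest neighbors
    a, b = points[nn_idx[:, 0]], points[nn_idx[:, 1]]
    e1 = a - points                                      # point -> neighbor 1
    e2 = b - points                                      # point -> neighbor 2
    e3 = b - a                                           # neighbor 1 -> neighbor 2
    cos_apex = torch.nn.functional.cosine_similarity(e1, e2, dim=1).clamp(-1.0, 1.0)
    return torch.stack(
        [e1.norm(dim=1), e2.norm(dim=1), e3.norm(dim=1), torch.acos(cos_apex)], dim=1
    )

if __name__ == "__main__":
    pts = torch.randn(1024, 3)
    print(triangle_descriptors(pts).shape)   # torch.Size([1024, 4])
```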
GADFNet: Geometric Priors Assisted Dual-Projection Fusion Network for Monocular Panoramic Depth Estimation
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-21 DOI: 10.1109/TCSVT.2025.3553472
Chengchao Huang;Feng Shao;Hangwei Chen;Baoyang Mu;Long Xu
{"title":"GADFNet: Geometric Priors Assisted Dual-Projection Fusion Network for Monocular Panoramic Depth Estimation","authors":"Chengchao Huang;Feng Shao;Hangwei Chen;Baoyang Mu;Long Xu","doi":"10.1109/TCSVT.2025.3553472","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3553472","url":null,"abstract":"Panoramic depth estimation is crucial for acquiring comprehensive 3D environmental perception information, serving as a foundational basis for numerous panoramic vision tasks. The key challenge in panoramic depth estimation is how to address various distortions in 360° omnidirectional images. Most panoramic images are displayed as 2D equirectangular projections, which exhibit significant distortion, particularly with the severe fisheye effect near the equatorial regions. Traditional depth estimation methods for perspective images are unsuitable for such projections. On the other hand, cubemap projection consists of six distortion-free perspective images, allowing the use of existing depth estimation methods. However, the boundaries between faces of a cubemap projection introduce discontinuities, causing a loss of global information when using cube maps alone. In this work, we propose an innovative geometric priors assisted dual-projection fusion network (GADFNet) that leverages geometric priors of panoramic images and the strengths of both projection types to enhance the accuracy of panoramic depth estimation. Specifically, to better focus the network on key areas, we introduce a distortion perception module (DPM) and incorporate geometric information into the loss function. To more effectively extract global information from the equirectangular projection branch, we propose a scene understanding module (SUM), which captures features from different dimensions. Additionally, to achieve effective fusion of the two projections, we design a dual projection adaptive fusion module (DPAFM) to dynamically adjust the weights of the two branches during fusion. Extensive experiments conducted on four public datasets (including both virtual and real-world scenarios) demonstrate that our proposed GADFNet outperforms existing methods, achieving superior performance.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9060-9074"},"PeriodicalIF":11.1,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
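The sketch below shows one common way to realize dynamic weighting between two projection branches: predict per-pixel softmax weights from the concatenated features and blend the branches accordingly. It is a minimal assumption-based illustration of the fusion idea, not the DPAFM implementation; it assumes the cubemap features have already been re-projected onto the equirectangular grid.

```python
# Minimal adaptive fusion of two projection branches with learned per-pixel weights.
import torch
import torch.nn as nn

class AdaptiveDualFusion(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.weight_head = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, 2, kernel_size=1),   # one logit per branch
        )

    def forward(self, erp_feat: torch.Tensor, cube_feat: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.weight_head(torch.cat([erp_feat, cube_feat], dim=1)), dim=1)
        return w[:, 0:1] * erp_feat + w[:, 1:2] * cube_feat

if __name__ == "__main__":
    fuse = AdaptiveDualFusion()
    e, c = torch.randn(1, 64, 128, 256), torch.randn(1, 64, 128, 256)
    print(fuse(e, c).shape)   # torch.Size([1, 64, 128, 256])
```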
RViDeformer: Efficient Raw Video Denoising Transformer With a Larger Benchmark Dataset
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-20 DOI: 10.1109/TCSVT.2025.3553160
Huanjing Yue;Cong Cao;Lei Liao;Jingyu Yang
{"title":"RViDeformer: Efficient Raw Video Denoising Transformer With a Larger Benchmark Dataset","authors":"Huanjing Yue;Cong Cao;Lei Liao;Jingyu Yang","doi":"10.1109/TCSVT.2025.3553160","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3553160","url":null,"abstract":"In recent years, raw video denoising has garnered increased attention due to the consistency with the imaging process and well-studied noise modeling in the raw domain. However, two problems still hinder the denoising performance. Firstly, there is no large dataset with realistic motions for supervised raw video denoising, as capturing noisy and clean frames for real dynamic scenes is difficult. To address this, we propose recapturing existing high-resolution videos displayed on a 4K screen with high-low ISO settings to construct noisy-clean paired frames. In this way, we construct a video denoising dataset (named as ReCRVD) with 120 groups of noisy-clean videos, whose ISO values ranging from 1600 to 25600. Secondly, while non-local temporal-spatial attention is beneficial for denoising, it often leads to heavy computation costs. We propose an efficient raw video denoising transformer network (RViDeformer) that explores both short and long-distance correlations. Specifically, we propose multi-branch spatial and temporal attention modules, which explore the patch correlations from local window, local low-resolution window, global downsampled window, and neighbor-involved window, and then they are fused together. We employ reparameterization to reduce computation costs. Our network is trained in both supervised and unsupervised manners, achieving the best performance compared with state-of-the-art methods. Additionally, the model trained with our proposed dataset (ReCRVD) outperforms the model trained with previous benchmark dataset (CRVD) when evaluated on the real-world outdoor noisy videos. <italic>Our code and dataset will be released.</i>","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8929-8944"},"PeriodicalIF":11.1,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
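Since the abstract mentions reparameterization to reduce computation costs, the sketch below demonstrates the generic technique: a parallel 3x3 conv and 1x1 conv trained together can be merged into a single 3x3 conv at inference because convolution is linear in its weights. The specific branches merged in RViDeformer are not given in the abstract, so this is a general illustration rather than the paper's exact configuration.

```python
# Structural reparameterization: merge parallel 3x3 and 1x1 convs into one 3x3 conv.
import torch
import torch.nn as nn
import torch.nn.functional as F

conv3 = nn.Conv2d(16, 16, kernel_size=3, padding=1, bias=True)
conv1 = nn.Conv2d(16, 16, kernel_size=1, bias=True)

# Merge: pad the 1x1 kernel to 3x3 (value at the center tap) and add weights/biases.
merged = nn.Conv2d(16, 16, kernel_size=3, padding=1, bias=True)
with torch.no_grad():
    merged.weight.copy_(conv3.weight + F.pad(conv1.weight, [1, 1, 1, 1]))
    merged.bias.copy_(conv3.bias + conv1.bias)

x = torch.randn(1, 16, 32, 32)
y_train = conv3(x) + conv1(x)   # training-time parallel branches
y_infer = merged(x)             # single merged conv at inference
print(torch.allclose(y_train, y_infer, atol=1e-5))  # True
```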
Adaptive Pseudo-Label Purification and Debiasing for Unsupervised Visible-Infrared Person Re-Identification
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2025-03-20 DOI: 10.1109/TCSVT.2025.3571976
Xiangbo Yin;Jiangming Shi;Zhizhong Zhang;Yuan Xie;Yanyun Qu
{"title":"Adaptive Pseudo-Label Purification and Debiasing for Unsupervised Visible-Infrared Person Re-Identification","authors":"Xiangbo Yin;Jiangming Shi;Zhizhong Zhang;Yuan Xie;Yanyun Qu","doi":"10.1109/TCSVT.2025.3571976","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3571976","url":null,"abstract":"Unsupervised Visible-Infrared Person Re-Identification (USVI-ReID) aims to match visible and infrared person images without relying on prior annotations. Recently, unsupervised contrastive learning methods have become the mainstream approach for USVI-ReID, leveraging clustering algorithms to generate pseudo-labels. However, these methods often suffer from inherent noisy pseudo-labels, which significantly hinders their performance. To address this challenge, we propose a Adaptive Pseudo-label Purification and Debiasing (APPD) framework for USVI-ReID, which is designed to calibrate noisy pseudo-labels and dynamically detects clean pseudo-labels, thereby enhancing the model’s performance and reliability. Specifically, we propose an Adaptive Pseudo-label Calibration and Division (APCD) module, which calibrates noisy pseudo-labels by assessing their reliability and divides pseudo-labels into clean and noisy subsets, ensuring a more focused and accurate learning process. Based on the calibrated pseudo-labels, we develop an Optimal Transport Prototype Matching (OTPM) module to establish robust cross-modality correspondences. For clean pseudo-labels, we propose a Debiased Memory Hybrid Learning (DMHL) module, which jointly captures modality-specific and modality-invariant information while addressing sampling bias to enhance feature representation. To effectively utilize noisy pseudo-labels, we introduce a Neighbor Relation Learning (NRL) module that mitigates intra-class variations by exploring neighbor relationships in the feature space. Comprehensive experiments conducted on two widely recognized USVI-ReID benchmarks demonstrate that APPD achieves state-of-the-art performance, significantly outperforming existing methods. The source code will be made available at <uri>https://github.com/XiangboYin/RPNR</uri>","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 10","pages":"10571-10585"},"PeriodicalIF":11.1,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
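To give a flavor of optimal-transport-style prototype matching across modalities, the sketch below runs Sinkhorn normalization on a cosine-distance cost matrix between visible and infrared cluster prototypes and reads off matches from the resulting transport plan. This reflects our assumption about the general approach rather than the authors' OTPM module; the regularization strength, iteration count, and prototype counts are illustrative.

```python
# Sinkhorn-based soft matching between visible and infrared cluster prototypes.
import torch

def sinkhorn(cost: torch.Tensor, eps: float = 0.05, iters: int = 50) -> torch.Tensor:
    """cost: (n, m) -> approximately doubly normalized transport plan."""
    plan = torch.exp(-cost / eps)
    for _ in range(iters):
        plan = plan / plan.sum(dim=1, keepdim=True)   # normalize rows
        plan = plan / plan.sum(dim=0, keepdim=True)   # normalize columns
    return plan

if __name__ == "__main__":
    vis_protos = torch.nn.functional.normalize(torch.randn(100, 256), dim=1)
    ir_protos = torch.nn.functional.normalize(torch.randn(100, 256), dim=1)
    cost = 1.0 - vis_protos @ ir_protos.t()           # cosine distance between prototypes
    plan = sinkhorn(cost)
    matches = plan.argmax(dim=1)                       # infrared prototype per visible one
    print(matches[:10])
```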