IEEE Transactions on Circuits and Systems for Video Technology: Latest Publications

TinySplat: Feedforward Approach for Generating Compact 3D Scene Representation
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2026-04-01 Epub Date: 2026-02-16 DOI: 10.1109/TCSVT.2026.3664794
Zetian Song;Jiaye Fu;Jiaqi Zhang;Xiaohan Lu;Chuanmin Jia;Siwei Ma;Wen Gao
{"title":"TinySplat: Feedforward Approach for Generating Compact 3D Scene Representation","authors":"Zetian Song;Jiaye Fu;Jiaqi Zhang;Xiaohan Lu;Chuanmin Jia;Siwei Ma;Wen Gao","doi":"10.1109/TCSVT.2026.3664794","DOIUrl":"https://doi.org/10.1109/TCSVT.2026.3664794","url":null,"abstract":"The recent development of feedforward 3D Gaussian Splatting (3DGS) presents a new paradigm to reconstruct 3D scenes. Using neural networks trained on large-scale multi-view datasets, it can directly infer 3DGS representations from sparse input views. Although the feedforward approach achieves high reconstruction speed, it still suffers from the substantial storage cost of 3D Gaussians. Existing 3DGS compression methods relying on scene-wise optimization are not applicable due to architectural incompatibilities. To overcome this limitation, we propose TinySplat, a complete feedforward approach for generating compact 3D scene representations. Built upon standard feedforward 3DGS methods, TinySplat integrates a training-free compression framework that systematically eliminates key sources of redundancy. Specifically, we introduce View-Projection Transformation (VPT) to reduce geometric redundancy by projecting geometric parameters into a more compact space. We further present Visibility-Aware Basis Reduction (VABR), which mitigates perceptual redundancy by aligning feature energy along dominant viewing directions via basis transformation. Lastly, spatial redundancy is addressed through an off-the-shelf video codec. Comprehensive experimental results on multiple benchmark datasets demonstrate that TinySplat achieves over <inline-formula> <tex-math>$100times $ </tex-math></inline-formula> compression for 3D Gaussian data generated by feedforward methods. Compared to the state-of-the-art compression approach, we achieve comparable quality with only 6% of the storage size. Meanwhile, our compression framework requires only 25% of the encoding time and 1% of the decoding time.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 4","pages":"5567-5580"},"PeriodicalIF":11.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147620906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
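As a rough illustration of the View-Projection Transformation idea described in the abstract above, the following Python sketch (not the authors' code; all names and bit depths are hypothetical) maps Gaussian centers into a camera frame and quantizes them there, where the coordinate range is tighter and therefore cheaper to encode:

```python
# Minimal sketch of a view-projection + quantization step for Gaussian centers.
# Assumes R is an orthonormal rotation; this is an illustration, not TinySplat's VPT.
import numpy as np

def view_project_and_quantize(centers, R, t, n_bits=12):
    """centers: (N, 3) world-space Gaussian means; R, t: camera rotation/translation."""
    cam = centers @ R.T + t                      # world -> camera coordinates
    lo, hi = cam.min(axis=0), cam.max(axis=0)    # per-axis bounding box in view space
    q = np.round((cam - lo) / (hi - lo + 1e-9) * (2**n_bits - 1)).astype(np.uint16)
    return q, (lo, hi)                           # quantized codes + dequantization range

def dequantize(q, lo, hi, R, t, n_bits=12):
    cam = q.astype(np.float64) / (2**n_bits - 1) * (hi - lo) + lo
    return (cam - t) @ R                         # back to world (R orthonormal, so R.T^-1 = R)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(1000, 3))
    R, t = np.eye(3), np.zeros(3)
    q, (lo, hi) = view_project_and_quantize(pts, R, t)
    print("max abs error:", np.abs(dequantize(q, lo, hi, R, t) - pts).max())
```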
Learned Point Cloud Attribute Compression With Cross-Scale Point Transformer and Geometry-Aware Context Prediction Entropy Model
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2026-04-01 Epub Date: 2025-10-23 DOI: 10.1109/TCSVT.2025.3624688
Xiao Huo;Wei Zhang;Fuzheng Yang
{"title":"Learned Point Cloud Attribute Compression With Cross-Scale Point Transformer and Geometry-Aware Context Prediction Entropy Model","authors":"Xiao Huo;Wei Zhang;Fuzheng Yang","doi":"10.1109/TCSVT.2025.3624688","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3624688","url":null,"abstract":"Point clouds are a fundamental format for immersive experiences, posing significant challenges for storage and transmission. Unlike 2D image compression, 3D point clouds are sparse and irregular, complicating their attribute compression. While sparse convolution-based methods have made significant success on point cloud attribute compression by leveraging the sparsity of point clouds, they are constrained by a limited receptive field and insufficient adaptability to diverse inputs. To overcome these limitations, this paper proposes a novel point transformer-based architecture to exploit correlations and aggregate features across multiple scales (CSFormer). It retains the advantage of sparse convolution operating on occupied voxels and leverages varied sparsity distributions and the geometry distortions inherent in consecutive scales to construct attention maps, effectively extending the receptive field and adapting to different inputs. We further introduced GCPEM, a Geometry-aware Context Prediction-based Entropy Model that reduces bitrates by jointly utilizing the spatial and channel dependencies. Unlike previous methods that capture only one type of the information, GCPEM organizes latent features into groups interlaced across both space and channel dimensions and employs a context-prediction mechanism guided by known geometry for efficient coding. Experimental results show that the proposed method outperforms the state-of-the-art learning-based method and MPEG standard G-PCC codec over 7% and 28% in BD-BR (Y-PSNR), respectively. It has a time complexity comparable to the state-of-the-art learning-based method and the G-PCC. The source code and trained models will be released at <uri>https://github.com/X-H-offical/CST-PCAC-plus.git</uri>","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 4","pages":"5538-5552"},"PeriodicalIF":11.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147620913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
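The space/channel interlacing described for GCPEM can be pictured with a small Python sketch (not the paper's model; group counts and names are hypothetical): latent features attached to occupied voxels are split into groups that alternate over both point index and channel index, so each group can be entropy-coded with the earlier groups as context.

```python
# Minimal sketch of interlaced space/channel grouping for context-based entropy coding.
import numpy as np

def interlaced_groups(latent, n_space=2, n_channel=2):
    """latent: (N_points, C) attribute features; returns groups in decoding order."""
    n, c = latent.shape
    groups = []
    for s in range(n_space):
        for ch in range(n_channel):
            mask_pts = np.arange(n) % n_space == s       # every n_space-th point
            mask_chs = np.arange(c) % n_channel == ch    # every n_channel-th channel
            groups.append(latent[mask_pts][:, mask_chs]) # group k conditions on groups 0..k-1
    return groups

if __name__ == "__main__":
    z = np.random.randn(8, 4)
    print([g.shape for g in interlaced_groups(z)])  # four (4, 2) blocks covering z exactly once
```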
Expressive Human Volumetric Video Generation With Rich Text
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2026-04-01 Epub Date: 2025-11-04 DOI: 10.1109/TCSVT.2025.3628996
Yi Yang;Guanghui Yue;Wei Zhou;Xudong Mao;Ruomei Wang;Baoquan Zhao
{"title":"Expressive Human Volumetric Video Generation With Rich Text","authors":"Yi Yang;Guanghui Yue;Wei Zhou;Xudong Mao;Ruomei Wang;Baoquan Zhao","doi":"10.1109/TCSVT.2025.3628996","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3628996","url":null,"abstract":"Plain text has become the dominant interactive interface for text-driven human volumetric video generation. However, its limited customization options hinder users from expressing motion effects with accuracy. For example, plain text struggles to specify continuous variables such as motion amplitude, speed, and joint trajectories with precision, and it fails to convey stylized motion characteristics. Additionally, crafting detailed textual prompts for complex motion sequences is cumbersome, while excessively long prompts strain text encoders. To address these limitations, we propose a rich text-based framework that supports font styles, sizes, and trajectory sketching. By extracting motion-related attributes from rich text, our method enables fine-grained control over motion styles, precise speed regulation, and accurate joint trajectory manipulation. These capabilities are realized through gradient-guided noise editing and ControlNet-based motion optimization, which operate within the latent motion diffusion process. Specifically, we design a unified gradient-guided adaptation mechanism to ensure that the generated motion video adheres strictly to the specified constraints. Furthermore, we introduce realism-oriented optimization for stylistic and joint-level control, refining motion synthesis at a granular level to produce smoother, more natural movements. We present multiple comparative evaluations showcasing volumetric video generation from both rich text and plain text. Through quantitative analysis, we demonstrate that our method surpasses strong plain-text baselines, producing expressive, customizable human volumetric motion videos.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 4","pages":"5424-5436"},"PeriodicalIF":11.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147665230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
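One way to picture the rich-text-to-motion-attribute extraction mentioned above is the following Python sketch (entirely hypothetical; the attribute names and scaling rules are not from the paper): font size is mapped to motion amplitude, font style to a style tag, and an attached sketch to a joint trajectory.

```python
# Minimal, illustrative mapping from rich-text span attributes to motion controls.
from dataclasses import dataclass

@dataclass
class MotionControl:
    text: str
    amplitude: float   # scaled from font size
    speed: float       # taken from an explicit speed attribute
    style: str         # e.g. bold -> "energetic"
    trajectory: list   # 2D points from a user sketch, if any

def rich_text_to_control(span):
    size = span.get("font_size", 12)
    style = "energetic" if span.get("bold") else "neutral"
    return MotionControl(
        text=span["text"],
        amplitude=min(size / 12.0, 3.0),
        speed=span.get("speed", 1.0),
        style=style,
        trajectory=span.get("sketch", []),
    )

if __name__ == "__main__":
    span = {"text": "raise the right arm", "font_size": 24, "bold": True,
            "sketch": [(0.1, 0.2), (0.3, 0.6)]}
    print(rich_text_to_control(span))
```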
Learning Confidence-Aware Prototypes for Weakly-Supervised Video Anomaly Detection
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2026-04-01 Epub Date: 2025-11-04 DOI: 10.1109/TCSVT.2025.3628630
Zhao Xie;Jinkang Luo;Kewei Wu;Zhehan Kan;Dan Guo
{"title":"Learning Confidence-Aware Prototypes for Weakly-Supervised Video Anomaly Detection","authors":"Zhao Xie;Jinkang Luo;Kewei Wu;Zhehan Kan;Dan Guo","doi":"10.1109/TCSVT.2025.3628630","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3628630","url":null,"abstract":"Weakly supervised video anomaly detection aims to identify abnormal snippets in untrimmed videos. Existing methods learn prototypes to describe global representation of snippet distributions. But, in weakly-labeled videos, the normal snippets in abnormal video may take high-uncertainty labels for distribution modeling. Without confidence-aware modeling, abnormal/normal prototype distributions may overlap with each other, leading to inaccurate predictions. In this work, we propose the Unified Confident Prototype (UCP) model, which contains a feature extractor, a confidence-aware prototype learner, and a local-global prototype unifier. The prototype learning is designed to ensure proper separability, stability, and representation. <italic>First</i>, after learning the weight of each snippet’s loss, snippets with high-uncertainty labels may take small weights. These snippets tend to lie in the overlap between abnormal/normal distributions, hindering their separation. We design uncertainty-aware sampling, which removes high-uncertainty snippets in the small-weight snippets to ensure separable prototype learning. <italic>Second</i>, snippets with high-uncertainty labels tend to be far from the prototype center, thus falling in the low-confidence region. These snippets may enlarge the distribution’s variation, resulting in unstable prototype learning. We design confidence-aware sampling, which removes low-confidence snippets to ensure stable prototype learning. <italic>Third</i>, after assigning pseudo labels to prototypes, we measure the prototype representation with the distribution’s purity. We design prototype distribution purification, which penalizes normal snippets in the abnormal-majority distribution with purity loss to ensure representative prototype learning. <italic>Fourth</i>, beyond prototype learning, prototypes can be enhanced by local/global temporal semantics. We further introduce the local-global prototype unifier to learn the relations across local-global durations, thereby enhancing the semantics for anomaly detection. For weakly-supervised anomaly detection, experiments demonstrate that our method achieves state-of-the-art performance on the UCF-Crime, ShanghaiTech, and XD-Violence datasets. Moreover, to further verify the generality of our method, we further conduct experiments on THUMOS’14 for weakly-supervised temporal action localization.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 4","pages":"5714-5728"},"PeriodicalIF":11.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147620912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
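The two sampling steps described above (uncertainty-aware, then confidence-aware) can be summarized with a short Python sketch (not the paper's implementation; the quantile thresholds and names are assumptions): small-weight snippets are dropped first, then snippets far from the running prototype center, before the prototype is updated.

```python
# Minimal sketch of uncertainty-aware + confidence-aware sampling for prototype learning.
import numpy as np

def confident_prototype(feats, weights, keep_weight=0.2, keep_conf=0.8):
    """feats: (N, D) snippet features; weights: (N,) learned per-snippet loss weights."""
    # 1) uncertainty-aware sampling: remove snippets whose loss weight is small
    kept = feats[weights >= np.quantile(weights, keep_weight)]
    # 2) confidence-aware sampling: remove snippets far from the prototype center
    center = kept.mean(axis=0)
    dist = np.linalg.norm(kept - center, axis=1)
    kept = kept[dist <= np.quantile(dist, keep_conf)]
    return kept.mean(axis=0)  # updated prototype

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.normal(size=(64, 16))
    weights = rng.uniform(size=64)
    print(confident_prototype(feats, weights).shape)
```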
CNVC: A Compact Neural Video Codec With Instance-Level Adaptation
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2026-04-01 Epub Date: 2025-12-03 DOI: 10.1109/TCSVT.2025.3640077
Yue Li;Chaoyi Lin;Junru Li;Kai Zhang;Li Zhang
{"title":"CNVC: A Compact Neural Video Codec With Instance-Level Adaptation","authors":"Yue Li;Chaoyi Lin;Junru Li;Kai Zhang;Li Zhang","doi":"10.1109/TCSVT.2025.3640077","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3640077","url":null,"abstract":"Autoencoder-based neural compression methods leverage expressive models to fit large datasets but often incur considerable decoding complexity. Recently, overfitted codecs with reduced decoding complexity have gained attention as an alternative. However, they usually require access to entire videos or multiple frames simultaneously for encoding, resulting in substantial system delays. To address these limitations, we propose CNVC, a compact neural video codec that employs instance-level adaptation for efficient and flexible video compression. CNVC is fully overfitted (each frame is optimized independently using up to 45k iterations for maximum performance in this paper), building on the COOL-CHIC video model with substantial architectural and training enhancements. At a decoding complexity of just 1300 MACs per pixel, CNVC provides a more compact solution than previous autoencoder-based and overfitted codecs. Additionally, CNVC inherits the frame-wise overfitting mechanism of COOL-CHIC video, enabling flexible encoding configurations (e.g., low-delay). In terms of compression efficiency, CNVC achieves significant bitrate reductions on HEVC and UVG datasets compared to COOL-CHIC video. To our knowledge, CNVC is the first compact neural codec to match HEVC (x265 slow setting) performance at such a decoding complexity level.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 4","pages":"5525-5537"},"PeriodicalIF":11.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147620919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
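The frame-wise overfitting mechanism mentioned above can be illustrated with a toy PyTorch sketch (not CNVC or COOL-CHIC; the tiny architecture, iteration count, and rate proxy are placeholders): a small latent grid and decoder are optimized per frame against a distortion-plus-rate objective.

```python
# Minimal sketch of per-frame overfitting with a rate-distortion-style objective.
import torch

def overfit_frame(frame, iters=500, lam=0.01):
    """frame: (3, H, W) tensor in [0, 1]."""
    _, h, w = frame.shape
    latent = torch.zeros(8, h // 4, w // 4, requires_grad=True)          # coarse latent grid
    dec = torch.nn.Sequential(torch.nn.Conv2d(8, 16, 3, padding=1),
                              torch.nn.ReLU(),
                              torch.nn.Conv2d(16, 3, 3, padding=1))
    opt = torch.optim.Adam([latent, *dec.parameters()], lr=1e-2)
    for _ in range(iters):
        up = torch.nn.functional.interpolate(latent[None], size=(h, w), mode="bilinear")
        recon = torch.sigmoid(dec(up))[0]
        loss = torch.mean((recon - frame) ** 2) + lam * latent.abs().mean()  # distortion + rate proxy
        opt.zero_grad()
        loss.backward()
        opt.step()
    return latent.detach(), dec

if __name__ == "__main__":
    latent, dec = overfit_frame(torch.rand(3, 64, 64), iters=50)
    print(latent.shape)
```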
StableV2V: Stabilizing Shape Consistency in Video-to-Video Editing
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2026-04-01 Epub Date: 2025-12-02 DOI: 10.1109/TCSVT.2025.3639307
Chang Liu;Rui Li;Kaidong Zhang;Yunwei Lan;Dong Liu
{"title":"StableV2V: Stabilizing Shape Consistency in Video-to-Video Editing","authors":"Chang Liu;Rui Li;Kaidong Zhang;Yunwei Lan;Dong Liu","doi":"10.1109/TCSVT.2025.3639307","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3639307","url":null,"abstract":"Recent advancements in generative artificial intelligence have significantly promoted content creation and editing, where prevailing studies further extend this exciting progress to video editing. These studies mainly transfer the inherent motion patterns from the source videos to the edited ones, where they often produce inferior results with inconsistency to user intentions, especially when shape changes between the edited and original objects might occur, due to the lack of particular alignments between the delivered motions and edited content. To address this limitation, we present a shape-consistent video editing method, namely StableV2V. Our method decomposes the entire editing pipeline into several sequential procedures, where we first edit the initial video frame, then simulate the shape-aware alignment between the delivered motions and edited sequence, and propagate the edited content to all other frames based on such alignment. Furthermore, we curate a testing benchmark, namely DAVIS-Edit, to offer a comprehensive evaluation of video editing, considering various types of prompts and difficulties. Experimental results and analyses illustrate the superior performance, visual consistency, and inference efficiency of our proposed method compared to existing state-of-the-art video editing studies.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 4","pages":"5467-5482"},"PeriodicalIF":11.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147620937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
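The three-stage decomposition described above can be expressed as a pipeline skeleton in Python (not the released code; every callable below is a hypothetical stand-in for a learned model): edit the first frame, derive a shape-aware alignment, then propagate with that alignment.

```python
# Minimal sketch of a first-frame-edit -> alignment -> propagation pipeline.
def shape_consistent_edit(frames, prompt, edit_first_frame, align_motion, propagate):
    """frames: list of source frames; the three callables are model placeholders."""
    edited_first = edit_first_frame(frames[0], prompt)                       # stage 1: image editing
    alignment = align_motion(frames, edited_first)                           # stage 2: shape-aware motion alignment
    return [edited_first] + propagate(frames[1:], edited_first, alignment)   # stage 3: propagation

if __name__ == "__main__":
    # Trivial stand-ins so the pipeline structure can be exercised end to end.
    out = shape_consistent_edit(
        ["f0", "f1", "f2"], "turn the dog into a cat",
        edit_first_frame=lambda f, p: f + "_edited",
        align_motion=lambda fs, e: "alignment",
        propagate=lambda fs, e, a: [f + "_edited" for f in fs],
    )
    print(out)
```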
GSCodec Studio: A Modular Framework for Gaussian Splat Compression
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2026-04-01 Epub Date: 2026-01-15 DOI: 10.1109/TCSVT.2026.3654794
Sicheng Li;Chengzhen Wu;Hao Li;Xiang Gao;Yiyi Liao;Lu Yu
{"title":"GSCodec Studio: A Modular Framework for Gaussian Splat Compression","authors":"Sicheng Li;Chengzhen Wu;Hao Li;Xiang Gao;Yiyi Liao;Lu Yu","doi":"10.1109/TCSVT.2026.3654794","DOIUrl":"https://doi.org/10.1109/TCSVT.2026.3654794","url":null,"abstract":"3D Gaussian Splatting and its extension to 4D dynamic scenes enable photorealistic, real-time rendering from real-world captures, positioning Gaussian Splats (GS) as a promising format for next-generation immersive media. However, their high storage requirements pose significant challenges for practical use in sharing, transmission, and storage. Despite various studies exploring GS compression from different perspectives, these efforts remain scattered across separate repositories, complicating benchmarking and the integration of best practices. To address this gap, we present GSCodec Studio, a unified and modular framework for GS reconstruction, compression, and rendering. The framework incorporates a diverse set of 3D/4D GS reconstruction methods and GS compression techniques as modular components, facilitating flexible combinations and comprehensive comparisons. By integrating best practices from community research and our own explorations, GSCodec Studio supports the development of compact representation and compression solutions for static and dynamic Gaussian Splats. Specifically, we present Static and Dynamic GSCodec: Static GSCodec achieves competitive 3D Gaussian Splat rate-distortion performance with low decoding complexity, while Dynamic GSCodec delivers advanced 4D Gaussian Splat compression performance. The code for our framework is publicly available at <uri>https://github.com/JasonLSC/GSCodec_Studio</uri>, to advance the research on Gaussian Splats compression.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 4","pages":"5483-5496"},"PeriodicalIF":11.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147620907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
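The modular-composition idea described above (reconstruction and compression stages combined through configuration) can be pictured with a small Python sketch; this is not GSCodec Studio's API, and the registry names and stages are hypothetical.

```python
# Minimal sketch of registry-based pipeline composition for benchmarking combinations.
RECONSTRUCTION = {"3dgs": lambda views: {"splats": f"fit from {len(views)} views"}}
COMPRESSION = {"quant+video": lambda scene: {"bitstream": "coded " + scene["splats"]}}

def build_pipeline(cfg):
    recon = RECONSTRUCTION[cfg["reconstruction"]]
    comp = COMPRESSION[cfg["compression"]]
    def run(views):
        scene = recon(views)       # stage 1: fit a (static or dynamic) GS scene
        return comp(scene)         # stage 2: compress it with the chosen codec
    return run

if __name__ == "__main__":
    pipeline = build_pipeline({"reconstruction": "3dgs", "compression": "quant+video"})
    print(pipeline(["view_%d" % i for i in range(4)]))
```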
MoAnimate: Bridging the Motion-Oriented Latent Representation Gaps in Human Video Animation
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2026-04-01 Epub Date: 2025-12-01 DOI: 10.1109/TCSVT.2025.3639082
Haipeng Fang;Sheng Tang;Zhihao Sun;Ziyao Huang;Juan Cao;Fan Tang;Yongdong Zhang
{"title":"MoAnimate: Bridging the Motion-Oriented Latent Representation Gaps in Human Video Animation","authors":"Haipeng Fang;Sheng Tang;Zhihao Sun;Ziyao Huang;Juan Cao;Fan Tang;Yongdong Zhang","doi":"10.1109/TCSVT.2025.3639082","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3639082","url":null,"abstract":"Human animation strives to bring static characters to life. Existing methods produce high-quality outcomes for single-frame animation; however, they often fail to maintain satisfactory temporal consistency, especially in facial and hand movements. This limitation arises from commonly used motion modules that do not explicitly model inter-entity relationships. In this work, we introduce MoAnimate, a Motion-oriented Human Animation framework designed to improve inter-entity consistency. Specifically, we extract motion flows from driving videos and transfer them to align the shape of character. During initialization, we propose a motion-oriented latent refinement that optimizes low-frequency subbands to regulate the layout of visual objects along flow trajectories, while preserving random high-frequency subbands to accommodate appearance variations. During denoising, we further introduce a motion-oriented entity attention module to enable direct and efficient interaction among entities within a coordinated subspace. Extensive experiments demonstrate that our method significantly enhances temporal consistency, particularly the visual consistency of the entities.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 4","pages":"5453-5466"},"PeriodicalIF":11.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147620940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
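The subband split mentioned above (optimize low frequencies for layout, keep random high frequencies for appearance) can be illustrated with a simple FFT-based Python sketch; the FFT split and the cutoff value are simplifying assumptions, not the paper's actual refinement module.

```python
# Minimal sketch: blend low frequencies of a layout-carrying latent with random high frequencies.
import numpy as np

def blend_subbands(layout_latent, random_latent, cutoff=0.25):
    """Both inputs: (H, W) latents; cutoff: fraction of the spectrum treated as 'low'."""
    h, w = layout_latent.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low = (np.abs(fy) < cutoff / 2) & (np.abs(fx) < cutoff / 2)   # low-frequency mask
    spec = np.where(low, np.fft.fft2(layout_latent), np.fft.fft2(random_latent))
    return np.real(np.fft.ifft2(spec))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layout = rng.normal(size=(32, 32))   # stands in for the flow-warped layout latent
    noise = rng.normal(size=(32, 32))    # stands in for the random initialization
    print(blend_subbands(layout, noise).shape)
```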
GIANT: Generated Image Adversarial Steganography Based on Narrowed Targeting
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2026-04-01 Epub Date: 2025-11-06 DOI: 10.1109/TCSVT.2025.3629825
Zexin Fan;Kejiang Chen;Yaofei Wang;Weiming Zhang;Nenghai Yu
{"title":"GIANT: Generated Image Adversarial Steganography Based on Narrowed Targeting","authors":"Zexin Fan;Kejiang Chen;Yaofei Wang;Weiming Zhang;Nenghai Yu","doi":"10.1109/TCSVT.2025.3629825","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3629825","url":null,"abstract":"With the rapid emergence of various generative models, generated images have increasingly become a prominent data medium on social platforms, making up a significantly higher proportion and providing fertile ground for steganography. However, research on steganography for generated images remains limited, and the distinctive attributes, especially the reproducibility of text-to-image (TTI) models, have not been effectively leveraged. In this paper, we propose GIANT (Generated Image Adversarial steganography based on Narrowed Targeting), a novel adversarial steganography framework for generated images that employs narrowed targeting to focus on embedding the secret message solely in the secure region and synchronizing the position to enhance the steganography security. GIANT achieves narrowed targeting by leveraging the reproducibility of TTI models and fusing two regions: 1) the minimal distortion region, which is localized by measuring steganographic distortion to evaluate the impact of modifications on the cover image distribution, and 2) the critical attention region, which is localized by using coarse-grained and fine-grained attention maps to evade steganalysis detection. Additionally, for positional synchronization of the secure region, the related prompts are transmitted alongside the stego image, allowing the receiver to reconstruct the cover image using a shared key and the provided prompt. Experimental results demonstrate that GIANT significantly improves security compared to conventional and adversarial steganographic methods designed for natural images, effectively countering state-of-the-art steganalyzers.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 4","pages":"5671-5682"},"PeriodicalIF":11.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147620932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
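The region fusion behind "narrowed targeting" can be sketched in a few lines of Python (not the GIANT implementation; the thresholds and the exact way the two criteria are combined are assumptions): the secure region is taken where the distortion criterion and the attention criterion both hold, and a shared seed stands in for the reproducible cover that lets the receiver rebuild the same region.

```python
# Minimal sketch: fuse a minimal-distortion map and an attention map into a secure embedding mask.
import numpy as np

def secure_region(distortion_map, attention_map, dist_q=0.3, attn_q=0.7):
    """Both maps: (H, W); returns a boolean mask of candidate embedding positions."""
    low_distortion = distortion_map <= np.quantile(distortion_map, dist_q)
    attention_ok = attention_map >= np.quantile(attention_map, attn_q)  # direction is an assumption
    return low_distortion & attention_ok

if __name__ == "__main__":
    rng = np.random.default_rng(42)   # shared seed: stands in for the reproducible TTI cover
    dist = rng.random((64, 64))
    attn = rng.random((64, 64))
    print("embeddable pixels:", int(secure_region(dist, attn).sum()))
```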
Nearest Neighbor Sample Constraint and ODE Guided Feature Reconstruction for Unsupervised Person Re-Identification
IF 11.1, CAS Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2026-04-01 Epub Date: 2025-11-13 DOI: 10.1109/TCSVT.2025.3632324
Xi Yang;Wenjiao Dong;Gu Zheng;Nannan Wang
{"title":"Nearest Neighbor Sample Constraint and ODE Guided Feature Reconstruction for Unsupervised Person Re-Identification","authors":"Xi Yang;Wenjiao Dong;Gu Zheng;Nannan Wang","doi":"10.1109/TCSVT.2025.3632324","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3632324","url":null,"abstract":"Unsupervised person re-identification aims to retrieve a given pedestrian image from unlabeled data. The method of clustering and assigning pseudo-labels has become mainstream, but there are still some problems that will reduce recognition accuracy. On the one hand, in the process of clustering, poor classification of hard samples between neighboring classes leads to inadequate clustering accuracy, which affects the quality of pseudo-labels. On the other hand, the representational capacity of features extracted by the backbone network is also crucial for the model’s performance. To this end, this paper proposes an unsupervised person re-identification method based on nearest neighbor sample constraint and ordinary differential equation guided feature reconstruction (NNSC-FR) to improve the clustering accuracy and pseudo-label quality while enhancing the representation of features. Specifically, we present a novel nearest neighbor sample constraint (NNSC) after neighbor sample mining for each instance sample to recognize the hard samples’ fine classification between classes. To further improve clustering accuracy, an inter-class balance loss (CB loss) is introduced to better identify the hard samples between the nearest neighbor classes. In addition, guided by the third-order adam solution of the Ordinary Differential Equation, we design a Feature Reconstruction (ODE-FR) module with residual structure to improve the model representation ability. Extensive experimental results on Market-1501, DukeMTMC-reID, and MSMT17 demonstrate that our proposed method is superior to the state-of-the-art methods.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"36 4","pages":"5608-5620"},"PeriodicalIF":11.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147620922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
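The neighbor-sample mining step that precedes the constraint described above can be illustrated with a short Python sketch (not the NNSC-FR code; k and the agreement proxy are assumptions): for each instance, its k nearest features are retrieved and the fraction sharing its pseudo-label is measured, a simple proxy for whether it is a hard sample between neighboring classes.

```python
# Minimal sketch of nearest-neighbor mining and a pseudo-label agreement score.
import numpy as np

def neighbor_agreement(feats, pseudo_labels, k=5):
    """feats: (N, D) L2-normalized features; pseudo_labels: (N,) cluster ids."""
    sims = feats @ feats.T
    np.fill_diagonal(sims, -np.inf)                 # exclude self-similarity
    nn_idx = np.argsort(-sims, axis=1)[:, :k]       # k nearest neighbors per sample
    agree = (pseudo_labels[nn_idx] == pseudo_labels[:, None]).mean(axis=1)
    return agree                                     # low agreement -> likely hard sample

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    feats = rng.normal(size=(20, 8))
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)
    labels = rng.integers(0, 3, size=20)
    print(neighbor_agreement(feats, labels))
```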