{"title":"IEEE Circuits and Systems Society Information","authors":"","doi":"10.1109/TCSVT.2025.3525919","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3525919","url":null,"abstract":"","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 1","pages":"C3-C3"},"PeriodicalIF":8.3,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10857833","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143369913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Circuits and Systems for Video Technology Publication Information","authors":"","doi":"10.1109/TCSVT.2025.3525917","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3525917","url":null,"abstract":"","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 1","pages":"C2-C2"},"PeriodicalIF":8.3,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10857832","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2024 Index IEEE Transactions on Circuits and Systems for Video Technology Vol. 34","authors":"","doi":"10.1109/TCSVT.2025.3528698","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3528698","url":null,"abstract":"","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"34 12","pages":"13655-13831"},"PeriodicalIF":8.3,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10840335","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142975850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When Aware Haze Density Meets Diffusion Model for Synthetic-to-Real Dehazing","authors":"Shibai Yin;Yiwei Shi;Yibin Wang;Yee-Hong Yang","doi":"10.1109/TCSVT.2024.3520816","DOIUrl":"https://doi.org/10.1109/TCSVT.2024.3520816","url":null,"abstract":"Image dehazing is an important preliminary step for downstream vision tasks. Existing deep learning-based methods have limited generalization capabilities for real hazy images because they are trained on synthetic data and exhibit high domain-specific properties. This work proposes a new Diffusion Model for Synthetic-to-Real dehazing (DMSR) based on the haze-aware density. DMSR mainly comprises of a physics-based dehazing model and a Conditional Denoising Diffusion Model (CDDM)-based model. The coarse transmission map and coarse dehazing result estimated by the physics-based dehazing model serve as conditions for the subsequent CDDM-based model. In this process, the CDDM-based dehazing model progressively refines the coarse transmission map while generating the dehazing result, enabling the model to remove haze with accurate haze density information. Next, we propose a haze density-aware resampling strategy that incorporates the coarse dehazed result into the resampling process using the transmission map, thereby fully leveraging the diffusion model for heavy haze removal. Moreover, a new synthetic-to-real training strategy with the prior-based loss function and the memory loss function is applied to DMSR for improving generalization capabilities and narrowing the gap between the synthetic and real domains with low computational cost. Extensive experiments on various real datasets demonstrate the effectiveness and superiority of the proposed DMSR over state-of-the-art methods.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4242-4255"},"PeriodicalIF":8.3,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143913362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting High-Discriminative Features for Detecting Double JPEG Compression With the Same Quantization Matrix","authors":"Wenjie Li;Xiaolong Li;Rongrong Ni;Yao Zhao","doi":"10.1109/TCSVT.2025.3526838","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3526838","url":null,"abstract":"Detecting double JPEG compression with the same quantization matrix is a crucial yet challenging task in image forensics. Existing methods often fail to accurately identify and fully exploit the differences between singly and doubly compressed images, resulting in unsatisfactory detection performance, especially for cases with low quality factors (QFs). To address this issue, a novel method is proposed to extract highly discriminative features for performance enhancement. First, we design a new error block classification method that categorizes error blocks into stable error blocks, rounding error blocks (REBs), and truncation error blocks (TEBs). This classification method enables more accurate identification of TEBs, which are the most discriminative blocks in error images for cases with low QFs. Then, based on the theoretical analysis of REBs and TEBs, an intrinsic variable that directly leads to the differences between two classes of images is derived, providing more essential characteristics for the detection. Finally, a number of 25-dimensional highly discriminative features are extracted from REBs, TEBs, and flat blocks. Experimental results demonstrate that the proposed method outperforms several state-of-the-art works, especially on images with low QFs.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4727-4739"},"PeriodicalIF":8.3,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143913555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Unfolding Network for Image Desnowing With Snow Shape Prior","authors":"Xin Guo;Xi Wang;Xueyang Fu;Zheng-Jun Zha","doi":"10.1109/TCSVT.2025.3526647","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3526647","url":null,"abstract":"Effectively leveraging snow image formulation, which accounts for atmospheric light and snow masks, is crucial for enhancing image desnowing performance and improving interpretability. However, current direct-learning approaches often neglect this formulation, while model-based methods use it in overly simplistic ways. To address this, we propose a novel unfolding network that iteratively refines the desnowing process for more thorough optimization. Additionally, model-based techniques usually rely on real-world snow masks for supervision, a requirement that is impractical in many real-world applications. To overcome this limitation, we introduce a snow shape prior as a surrogate supervision signal. We further integrate the physical properties of atmospheric light and heavy snow by decomposing the optimization task into manageable sub-problems within our unfolding network. Extensive evaluations on multiple benchmark datasets confirm that our method outperforms current state-of-the-art techniques.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4740-4752"},"PeriodicalIF":8.3,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143913361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visible-Infrared Person Re-Identification With Real-World Label Noise","authors":"Ruiheng Zhang;Zhe Cao;Yan Huang;Shuo Yang;Lixin Xu;Min Xu","doi":"10.1109/TCSVT.2025.3526449","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3526449","url":null,"abstract":"In recent years, growing needs for advanced security and traffic management have significantly heightened the prominence of the visible-infrared person re-identification community (VI-ReID), garnering considerable attention. A critical challenge in VI-ReID is the performance degradation attributable to label noise, an issue that becomes even more pronounced in cross-modal scenarios due to an increased likelihood of data confusion. While previous methods have achieved notable successes, they often overlook the complexities of instance-dependent and real-world noise, creating a disconnect from the practical applications of person re-identification. To bridge this gap, our research analyzes the primary sources of label noise in real-world settings, which include a) instantiated identities, b) blurry infrared images, and c) annotators’ errors. In response to these challenges, we develop a Robust Hybrid Loss function (RHL) that enables targeted recognition and retrieval optimization through a more fine-grained division of the noisy dataset. The proposed method categorises data into three sets: clean, obviously noisy, and indistinguishably noisy, with bespoke loss calculations for each category. The identification loss is structured to address the varied nature of these sets specifically. For the retrieval sub-task, we utilize an enhanced triplet loss, adept at handling noisy correspondences. Furthermore, to empirically validate our method, we have re-annotated a real-world dataset, SYSU-Real. Our experiments on SYSU-MM01 and RegDB, conducted under various noise ratios of random and instance-dependent label noise, demonstrate the generalized robustness and effectiveness of our proposed approach.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4857-4869"},"PeriodicalIF":8.3,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143913559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial Mask-Based Adaptive Robust Training for Video Object Segmentation With Noisy Labels","authors":"Enki Cho;Jung Uk Kim;Seong Tae Kim","doi":"10.1109/TCSVT.2025.3525629","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3525629","url":null,"abstract":"Recent advances in video object segmentation (VOS) highlight its potential across various applications. Semi-supervised VOS aims to segment target objects in video frames based on annotations from the initial frame. Collecting a large-scale video segmentation dataset is challenging, which could induce noisy labels. However, it has been overlooked and most of the research efforts have been devoted to training VOS models by assuming the training dataset is clean. In this study, we first explore the effect of VOS models under noisy labels in the training dataset. To investigate the effect of noisy labels, we simulate the noisy annotations on DAVIS 2017 and YouTubeVOS datasets. Experiments show that the traditional training strategy is vulnerable to noisy annotations. To address this issue, we propose a novel noise-robust training method, named SMART (Spatial Mask-based Adaptive Robust Training), which is designed to train models effectively in the presence of noisy annotations. The proposed method employs two key strategies. Firstly, the model focuses on the common spatial areas from clean knowledge-based predictions and annotations. Secondly, the model is trained with adaptive balancing losses based on their reliability. Comparative experiments have demonstrated the effectiveness of our approach by outperforming other noise handling methods over various noise degrees.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4976-4990"},"PeriodicalIF":8.3,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143913439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Underwater Image Restoration Through a Prior Guided Hybrid Sense Approach and Extensive Benchmark Analysis","authors":"Xiaojiao Guo;Xuhang Chen;Shuqiang Wang;Chi-Man Pun","doi":"10.1109/TCSVT.2025.3525593","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3525593","url":null,"abstract":"Underwater imaging grapples with challenges from light-water interactions, leading to color distortions and reduced clarity. In response to these challenges, we propose a novel Color Balance Prior Guided Hybrid Sense Underwater Image Restoration framework (GuidedHybSensUIR). This framework operates on multiple scales, employing the proposed Detail Restorer module to restore low-level detailed features at finer scales and utilizing the proposed Feature Contextualizer module to capture long-range contextual relations of high-level general features at a broader scale. The hybridization of these different scales of sensing results effectively addresses color casts and restores blurry details. In order to effectively point out the evolutionary direction for the model, we propose a novel Color Balance Prior as a strong guide in the feature contextualization step and as a weak guide in the final decoding phase. We construct a comprehensive benchmark using paired training data from three real-world underwater datasets and evaluate on six test sets, including three paired and three unpaired, sourced from four real-world underwater datasets. Subsequently, we tested 14 traditional and retrained 23 deep learning existing underwater image restoration methods on this benchmark, obtaining metric results for each approach. This effort aims to furnish a valuable benchmarking dataset for standard basis for comparison. The extensive experiment results demonstrate that our method outperforms 37 other state-of-the-art methods overall on various benchmark datasets and metrics, despite not achieving the best results in certain individual cases. The code and dataset are available at <uri>https://github.com/CXH-Research/GuidedHybSensUIR</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4784-4800"},"PeriodicalIF":8.3,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143913285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prompt-Based Concept Learning for Few-Shot Class-Incremental Learning","authors":"Shuo Li;Fang Liu;Licheng Jiao;Lingling Li;Puhua Chen;Xu Liu;Wenping Ma","doi":"10.1109/TCSVT.2025.3525545","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3525545","url":null,"abstract":"Few-Shot Class-Incremental Learning (FSCIL) faces a huge stability-plasticity challenge due to continuously learning knowledge from new classes with a small number of training samples without forgetting the knowledge of previously seen old classes. To alleviate this challenge, we propose a novel method called Prompt-based Concept Learning (PCL) for FSCIL, which generalizes conceptual knowledge learned from old classes to new classes by simulating human learning capabilities. In our PCL, in the base session, we simultaneously learn common basic concepts from the training data and the class-concept weight of each class in a prompt learning manner, and in each incremental session, class-concept weights between new classes and previously learned basic concepts are learned to achieve incremental learning. Furthermore, in order to avoid catastrophic forgetting, we propose a distribution estimation module to retain feature distributions of previously seen classes and a data replay module to randomly sample features of previously seen classes in incremental sessions. We verify the effectiveness of our PCL on widely used benchmarks, such as miniImageNet, CIFAR-100, and CUB-200. Experimental results show that our PCL achieves competitive results compared with other state-of-the-art methods, especially we achieve an average accuracy of 94.02% across all sessions on the miniImageNet benchmark.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4991-5005"},"PeriodicalIF":8.3,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143913505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}