{"title":"StreetSurfGS: Scalable Urban Street Surface Reconstruction With Planar-Based Gaussian Splatting","authors":"Xiao Cui;Weicai Ye;Yifan Wang;Guofeng Zhang;Wengang Zhou;Tong He;Houqiang Li","doi":"10.1109/TCSVT.2025.3551719","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551719","url":null,"abstract":"Reconstructing urban street scenes is crucial due to its vital role in applications such as autonomous driving and urban planning. These scenes are characterized by long, narrow camera trajectories, occlusion, complex object relationships, and sparse data across multiple scales. Despite recent advancements, existing surface reconstruction methods, which are primarily designed for object-centric scenarios, struggle to adapt effectively to the unique characteristics of street scenes. To address this challenge, we introduce StreetSurfGS, the first method to employ Gaussian Splatting specifically tailored for scalable urban street scene surface reconstruction. StreetSurfGS utilizes a planar-based octree representation and segmented training to reduce memory costs, accommodate unique camera characteristics, and improve scalability. Additionally, to mitigate depth inaccuracies caused by object overlap, we propose a guided smoothing strategy within regularization to eliminate inaccurate boundary points and outliers. Furthermore, to address sparse views and multi-scale challenges, we use a dual-step matching strategy that leverages adjacent and long-term information. Extensive experiments validate the efficacy of StreetSurfGS in both novel view synthesis and surface reconstruction.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8780-8793"},"PeriodicalIF":11.1,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Task Guided No-Reference Omnidirectional Image Quality Assessment With Feature Interaction","authors":"Yun Liu;Sifan Li;Huiyu Duan;Yu Zhou;Daoxin Fan;Guangtao Zhai","doi":"10.1109/TCSVT.2025.3551723","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551723","url":null,"abstract":"Omnidirectional image quality assessment (OIQA) has become an increasingly vital problem in recent years. Most previous no-reference OIQA methods only extract local features from the distorted viewports, or extract global features from the entire distorted image, lacking the interaction and fusion between local and global features. Moreover, the lack of reference information also limits their performance. Thus, we propose a no-reference OIQA model which consists of three novel modules, including a bidirectional pseudo-reference module, a Mamba-based global feature extraction module, and a multi-scale local-global feature aggregation module. Specifically, by considering the image distortion degradation process, a bidirectional pseudo-reference module capturing the error maps on viewports is first constructed to refine the multi-scale local visual features, which can supply rich quality degradation reference information without the reference image. To well complement the local features, the VMamba module is adopted to extract the representative multi-scale global visual features. Inspired by human hierarchical visual perception characteristics, a novel multi-scale aggregation module is built to strengthen the feature interaction and effective fusion which can extract deep semantic information. Finally, motivated by the multi-task managing mechanism of human brain, a multi-task learning module is introduced to assist the main quality assessment task by digging the hidden information in compression type and distortion degree. Extensive experimental results demonstrate that our proposed method achieves the state-of-the-art performance on the no-reference OIQA task compared to other models.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8794-8806"},"PeriodicalIF":11.1,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LOCAT: Localization-Driven Text Watermarking via Large Language Models","authors":"Liang Ding;Xi Yang;Yang Yang;Weiming Zhang","doi":"10.1109/TCSVT.2025.3570858","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3570858","url":null,"abstract":"The rapid advancement of large language models (LLMs) has raised concerns regarding potential misuse and underscores the importance of verifying text authenticity. Text watermarking, which embeds covert identifiers into generated content, offers a viable means for such verification. Such watermarking can be implemented either by modifying the generation process of an LLM or via post-processing techniques like lexical substitution, with the latter being particularly valuable when access to model parameters is restricted. However, existing lexical substitution-based methods often face a trade-off between maintaining text quality and ensuring robust watermarking. Addressing this limitation, our work focuses on enhancing both the robustness and imperceptibility of text watermarks within the lexical substitution paradigm. We propose a localization-based watermarking method that enhances robustness while maintaining text naturalness. First, a precise localization module identifies optimal substitution targets. Then, we leverage LLMs to generate contextually appropriate synonyms, and the watermark is embedded through binary-encoded substitutions. To address different usage scenarios, we focus on the trade-off between watermark robustness and text quality. Compared to existing methods, our approach significantly enhances watermark robustness while maintaining comparable text quality and achieves similar robustness levels while improving text quality. Even under severe semantic distortions, including word deletion, synonym substitution, polishing, and re-translation, the watermark remains detectable.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 8","pages":"8406-8420"},"PeriodicalIF":11.1,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144781943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Errata to “Local-Global Temporal Difference Learning for Satellite Video Super-Resolution”","authors":"Yi Xiao;Qiangqiang Yuan","doi":"10.1109/TCSVT.2025.3570842","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3570842","url":null,"abstract":"In the above article [1], there is a citation error related to the core technical foundation of the proposed method. Reference [2] was incorrectly cited. The correct citation is [3].","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 10","pages":"10612-10612"},"PeriodicalIF":11.1,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11006141","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CamStegNet: A Robust Image Steganography Method Based on Camouflage Model","authors":"Le Mao;Yun Tan;Jiaohua Qin;Xuyu Xiang","doi":"10.1109/TCSVT.2025.3570725","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3570725","url":null,"abstract":"Deep learning models are increasingly being employed in steganographic schemes for the embedding and extraction of secret information. However, steganographic models themselves are also at risk of detection and attacks. Although there are approaches proposed to hide deep learning models, making these models difficult to detect while achieving high-quality image steganography performance remains a challenging task. In this work, a robust image steganography method based on a camouflage model CamStegNet is proposed. The steganographic model is camouflaged as a routine deep learning model to significantly enhance its concealment. A sparse weight-filling paradigm is designed to enable the model to be flexibly switched among three modes by utilizing different keys: routine machine learning task, secret embedding task and secret recovery task. Furthermore, a residual state-space module and a neighborhood attention mechanism are constructed to improve the performance of image steganography. Experiments conducted on the DIV2K, ImageNet and COCO datasets demonstrate that the stego images generated by CamStegNet are superior to existing methods in terms of visual quality. They also exhibit enhanced resistance to steganalysis and maintain over 95% robustness against noise and scale attacks. Additionally, the model demonstrates high robustness which can achieve excellent performance in machine learning tasks and maintain stability across various weight initialization methods.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 10","pages":"10599-10611"},"PeriodicalIF":11.1,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open-Set Mixed Domain Adaptation via Visual-Linguistic Focal Evolving","authors":"Bangzhen Liu;Yangyang Xu;Cheng Xu;Xuemiao Xu;Shengfeng He","doi":"10.1109/TCSVT.2025.3551234","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551234","url":null,"abstract":"We introduce a new task, Open-set Mixed Domain Adaptation (OSMDA), which considers the potential mixture of multiple distributions in the target domains, thereby better simulating real-world scenarios. To tackle the semantic ambiguity arising from multiple domains, our key idea is that the linguistic representation can serve as a universal descriptor for samples of the same category across various domains. We thus propose a more practical framework for cross-domain recognition via visual-linguistic guidance. On the other hand, the presence of multiple domains also poses a new challenge in classifying both known and unknown categories. To combat this issue, we further introduce a visual-linguistic focal evolving approach to gradually enhance the classification ability of a known/unknown binary classifier from two aspects. Specifically, we start with identifying highly confident focal samples to expand the pool of known samples by incorporating those from different domains. Then, we amplify the feature discrepancy between known and unknown samples through dynamic entropy evolving via an adaptive entropies min/max game, enabling us to accurately identify possible unknown samples in a gradual manner. Extensive experiments demonstrate our method’s superiority against the state-of-the-arts in both open-set and open-set mixed domain adaptation.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8495-8507"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rate-Distortion-Optimized Deep Preprocessing for JPEG Compression","authors":"Fan Ye;Bojun Liu;Li Li;Dong Liu","doi":"10.1109/TCSVT.2025.3550872","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3550872","url":null,"abstract":"JPEG is daily used for compressing natural images, while the compressed images often contain visually annoying artifacts especially at low rates. To reduce the compression artifacts, it has been proposed to preprocess an image before the JPEG compression with the help of deep learning, which maintains the standard compliance. However, the existing methods were not fully justified from the rate-distortion optimization perspective. We address this limitation and propose a truly rate-distortion-optimized deep preprocessing method for JPEG compression. We decompose a rate-distortion cost into three parts: rate, distortion, and Lagrangian multiplier. First, we design a rate estimation network and propose to train the network to estimate the JPEG compression rate. Second, we propose to estimate the actual end-to-end distortion (between original and reconstructed images) with a differentiable JPEG simulator, where we specifically design an adaptive discrete cosine transform (DCT) domain masking algorithm. Third, we propose to estimate the actual content-dependent Lagrangian multipliers to combine rate and distortion into a joint loss function that drives the training of the preprocessing network. Our method makes no change to the JPEG encoder and decoder and supports any differentiable distortion measure (e.g. MSE, MS-SSIM, LPIPS). On the Kodak dataset, our method achieves on average 7.59% BD-rate reduction compared to the JPEG baseline when using MSE. With per-image optimization for LPIPS, our method achieves as high as 38.65% BD-rate reduction, and produces high-quality reconstructed images with much less artifacts.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 8","pages":"8330-8343"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144781986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Class-Aware Prompting for Federated Few-Shot Class-Incremental Learning","authors":"Fang-Yi Liang;Yu-Wei Zhan;Jiale Liu;Chong-Yu Zhang;Zhen-Duo Chen;Xin Luo;Xin-Shun Xu","doi":"10.1109/TCSVT.2025.3551612","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551612","url":null,"abstract":"Few-Shot Class-Incremental Learning (FSCIL) aims to continuously learn new classes from limited samples while preventing catastrophic forgetting. With the increasing distribution of learning data across different clients and privacy concerns, FSCIL faces a more realistic scenario where few learning samples are distributed across different clients, thereby necessitating a Federated Few-Shot Class-Incremental Learning (FedFSCIL) scenario. However, this integration faces challenges from non-IID problem, which affects model generalization and training efficiency. The communication overhead in federated settings also presents a significant challenge. To address these issues, we propose Class-Aware Prompting for Federated Few-Shot Class-Incremental Learning (FedCAP). Our framework leverages pre-trained models enhanced by a class-wise prompt pool, where shared class-wise keys enable clients to utilize global class information during training. This unifies the understanding of base class features across clients and enhances model consistency. We further incorporate a class-level information fusion module to improve class representation and model generalization. Our approach requires very few parameter transmission during model aggregation, ensuring communication efficiency. To our knowledge, this is the first study to explore the scenario of FedFSCIL. Consequently, we designed comprehensive experimental setups and made the code publicly available.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8520-8532"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event-Based Motion Deblurring With Blur-Aware Reconstruction Filter","authors":"Nuo Chen;Chushu Zhang;Wei An;Longguang Wang;Miao Li;Qiang Ling","doi":"10.1109/TCSVT.2025.3551516","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551516","url":null,"abstract":"Event-based motion deblurring aims at reconstructing a sharp image from a single blurry image and its corresponding events triggered during the exposure time. Existing methods learn the spatial distribution of blur from blurred images, then treat events as temporal residuals and learn blurred temporal features from them, and finally restore clear images through spatio-temporal interaction of the two features. However, due to the high coupling of detailed features such as the texture and contour of the scene with blur features, it is difficult to directly learn effective blur spatial distribution from the original blurred image. In this paper, we provide a novel perspective, i.e., employing the blur indication provided by events, to instruct the network in spatially differentiated image reconstruction. Due to the consistency between event spatial distribution and image blur, event spatial indication can learn blur spatial features more simply and directly, and serve as a complement to temporal residual guidance to improve deblurring performance. Based on the above insight, we propose an event-based motion deblurring network consisting of a Multi-Scale Event-based Double Integral (MS-EDI) module designed from temporal residual guidance, and a Blur-Aware Filter Prediction (BAFP) module to conduct filter processing directed by spatial blur indication. The network, after incorporating spatial residual guidance, has significantly enhanced its generalization ability, surpassing the best-performing image-based and event-based methods on both synthetic, semi-synthetic, and real-world datasets. In addition, our method can be extended to blurry image super-resolution and achieves impressive performance. Our code is available at: <uri>https://github.com/ChenYichen9527/MBNet</uri> now.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8508-8519"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generic Objects as Pose Probes for Few-Shot View Synthesis","authors":"Zhirui Gao;Renjiao Yi;Chenyang Zhu;Ke Zhuang;Wei Chen;Kai Xu","doi":"10.1109/TCSVT.2025.3551303","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551303","url":null,"abstract":"Radiance fields, including NeRFs and 3D Gaussians, demonstrate great potential in high-fidelity rendering and scene reconstruction, while they require a substantial number of posed images as input. COLMAP is frequently employed for preprocessing to estimate poses. However, COLMAP necessitates a large number of feature matches to operate effectively, and struggles with scenes characterized by sparse features, large baselines, or few-view images. We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images, freeing from COLMAP initializations. Inspired by the idea of calibration boards in traditional pose calibration, we propose a novel approach of utilizing everyday objects, commonly found in both images and real life, as “pose probes”. By initializing the probe object as a cube shape, we apply a dual-branch volume rendering optimization (object NeRF and scene NeRF) to constrain the pose optimization and jointly refine the geometry. PnP matching is used to initialize poses between images incrementally, where only a few feature matches are enough. PoseProbe achieves state-of-the-art performance in pose estimation and novel view synthesis across multiple datasets in experiments. We demonstrate its effectiveness, particularly in few-view and large-baseline scenes where COLMAP struggles. In ablations, using different objects in a scene yields comparable performance, showing that PoseProbe is robust to the choice of probe objects. Our project page is available at: <uri>https://zhirui-gao.github.io/PoseProbe.github.io/</uri>","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9046-9059"},"PeriodicalIF":11.1,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}