{"title":"An Attention-Locating Algorithm for Eliminating Background Effects in Fine-Grained Visual Classification","authors":"Yueting Huang;Zhenzhe Hechen;Mingliang Zhou;Zhengguo Li;Sam Kwong","doi":"10.1109/TCSVT.2025.3535818","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3535818","url":null,"abstract":"Fine-grained visual classification (FGVC) is a challenging task characterized by interclass similarity and intraclass diversity and has broad application prospects. Recently, several methods have adopted the vision Transformer (ViT) in FGVC tasks since the data specificity of the multihead self-attention (MSA) mechanism in ViT is beneficial for extracting discriminative feature representations. However, these works focus on integrating feature dependencies at a high level, which leads to the model being easily disturbed by low-level background information. To address this issue, we propose a fine-grained attention-locating vision Transformer (FAL-ViT) and an attention selection module (ASM). First, FAL-ViT contains a two-stage framework to identify crucial regions effectively within images and enhance features by strategically reusing parameters. Second, the ASM accurately locates important target regions via the natural scores of the MSA, extracting finer low-level features to offer more comprehensive information through position mapping. Extensive experiments on public datasets demonstrate that FAL-ViT outperforms the other methods in terms of performance, confirming the effectiveness of our proposed methods. The source code is available at <uri>https://github.com/Yueting-Huang/FAL-ViT</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 6","pages":"5993-6006"},"PeriodicalIF":8.3,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144243917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relighting Scenes With Object Insertions in Neural Radiance Fields","authors":"Xuening Zhu;Renjiao Yi;Xin Wen;Chenyang Zhu;Kai Xu","doi":"10.1109/TCSVT.2025.3535599","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3535599","url":null,"abstract":"Inserting objects into scenes and performing realistic relighting are common applications in augmented reality (AR). Previous methods focused on inserting virtual objects using CAD models or real objects from single-view images, resulting in highly limited AR application scenarios. We introduce a novel pipeline based on Neural Radiance Fields (NeRFs) for seamlessly integrating objects into scenes, from two sets of images depicting the object and scene. This approach enables novel view synthesis, realistic relighting, and supports physical interactions such as shadow casting between objects. The lighting environment is in a hybrid representation of Spherical Harmonics and Spherical Gaussians, representing both high- and low-frequency lighting components very well, and supporting non-Lambertian surfaces. Specifically, we leverage the benefits of volume rendering and introduce an innovative approach for efficient shadow rendering by comparing the depth maps between the camera view and the light source view and generating vivid soft shadows. The proposed method achieves realistic relighting effects in extensive experimental evaluations.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 7","pages":"6787-6802"},"PeriodicalIF":8.3,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144558014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition","authors":"Kunshan Yang;Lin Zuo;Mengmeng Jing;Xianlong Tian;Kunbin He;Yongqi Ding","doi":"10.1109/TCSVT.2025.3534204","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3534204","url":null,"abstract":"Existing computer vision methods mainly focus on the recognition of rigid objects, whereas the recognition of flexible objects remains unexplored. Recognizing flexible objects poses significant challenges due to their inherently diverse shapes and sizes, translucent attributes, ambiguous boundaries, and subtle inter-class differences. In this paper, we claim that these problems primarily arise from the lack of object saliency. To this end, we propose the Flexible Vision Graph Neural Network (FViG) to optimize the self-saliency and thereby improve the discrimination of the representations for flexible objects. Specifically, on one hand, we propose to maximize the channel-aware saliency by extracting the weight of neighboring graph nodes, which is employed to identify flexible objects with minimal inter-class differences. On the other hand, we maximize the spatial-aware saliency based on clustering to aggregate neighborhood information for the centroid graph nodes. This introduces local context information and enables extracting of consistent representation, effectively adapting to the shape and size variations in flexible objects. To verify the performance of flexible objects recognition thoroughly, for the first time we propose the Flexible Dataset (FDA), which consists of various images of flexible objects collected from real-world scenarios or online. Extensive experiments evaluated on our FDA, FireNet, CIFAR-100 and ImageNet-Hard datasets demonstrate the effectiveness of our method on enhancing the discrimination of flexible objects.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 7","pages":"6424-6436"},"PeriodicalIF":8.3,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144557907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SGFormer: Spherical Geometry Transformer for 360° Depth Estimation","authors":"Junsong Zhang;Zisong Chen;Chunyu Lin;Zhijie Shen;Lang Nie;Kang Liao;Yao Zhao","doi":"10.1109/TCSVT.2025.3534220","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3534220","url":null,"abstract":"Panoramic distortion poses a significant challenge in 360° depth estimation, particularly pronounced at the north and south poles. Existing methods either adopt a bi-projection fusion strategy to remove distortions or model long-range dependencies to capture global structures, resulting in either unclear structure or insufficient local perception. In this paper, we propose a spherical geometry transformer, named SGFormer, to address the above issues, with an innovative step to integrate spherical geometric priors into vision transformers. To this end, we retarget the transformer decoder to a spherical prior decoder (termed SPDecoder), which endeavors to uphold the integrity of spherical structures during decoding. Concretely, we leverage bipolar reprojection, circular rotation, and curve local embedding to preserve the spherical characteristics of equidistortion, continuity, and surface distance, respectively. Furthermore, we present a query-based global conditional position embedding to compensate for spatial structure at varying resolutions. It not only boosts the global perception of spatial position but also sharpens the depth structure across different patches. Finally, we conduct extensive experiments on popular benchmarks, demonstrating our superiority over state-of-the-art solutions. Our code will be made publicly at <uri>https://github.com/iuiuJaon/SGFormer</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 6","pages":"5738-5748"},"PeriodicalIF":8.3,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144272936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TRNet: Two-Tier Recursion Network for Co-Salient Object Detection","authors":"Runmin Cong;Ning Yang;Hongyu Liu;Dingwen Zhang;Qingming Huang;Sam Kwong;Wei Zhang","doi":"10.1109/TCSVT.2025.3534908","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3534908","url":null,"abstract":"Co-salient object detection (CoSOD) is to find the salient and recurring objects from a series of relevant images, where modeling inter-image relationships plays a crucial role. Different from the commonly used direct learning structure that inputs all the intra-image features into some well-designed modules to represent the inter-image relationship, we resort to adopting a recursive structure for inter-image modeling, and propose a two-tier recursion network (TRNet) to achieve CoSOD in this paper. The two-tier recursive structure of the proposed TRNet is embodied in two stages of inter-image extraction and distribution. On the one hand, considering the task adaptability and inter-image correlation, we design an inter-image exploration with recursive reinforcement module to learn the local and global inter-image correspondences, guaranteeing the validity and discriminativeness of the information in the step-by-step propagation. On the other hand, we design a dynamic recursion distribution module to fully exploit the role of inter-image correspondences in a recursive structure, adaptively assigning common attributes to each individual image through an improved semi-dynamic convolution. Experimental results on five prevailing CoSOD benchmarks demonstrate that our TRNet outperforms other competitors in terms of various evaluation metrics. The code and results of our method are available at <uri>https://github.com/rmcong/TRNet_TCSVT2025</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 6","pages":"5844-5857"},"PeriodicalIF":8.3,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144243686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial Depression Estimation via Multi-Cue Contrastive Learning","authors":"Xinke Wang;Jingyuan Xu;Xiao Sun;Mingzheng Li;Bin Hu;Wei Qian;Dan Guo;Meng Wang","doi":"10.1109/TCSVT.2025.3533543","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3533543","url":null,"abstract":"Vision-based depression estimation is an emerging yet impactful task, whose challenge lies in predicting the severity of depression from facial videos lasting at least several minutes. Existing methods primarily focus on fusing frame-level features to create comprehensive representations. However, they often overlook two crucial aspects: 1) inter- and intra-cue correlations, and 2) variations among samples. Hence, simply characterizing sample embeddings while ignoring to mine the relation among multiple cues leads to limitations. To address this problem, we propose a novel Multi-Cue Contrastive Learning (MCCL) framework to mine the relation among multiple cues for discriminative representation. Specifically, we first introduce a novel cross-characteristic attentive interaction module to model the relationship among multiple cues from four facial features (e.g., 3D landmarks, head poses, gazes, FAUs). Then, we propose a temporal segment attentive interaction module to capture the temporal relationships within each facial feature over time intervals. Moreover, we integrate contrastive learning to leverage the variations among samples by regarding the embeddings of inter-cue and intra-cue as positive pairs while considering embeddings from other samples as negative. In this way, the proposed MCCL framework leverages the relationships among the facial features and the variations among samples to enhance the process of multi-cue mining, thereby achieving more accurate facial depression estimation. Extensive experiments on public datasets, DAIC-WOZ, CMDC, and E-DAIC, demonstrate that our model not only outperforms the advanced depression methods but that the discriminative representations of facial behaviors provide potential insights about depression. Our code is available at: <uri>https://github.com/xkwangcn/MCCL.git</uri>","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 6","pages":"6007-6020"},"PeriodicalIF":8.3,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144243858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Underwater Image Quality Assessment Using Feature Disentanglement and Dynamic Content-Distortion Guidance","authors":"Junjie Zhu;Liquan Shen;Zhengyong Wang;Yihan Yu","doi":"10.1109/TCSVT.2025.3533598","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3533598","url":null,"abstract":"Due to the complex underwater imaging process, underwater images contain a variety of unique distortions. While existing underwater image quality assessment (UIQA) methods have made progress by highlighting these distortions, they overlook the fact that image content also affects how distortions are perceived, as different content exhibits varying sensitivities to different types of distortions. Both the characteristics of the content itself and the properties of the distortions determine the quality of underwater images. Additionally, the intertwined nature of content and distortion features in underwater images complicates the accurate extraction of both. In this paper, we address these issues by comprehensively accounting for both content and distortion information and explicitly disentangling underwater image features into content and distortion components. To achieve this, we introduce a dynamic content-distortion guiding and feature disentanglement network (DysenNet), composed of three main components: the feature disentanglement sub-network (FDN), the dynamic content guidance module (DCM), and the dynamic distortion guidance module (DDM). Specifically, the FDN disentangles underwater features into content and distortion elements, allowing us to more clearly measure their respective contributions to image quality. The DCM generates dynamic multi-scale convolutional kernels tailored to the unique content of each image, enabling content-adaptive feature extraction for quality perception. The DDM, on the other hand, addresses both global and local underwater distortions by identifying distortion cues from both channel and spatial perspectives, focusing on regions and channels with severe degradation. Extensive experiments on UIQA datasets demonstrate the state-of-the-art performance of the proposed method.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 6","pages":"5602-5616"},"PeriodicalIF":8.3,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144272998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Edge Guided Network With Motion Enhancement for Few-Shot Action Recognition","authors":"Kaiwen Du;Weirong Ye;Hanyu Guo;Yan Yan;Hanzi Wang","doi":"10.1109/TCSVT.2025.3533573","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3533573","url":null,"abstract":"Existing state-of-the-art methods for few-shot action recognition (FSAR) achieve promising performance by spatial and temporal modeling. However, most current methods ignore the importance of edge information and motion cues, leading to inferior performance. For the few-shot task, it is important to effectively explore limited data. Additionally, effectively utilizing edge information is beneficial for exploring motion cues, and vice versa. In this paper, we propose a novel edge guided network with motion enhancement (EGME) for FSAR. To the best of our knowledge, this is the first work to utilize the edge information as guidance in the FSAR task. Our EGME contains two crucial components, including an edge information extractor (EIE) and a motion enhancement module (ME). Specifically, EIE is used to obtain edge information on video frames. Afterward, the edge information is used as guidance to fuse with the frame features. In addition, ME can adaptively capture motion-sensitive features of videos. It adopts a self-gating mechanism to highlight motion-sensitive regions in videos from a large temporal receptive field. Based on the above designed components, EGME can capture edge information and motion cues, resulting in superior recognition performance. Experimental results on four challenging benchmarks show that EGME performs favorably against recent advanced methods.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 6","pages":"5331-5342"},"PeriodicalIF":8.3,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144243804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Concept Perception Network With Interactive Prompting for Cross-View Image Geo-Localization","authors":"Yuan Gao;Haibo Liu;Xiaohui Wei","doi":"10.1109/TCSVT.2025.3533574","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3533574","url":null,"abstract":"Cross-view image geo-localization aims to estimate the geographic position of a query image from the ground platform (such as mobile phone, vehicle camera) by matching it with geo-tagged reference images from the aerial platform (such as drone, satellite). Although existing studies have achieved promising results, they usually rely only on depth features and fail to effectively handle the serious changes in geometric shape and appearance caused by view differences. In this paper, a novel Semantic Concept Perception Network (SCPNet) with interactive prompting is proposed, whose core is to extract and integrate semantic concept information reflecting spatial position relationship between objects. Specifically, for a given of pair input images, a CNN stem with positional embedding is first adopted to extract depth features. Meanwhile, a semantic concept mining module is designed to distinguish different objects and capture the associations between them, thereby achieving the purpose of extracting semantic concept information. Furthermore, to obtain global descriptions of different views, a feature bidirectional injection fusion module based on attention mechanism is proposed to exploit the long-range dependencies of semantic concept and depth features. Finally, a triplet loss with a flexible hard sample mining strategy is used to guide the optimization of the network. Experimental results have shown that our proposed method can achieve better performance compared with state-of-the-art methods on mainstream cross-view datasets.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 6","pages":"5343-5354"},"PeriodicalIF":8.3,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144243877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional Dual Diffusion for Multimodal Clustering of Optical and SAR Images","authors":"Shujun Liu;Ling Chang","doi":"10.1109/TCSVT.2025.3533301","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3533301","url":null,"abstract":"Acknowledging different wavelengths by imaging mechanisms, optical images usually embed higher low-dimensional manifolds into ambient spaces than SAR images do. How to utilize their complementarity remains challenging for multimodal clustering. In this study, we devise a conditional dual diffusion (CDD) model for multimodal clustering of optical and SAR images, and theoretically prove that it is equivalent to a probability flow ordinary differential equation (ODE) having a unique solution. Different from vanilla diffusion models, the CDD model is equipped with a decoupling autoencoder to predict noises and clear images simultaneously, preserving data manifolds embedded in latent space. To the fuse manifolds of optical and SAR images, we train the model to generate optical images conditioned by SAR images, mapping them into a unified latent space. The learned features extracted from the model are fed to K-means algorithm to produce resulting clusters. To the best of our knowledge, this study could be one of the first diffusion models for multimodal clustering. Extensive comparison experiments on three large-scale optical-SAR pair datasets show the superiority of our method over state-of-the-art (SOTA) methods overall in terms of clustering performance and time consumption. The source code is available at <uri>https://github.com/suldier/CDD</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 6","pages":"5318-5330"},"PeriodicalIF":8.3,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144243918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}