IEEE Transactions on Circuits and Systems for Video Technology: Latest Articles

Overview of Variable Rate Coding in JPEG AI
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-19 DOI: 10.1109/TCSVT.2025.3552971
Panqi Jia;Fabian Brand;Dequan Yu;Alexander Karabutov;Elena Alshina;André Kaup
{"title":"Overview of Variable Rate Coding in JPEG AI","authors":"Panqi Jia;Fabian Brand;Dequan Yu;Alexander Karabutov;Elena Alshina;André Kaup","doi":"10.1109/TCSVT.2025.3552971","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3552971","url":null,"abstract":"Empirical evidence has demonstrated that learning-based image compression can outperform classical compression frameworks. This has led to the ongoing standardization of learned-based image codecs, namely Joint Photographic Experts Group (JPEG) AI. The objective of JPEG AI is to enhance compression efficiency and provide a software and hardware-friendly solution. Based on our research, JPEG AI represents the first standardization that can facilitate the implementation of a learned image codec on a mobile device. This article presents an overview of the variable rate coding functionality in JPEG AI, which includes three variable rate adaptations: a three-dimensional quality map, a fast bit rate matching algorithm, and a training strategy. The variable rate adaptations offer a continuous rate function up to 2.0 bpp, exhibiting a high level of performance, a flexible bit allocation between different color components, and a region of interest function for the specified use case. The evaluation of performance encompasses both objective and subjective results. With regard to the objective bit rate matching, the main profile with low complexity yielded a 13.1% BD-rate gain over VVC intra, while the high profile with high complexity achieved a 19.2% BD-rate gain over VVC intra. The BD-rate result is calculated as the mean of the seven perceptual metrics defined in the JPEG AI common test conditions. With respect to subjective results, the example of improving the quality of the region of interest is illustrated.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9460-9474"},"PeriodicalIF":11.1,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
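The fast bit rate matching mentioned above maps a requested bit rate to an encoder quality setting. As a rough sketch only, here is one generic way such matching can be done, assuming a bisection search over a scalar quality level; the `encode_fn` callable, search bounds, and tolerance are hypothetical placeholders rather than the JPEG AI procedure:

```python
def match_bitrate(encode_fn, image, target_bpp, lo=0.0, hi=1.0, tol=0.01, max_iter=8):
    """Bisection over a scalar quality level q until the achieved rate is within
    `tol` bpp of the target. `encode_fn(image, q)` is a hypothetical callable
    returning the achieved bits per pixel when encoding at quality q."""
    for _ in range(max_iter):
        q = 0.5 * (lo + hi)
        bpp = encode_fn(image, q)
        if abs(bpp - target_bpp) <= tol:
            break
        if bpp > target_bpp:
            hi = q  # spent too many bits: lower the quality level
        else:
            lo = q  # spent too few bits: raise the quality level
    return q, bpp
```

Since the paper reports a continuous rate function up to 2.0 bpp, a monotone scalar search of this kind would converge in a handful of encoder evaluations.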
A Multi-Modal Architecture With Spatio-Temporal-Text Adaptation for Video-Based Traffic Accident Anticipation
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-19 DOI: 10.1109/TCSVT.2025.3552895
Patrik Patera;Yie-Tarng Chen;Wen-Hsien Fang
{"title":"A Multi-Modal Architecture With Spatio-Temporal-Text Adaptation for Video-Based Traffic Accident Anticipation","authors":"Patrik Patera;Yie-Tarng Chen;Wen-Hsien Fang","doi":"10.1109/TCSVT.2025.3552895","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3552895","url":null,"abstract":"Early and precise accident anticipation is critical for preventing road traffic incidents in advanced traffic systems. This paper presents a Multi-modal Architecture with Spatio-Temporal-Text Adaptation (MASTTA), featuring a Visual Encoder and a Text Encoder within a streamlined end-to-end framework for traffic accident anticipation. Both encoders leverage the CLIP model, pre-trained on large-scale text-image pairs, to utilize visual and textual information effectively. MASTTA captures complex traffic patterns and relationships by fine-tuning only the adapters, reducing retraining demands. In the Visual Encoder, spatio-temporal adaptation is achieved through a novel Temporal Adapter, a novel Spatial Adapter, and an MLP Adapter. The Temporal Adapter enhances temporal consistency in accident-prone areas, while the Spatial Adapter captures spatio-temporal interactions among visual cues. The Text Encoder, equipped with a Text Adapter and an MLP Adapter, aligns latent textual and visual features in a joint embedding space, refining semantic representation. This synergy of text and visual adapters enables MASTTA to model complex spatial interactions across long-range temporal context, improving accident anticipation. We validate MASTTA on DAD and CCD datasets, demonstrating significant improvements in both the earliness and correctness compared to state-of-the-art methods.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8989-9002"},"PeriodicalIF":11.1,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
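MASTTA fine-tunes only lightweight adapters on top of frozen CLIP encoders. A minimal sketch of a generic residual bottleneck adapter in PyTorch; the reduction factor, GELU activation, and layer shapes are illustrative assumptions and not the paper's Temporal, Spatial, or MLP Adapter designs:

```python
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic residual bottleneck adapter: only these few parameters are
    trained while the pre-trained backbone stays frozen."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.down = nn.Linear(dim, dim // reduction)  # project down
        self.act = nn.GELU()
        self.up = nn.Linear(dim // reduction, dim)    # project back up

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))    # residual adaptation
```

Freezing the backbone and training only such adapters is what keeps the retraining demands low.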
Prediction Enhancement for Point Cloud Attribute Compression Using Smoothing Filter
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-19 DOI: 10.1109/TCSVT.2025.3571114
Qian Yin;Ruoke Yan;Xinfeng Zhang;Siwei Ma
{"title":"Prediction Enhancement for Point Cloud Attribute Compression Using Smoothing Filter","authors":"Qian Yin;Ruoke Yan;Xinfeng Zhang;Siwei Ma","doi":"10.1109/TCSVT.2025.3571114","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3571114","url":null,"abstract":"In recent years, 3D point cloud compression (PCC) has emerged as a prominent research area, attracting widespread attention from both academia and industry. As one of the PCC standards released by the moving picture expert group (MPEG), the geometry-based PCC (G-PCC) adopts two attribute lossy coding schemes, namely the prediction-based Lifting Transform and the region adaptive hierarchical transform (RAHT). Based on statistical analysis, it can be observed that the increase in predictive distance gradually weakens the attribute correlation between points, resulting in larger prediction errors. To address this issue, we propose a prediction enhancement method by using the smoothing filter to improve the attribute coding efficiency, which is both integrated into the Lifting Transform and RAHT. For the former, the neighbor point smoothing method based on the prediction order is proposed via a weighted average strategy. The proposed smoothing is only applied to points in the lower level of details (LoDs) by adjusting the distance-based predicted attribute values. For the latter, we design a neighbor node smoothing method after the inter depth up-sampling (IDUS) prediction, where the sub-nodes in the same unit node are filtered for lower levels. Experimental results have demonstrated that compared with two latest MPEG G-PCC reference software TMC13-v23.0 and GeSTM-v3.0, our proposed enhanced prediction method exhibits superior Bjøntegaard delta bit rate (BDBR) gains with small increase in time complexity.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 10","pages":"10544-10556"},"PeriodicalIF":11.1,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
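The core operation described above is a weighted average over neighboring points' predicted attribute values. A toy sketch, assuming precomputed nearest-neighbor indices and distances, inverse-distance weights, and a blending factor `alpha`; the actual LoD/RAHT integration and weighting used in the paper are not reproduced here:

```python
import numpy as np

def smooth_predicted_attributes(pred, neighbor_idx, neighbor_dist, alpha=0.5, eps=1e-6):
    """pred: (N, C) predicted attributes; neighbor_idx: (N, K) indices of each
    point's K neighbors; neighbor_dist: (N, K) distances. Returns predictions
    blended with an inverse-distance weighted average of their neighbors."""
    weights = 1.0 / (neighbor_dist + eps)                    # closer neighbors weigh more
    weights = weights / weights.sum(axis=1, keepdims=True)   # normalize per point
    neighbor_avg = (pred[neighbor_idx] * weights[..., None]).sum(axis=1)
    return alpha * pred + (1.0 - alpha) * neighbor_avg
```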
C2RL: Content and Context Representation Learning for Gloss-Free Sign Language Translation and Retrieval
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-19 DOI: 10.1109/TCSVT.2025.3553052
Zhigang Chen;Benjia Zhou;Yiqing Huang;Jun Wan;Yibo Hu;Hailin Shi;Yanyan Liang;Zhen Lei;Du Zhang
{"title":"C2RL: Content and Context Representation Learning for Gloss-Free Sign Language Translation and Retrieval","authors":"Zhigang Chen;Benjia Zhou;Yiqing Huang;Jun Wan;Yibo Hu;Hailin Shi;Yanyan Liang;Zhen Lei;Du Zhang","doi":"10.1109/TCSVT.2025.3553052","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3553052","url":null,"abstract":"Sign Language Representation Learning (SLRL) is crucial for a range of sign language-related downstream tasks such as Sign Language Translation (SLT) and Sign Language Retrieval (SLRet). Recently, many gloss-based and gloss-free SLRL methods have been proposed, showing promising performance. Among them, the gloss-free approach shows promise for strong scalability without relying on gloss annotations. However, it currently faces suboptimal solutions due to challenges in encoding the intricate, context-sensitive characteristics of sign language videos, mainly struggling to discern essential sign features using a non-monotonic video-text alignment strategy. Therefore, we introduce an innovative pretraining paradigm for gloss-free SLRL, called C<sup>2</sup>RL, in this paper. Specifically, rather than merely incorporating a non-monotonic semantic alignment of video and text to learn language-oriented sign features, we emphasize two pivotal aspects of SLRL: Implicit Content Learning (ICL) and Explicit Context Learning (ECL). ICL delves into the content of communication, capturing the nuances, emphasis, timing, and rhythm of the signs. In contrast, ECL focuses on understanding the contextual meaning of signs and converting them into equivalent sentences. Despite its simplicity, extensive experiments confirm that the joint optimization of ICL and ECL results in robust sign language representation and significant performance gains in gloss-free SLT and SLRet tasks. Notably, C<sup>2</sup>RL improves the BLEU-4 score by +5.3 on P14T, +10.6 on CSL-daily, +6.2 on OpenASL, and +1.3 on How2Sign. It also boosts the R@1 score by +8.3 on P14T, +14.4 on CSL-daily, and +5.9 on How2Sign. Additionally, we set a new baseline for the OpenASL dataset in the SLRet task.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8533-8544"},"PeriodicalIF":11.1,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10933970","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multi-Scale Local and Global Feature Fusion for Blind Quality Assessment of Enhanced Images
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-18 DOI: 10.1109/TCSVT.2025.3552086
Jingchao Cao;Shuai Zhang;Yutao Liu;Feng Gao;Ke Gu;Guangtao Zhai;Junyu Dong;Sam Kwong
{"title":"Multi-Scale Local and Global Feature Fusion for Blind Quality Assessment of Enhanced Images","authors":"Jingchao Cao;Shuai Zhang;Yutao Liu;Feng Gao;Ke Gu;Guangtao Zhai;Junyu Dong;Sam Kwong","doi":"10.1109/TCSVT.2025.3552086","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3552086","url":null,"abstract":"Image enhancement plays a crucial role in computer vision by improving visual quality while minimizing distortion. Traditional methods enhance images through pixel value transformations, yet they often introduce new distortions. Recent advancements in deep learning-based techniques promise better results but challenge the preservation of image fidelity. Therefore, it is essential to evaluate the visual quality of enhanced images. However, existing quality assessment methods frequently encounter difficulties due to the unique distortions introduced by these enhancements, thereby restricting their effectiveness. To address these challenges, this paper proposes a novel blind image quality assessment (BIQA) method for enhanced natural images, termed multi-scale local feature fusion and global feature representation-based quality assessment (MLGQA). This model integrates three key components: a multi-scale Feature Attention Mechanism (FAM) for local feature extraction, a Local Feature Fusion (LFF) module for cross-scale feature synthesis, and a Global Feature Representation (GFR) module using Vision Transformers to capture global perceptual attributes. This synergistic framework effectively captures both fine-grained local distortions and broader global features that collectively define the visual quality of enhanced images. Furthermore, in the absence of a dedicated benchmark for enhanced natural images, we design the Natural Image Enhancement Database (NIED), a large-scale dataset consisting of 8,581 original images and 102,972 enhanced natural images generated through a wide array of traditional and deep learning-based enhancement techniques. Extensive experiments on NIED demonstrate that the proposed MLGQA model significantly outperforms current state-of-the-art BIQA methods in terms of both prediction accuracy and robustness.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"8917-8928"},"PeriodicalIF":11.1,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
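The final quality score comes from fusing multi-scale local features with a global ViT representation. A toy regression head showing that general pattern; the dimensions, pooling, and single-MLP fusion are placeholders, not the paper's FAM/LFF/GFR modules:

```python
import torch
import torch.nn as nn

class LocalGlobalFusionHead(nn.Module):
    """Concatenate a pooled local feature vector with a ViT global token and
    regress a scalar quality score."""
    def __init__(self, local_dim=256, global_dim=768, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(local_dim + global_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, local_feat, global_feat):
        return self.mlp(torch.cat([local_feat, global_feat], dim=-1))
```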
Generative Probabilistic Entropy Modeling With Conditional Diffusion for Learned Image Compression
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-18 DOI: 10.1109/TCSVT.2025.3551780
Maida Cao;Wenrui Dai;Shaohui Li;Chenglin Li;Junni Zou;Weisheng Hu;Hongkai Xiong
{"title":"Generative Probabilistic Entropy Modeling With Conditional Diffusion for Learned Image Compression","authors":"Maida Cao;Wenrui Dai;Shaohui Li;Chenglin Li;Junni Zou;Weisheng Hu;Hongkai Xiong","doi":"10.1109/TCSVT.2025.3551780","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551780","url":null,"abstract":"Entropy modeling is the core component of learned image compression (LIC) that models the distribution of latent representation learned from input images via neural networks for bit-rate estimation. However, existing entropy models employ presumed parameterized distributions such as Gaussian models and are limited for the learned latent representation characterized by complex distributions. To address this problem, in this paper, we for the first time achieve generative probabilistic entropy modeling of latent representation based on conditional diffusion models. Specifically, we propose a conditional diffusion-based probabilistic entropy model (CDPEM) to parameterize the latent representation with distributions of arbitrary forms that are generated by well designed training-test consistent denoising diffusion implicit model (TC-DDIM) without introducing any presumption. TC-DDIM is designed to leverage ancestral sampling to gradually approximate the distribution of latent representation with guaranteed consistency in generation for training and test. Furthermore, we develop a hierarchical spatial-channel context model to incorporate with TC-DDIM to sufficiently exploit spatial correlations with the approximate contextual information produced by ancestral sampling and channel-wise correlations using channel-wise information aggregation with reweighted training loss. Experimental results demonstrate that the proposed entropy model achieves state-of-the-art performance on the Kodak, CLIC, and Tecnick datasets compared to existing LIC methods. Remarkably, when incorporated with recent baselines, the proposed model outperforms latest VVC standard by an evident gain in R-D performance.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9443-9459"},"PeriodicalIF":11.1,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
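TC-DDIM builds on DDIM-style sampling to approximate the latent distribution. For orientation only, here is a single deterministic DDIM update step (the standard eta = 0 rule, with `alpha_bar_*` the cumulative noise-schedule products); the conditioning on hyperprior and spatial-channel context that CDPEM adds is omitted, so this is a generic reference point rather than the paper's model:

```python
def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update: estimate x0 from the predicted noise,
    then re-noise it to the previous timestep's noise level."""
    x0_pred = (x_t - (1 - alpha_bar_t) ** 0.5 * eps_pred) / alpha_bar_t ** 0.5
    return alpha_bar_prev ** 0.5 * x0_pred + (1 - alpha_bar_prev) ** 0.5 * eps_pred
```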
Toward Realistic Hierarchical Object Detection: Problem, Benchmark, and Solution
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-18 DOI: 10.1109/TCSVT.2025.3552596
Juexiao Feng;Yuhong Yang;Mengyao Lyu;Tianxiang Hao;Yi-Jie Huang;Yanchun Xie;Yaqian Li;Jungong Han;Liuyu Xiang;Guiguang Ding
{"title":"Toward Realistic Hierarchical Object Detection: Problem, Benchmark, and Solution","authors":"Juexiao Feng;Yuhong Yang;Mengyao Lyu;Tianxiang Hao;Yi-Jie Huang;Yanchun Xie;Yaqian Li;Jungong Han;Liuyu Xiang;Guiguang Ding","doi":"10.1109/TCSVT.2025.3552596","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3552596","url":null,"abstract":"With the continuous advancement of deep learning, object detection has made remarkable progress in accurately identifying a wide range of object categories, even within increasingly complex scenes. However, as the number of categories grows, visual concepts naturally organize into a label hierarchy. We contend that existing hierarchical classification and detection methods predominantly prioritize fine-grained prediction, potentially leading to inconsistencies with realistic human perception. From this perspective, we investigate the Hierarchical Object Detection (HOD) problem to better align with real-world perception. To address the lack of benchmarks in the field, we build a large-scale HOD benchmark termed RHOD with open-source datasets, comprising 740 categories. To better align the hierarchical object detectors towards realistic perception, we propose a new evaluation metric named Hierarchical Average Precision (HAP). Furthermore, we present a novel hierarchical object detection method that includes two components, Tree Soft Labeling (TSL) and Hierarchical Extension and Suppression (HES). Our method mitigates the issue of overconfidence in fine-grained predictions, which has been prevalent in previous approaches. We evaluate a range of existing methods on the RHOD benchmark, including plain, hierarchical, and open-vocabulary models. Additionally, we perform comprehensive experiments to assess the performance of our proposed method. The experimental results show that our method achieves state-of-the-art performance on the RHOD benchmark.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9351-9364"},"PeriodicalIF":11.1,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
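Tree Soft Labeling spreads supervision over a ground-truth class and its ancestors in the label hierarchy instead of using a one-hot leaf label. A toy sketch under assumed choices, namely a geometric decay toward the root and a `parents` dict mapping each class index to its parent; the paper's actual TSL weighting is not specified here:

```python
def tree_soft_label(leaf, parents, num_classes, decay=0.5):
    """Give the ground-truth leaf the largest weight, each ancestor a
    geometrically decayed weight, then normalize to a distribution."""
    label = [0.0] * num_classes
    weight, node = 1.0, leaf
    while node is not None:
        label[node] = weight
        weight *= decay
        node = parents.get(node)  # None once the root is passed
    total = sum(label)
    return [v / total for v in label]

# e.g. tree_soft_label(7, parents={7: 3, 3: 0}, num_classes=10)
# puts mass 4/7, 2/7, 1/7 on classes 7, 3, 0.
```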
A Novel Dense Object Detector With Scale Balanced Sample Assignment and Refinement
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-18 DOI: 10.1109/TCSVT.2025.3551912
Jinpeng Dong;Dingyi Yao;Yufeng Hu;Sanping Zhou;Nanning Zheng
{"title":"A Novel Dense Object Detector With Scale Balanced Sample Assignment and Refinement","authors":"Jinpeng Dong;Dingyi Yao;Yufeng Hu;Sanping Zhou;Nanning Zheng","doi":"10.1109/TCSVT.2025.3551912","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551912","url":null,"abstract":"Scale variation of objects remains one of the crucial challenges in object detection. Currently, conventional dense detectors with fixed receptive fields and label weights are not conducive to the detection of multi-scale objects. However, the design limitations of unbalanced label weights and fixed refinement for multi-scale objects and multi-tasks in these studies make it difficult to achieve better detection performance. In this paper, we propose a novel dense detector named Balanced FCOS which consists of two components: Balanced Label Assignment (BLA) and Flexible Shape-based Refinement (FSR). The BLA implements scale-balanced sample assignment by introducing reweighting factors consisting of localization and classification scores into the label assignment. Low-quality but high-weight samples can be weakened by the BLA. Furthermore, we design a cross-reweighting mechanism in the BLA to ensure score consistency between classification and localization. The FSR implements scale-balanced sample refinement by learning flexible sample points’ offsets for multi-scale objects and multi-tasks based on objects’ coarse features to get more discriminative features with appropriate receptive field. In addition, better features obtained by FSR are beneficial to get better classification and localization scores, which can be used by BLA to produce accurate label weights. Only equipped with the BLA, we can achieve 41.7/46.6 AP under R50/R101-FCOS without any additional parameters. When combining the BLA with the FSR, our Balanced FCOS achieves SOTA results among dense detectors on the COCO test-dev set. Experiments conducted on other heads (T-Head, DyHead), detectors (DINO), and datasets (AI-TOD) further demonstrate the effectiveness of our method.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9337-9350"},"PeriodicalIF":11.1,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
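The reweighting idea in BLA couples a sample's label weight to both its classification and localization quality so that low-quality but high-weight samples are suppressed. A one-line illustration of that coupling; the geometric-mean form and exponent are assumptions, not the paper's formulation:

```python
def sample_weight(cls_score, iou_score, gamma=2.0):
    """Toy reweighting factor: grows with both the classification score and
    the localization (IoU) quality of a positive sample."""
    return (cls_score * iou_score) ** (gamma / 2.0)
```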
Adversarial Feature Training for Few-Shot Object Detection
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-17 DOI: 10.1109/TCSVT.2025.3552138
Tianxu Wu;Zhimeng Xin;Shiming Chen;Yixiong Zou;Xinge You
{"title":"Adversarial Feature Training for Few-Shot Object Detection","authors":"Tianxu Wu;Zhimeng Xin;Shiming Chen;Yixiong Zou;Xinge You","doi":"10.1109/TCSVT.2025.3552138","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3552138","url":null,"abstract":"Currently, most few-shot object detection (FSOD) methods apply the two-stage training strategy, which first requires training in abundant base classes and transfers the learned prior knowledge to the novel stage. However, due to the inherent imbalance between the base and novel classes, the trained model tends to have a bias toward recognizing novel classes as base ones when they are similar. To address this problem, we propose an adversarial feature training (AFT) strategy aimed at effectively calibrating the decision boundary between novel and base classes to alleviate classification confusion in FSOD. Specifically, we introduce the Classification Level Fast Gradient Sign Method (CL-FGSM), which leverages gradient information from the classifier module to generate adversarial samples with extra feature attention. By attacking the high-level features, we can create adversarial feature samples that are combined with clean high-level features in a suitable range of proportions. Such adversarial feature samples, generated by CL-FGSM, are then combined with clean high-level features in a suitable range of proportions to train the few-shot detector. By this, the novel model is forced to learn extra class-specific features that improve the robustness of the classifier to establish a correct decision boundary, which avoids confusion between base and novel classes in FSOD. Extensive experiments demonstrate that our proposed AFT strategy effectively calibrates the classification decision boundary to avoid classification confusion between base and novel classes and significantly improves the performance of FSOD. Our code is available at <uri>https://github.com/wutianxu/AFT</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9324-9336"},"PeriodicalIF":11.1,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
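CL-FGSM perturbs high-level features along the sign of the classifier's gradient, i.e. FGSM applied at the feature level rather than the pixel level. A minimal sketch of that generic mechanism in PyTorch; the loss, `epsilon`, and how adversarial and clean features are later mixed are assumptions rather than the paper's exact procedure:

```python
import torch

def feature_fgsm(features, targets, loss_fn, epsilon=0.05):
    """Return adversarially perturbed features: take the gradient of the
    classification loss w.r.t. the features and step along its sign."""
    features = features.detach().requires_grad_(True)
    loss = loss_fn(features, targets)              # e.g. the classifier head's loss
    grad = torch.autograd.grad(loss, features)[0]
    return (features + epsilon * grad.sign()).detach()
```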
Uncertainty Neural Surfaces for Space Target 3D Reconstruction Under Constrained Views
IF 11.1, Q1, Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-17 DOI: 10.1109/TCSVT.2025.3551779
Yuandong Li;Qinglei Hu;Fei Dong;Dongyu Li;Zhenchao Ouyang
{"title":"Uncertainty Neural Surfaces for Space Target 3D Reconstruction Under Constrained Views","authors":"Yuandong Li;Qinglei Hu;Fei Dong;Dongyu Li;Zhenchao Ouyang","doi":"10.1109/TCSVT.2025.3551779","DOIUrl":"https://doi.org/10.1109/TCSVT.2025.3551779","url":null,"abstract":"In asteroid exploration and orbital servicing missions with space robots, accurate 3D structural of the target is typically relied upon for planning landing trajectories and controlling movements. Unlike conventional neural radiance fields (NeRF) studies, which rely on full-view random sampling of targets that can be easily achieved on the ground, spacecraft operations present unique challenges due to the kinematic orbit constraint, the high cost of controlled motion, and limited fuel reserves. This results in limited observation of space targets. In order to obtain 3D structure under close-flybys and restricted observation, we proposed Uncertainty Neural Surfaces (UNS) model based on Bayesian uncertainty estimation. UNS enhance the precision of reconstructed target surfaces under constrained-views, providing guidance for subsequent imaging view design. Specifically, UNS introduces Bayesian estimation based surface uncertainty on neural implicit surfaces. The estimation is calculated based on the degree of self-occlusion of the target and the difference between rendered and actual colors. This approach enables uncertain estimation of 3D space and arbitrary view. Finally, extensive systematic evaluations and analyses of spacecraft model sampling in a local darkroom validate the sophistication of UNS in uncertainty estimation and surface reconstruction quality. Code is available at <uri>https://github.com/YD-96/UNS</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 8","pages":"8045-8056"},"PeriodicalIF":11.1,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144781999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0