IEEE Transactions on Circuits and Systems for Video Technology: Latest Articles

Uncertainty-Aware Self-Knowledge Distillation
IF 8.3, Zone 1 (Engineering & Technology)
IEEE Transactions on Circuits and Systems for Video Technology. Pub Date: 2024-12-12. DOI: 10.1109/TCSVT.2024.3516145
Authors: Yang Yang; Chao Wang; Lei Gong; Min Wu; Zhenghua Chen; Yingxue Gao; Teng Wang; Xuehai Zhou
Abstract: Self-knowledge distillation has emerged as a powerful method, notably boosting the prediction accuracy of deep neural networks while being resource-efficient, setting it apart from traditional teacher-student knowledge distillation approaches. However, in safety-critical applications, high accuracy alone is not adequate; conveying uncertainty effectively is equally important. Regrettably, existing self-knowledge distillation methods have not met the need to improve both prediction accuracy and uncertainty quantification simultaneously. In response to this gap, we present an uncertainty-aware self-knowledge distillation method named UASKD. UASKD introduces an uncertainty-aware contrastive loss and a prediction synthesis technique within the self-knowledge distillation process, aiming to fully harness the potential of self-knowledge distillation for improving both prediction accuracy and uncertainty quantification. Extensive assessments illustrate that UASKD consistently surpasses other self-knowledge distillation techniques and numerous uncertainty calibration methods in both prediction accuracy and uncertainty quantification metrics across various classification and object detection tasks, highlighting its efficacy and adaptability.
Volume 35, Issue 5, pp. 4464-4478.
Citations: 0
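As a rough illustration of the self-distillation setting this abstract builds on, the sketch below distills a shallow auxiliary head toward the deeper main head and reports predictive entropy as a simple uncertainty proxy. It is a minimal, assumed PyTorch setup, not the authors' UASKD implementation: the uncertainty-aware contrastive loss and prediction synthesis are not reproduced, and all module and parameter names are illustrative.

```python
# Minimal sketch (not the authors' UASKD): a generic self-knowledge-distillation
# step where a shallow auxiliary head is distilled toward the deeper main head,
# plus an entropy-based uncertainty proxy. Module names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfDistilledNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(8))
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 64, 256), nn.ReLU())
        self.aux_head = nn.Linear(32 * 64, num_classes)   # shallow "student" head
        self.main_head = nn.Linear(256, num_classes)      # deep "teacher" head

    def forward(self, x):
        feat = self.stem(x)
        return self.main_head(self.backbone(feat)), self.aux_head(feat.flatten(1))

def self_distillation_loss(main_logits, aux_logits, targets, T: float = 4.0, alpha: float = 0.5):
    ce = F.cross_entropy(main_logits, targets) + F.cross_entropy(aux_logits, targets)
    # distill the auxiliary head toward the (detached) main head
    kd = F.kl_div(F.log_softmax(aux_logits / T, dim=1),
                  F.softmax(main_logits.detach() / T, dim=1),
                  reduction="batchmean") * T * T
    return ce + alpha * kd

def predictive_entropy(logits):
    # per-sample uncertainty proxy that calibration metrics typically evaluate
    p = F.softmax(logits, dim=1)
    return -(p * p.clamp_min(1e-12).log()).sum(dim=1)

model = SelfDistilledNet()
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
main_logits, aux_logits = model(x)
loss = self_distillation_loss(main_logits, aux_logits, y)
print(loss.item(), predictive_entropy(main_logits))
```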
ECINFusion: A Novel Explicit Channel-Wise Interaction Network for Unified Multi-Modal Medical Image Fusion
IF 8.3, Zone 1 (Engineering & Technology)
IEEE Transactions on Circuits and Systems for Video Technology. Pub Date: 2024-12-12. DOI: 10.1109/TCSVT.2024.3516705
Authors: Xinjian Wei; Yu Qiu; Xiaoxuan Xu; Jing Xu; Jie Mei; Jun Zhang
Abstract: Multi-modal medical image fusion enhances the representation, aggregation, and comprehension of functional and structural information, improving the accuracy and efficiency of subsequent analysis. However, the lack of explicit cross-channel modeling and interaction among modalities results in loss of detail and in artifacts. To this end, we propose a novel Explicit Channel-wise Interaction Network for unified multi-modal medical image Fusion, namely ECINFusion. ECINFusion encompasses two components: a multi-scale adaptive feature modeling (MAFM) module and an explicit channel-wise interaction mechanism (ECIM). MAFM leverages adaptive parallel convolution and a transformer in a multi-scale manner to achieve global context-aware feature representation. ECIM utilizes the designed multi-head channel-attention mechanism for explicit modeling in the channel dimension to accomplish cross-modal interaction. Besides, we introduce a novel adaptive L-Norm loss that preserves fine-grained details. Experiments demonstrate that ECINFusion outperforms state-of-the-art approaches in various medical fusion sub-tasks on different metrics. Furthermore, extended experiments reveal the robust generalization of the proposed method across different fusion tasks. In brief, the proposed explicit channel-wise interaction mechanism provides new insight for multi-modal interaction.
Volume 35, Issue 5, pp. 4011-4025.
Citations: 0
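To make "explicit channel-wise interaction" concrete, here is a minimal sketch of a cross-modal multi-head channel attention block, where attention weights are computed between channels (a C-by-C map per head) rather than between spatial positions, so one modality's channels attend to the other's. This is an assumed generic formulation, not the paper's ECIM code; layer names and sizes are illustrative.

```python
# Minimal sketch (assumptions, not the paper's ECIM): cross-modal multi-head
# channel attention; queries come from modality A, keys/values from modality B.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalChannelAttention(nn.Module):
    def __init__(self, channels: int = 64, heads: int = 4):
        super().__init__()
        assert channels % heads == 0
        self.heads = heads
        self.to_q = nn.Conv2d(channels, channels, 1)        # queries from modality A
        self.to_kv = nn.Conv2d(channels, channels * 2, 1)   # keys/values from modality B
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat_a.shape
        q = self.to_q(feat_a).reshape(b, self.heads, c // self.heads, h * w)
        k, v = self.to_kv(feat_b).chunk(2, dim=1)
        k = k.reshape(b, self.heads, c // self.heads, h * w)
        v = v.reshape(b, self.heads, c // self.heads, h * w)
        # normalize along the spatial axis, then form a channel-to-channel
        # attention map of size (C/heads x C/heads) per head
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)
        out = (attn @ v).reshape(b, c, h, w)
        return feat_a + self.proj(out)   # residual fusion back into modality A

fuse = CrossModalChannelAttention()
mri, pet = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
print(fuse(mri, pet).shape)  # torch.Size([2, 64, 32, 32])
```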
Multi-Expert Adaptive Selection: Task-Balancing for All-in-One Image Restoration
IF 8.3, Zone 1 (Engineering & Technology)
IEEE Transactions on Circuits and Systems for Video Technology. Pub Date: 2024-12-12. DOI: 10.1109/TCSVT.2024.3516074
Authors: Xiaoyan Yu; Shen Zhou; Huafeng Li; Liehuang Zhu
Abstract: The use of a single image restoration framework to achieve multi-task image restoration has garnered significant attention from researchers. However, several practical challenges remain, including meeting the specific and simultaneous demands of different tasks, balancing the relationships between tasks, and effectively exploiting task correlations in model design. To address these challenges, this paper explores a multi-expert adaptive selection mechanism. We begin by designing a feature representation method that accounts for both the pixel-channel level and the global level, encompassing the low-frequency and high-frequency components of the image. Based on this method, we construct a multi-expert selection and ensemble scheme. This scheme adaptively selects the most suitable expert from the expert library according to the content of the input image and the prompts of the current task. It not only meets the individual needs of different tasks but also achieves balance and optimization across tasks. By sharing experts, our design promotes interconnections between different tasks, thereby enhancing overall performance and resource utilization. Additionally, the multi-expert mechanism effectively excludes irrelevant experts, reducing their interference and further improving the effectiveness and accuracy of image restoration. Experimental results demonstrate that our proposed method is both effective and superior to existing approaches, highlighting its potential for practical applications in multi-task image restoration. The source code of the proposed method is available at https://github.com/zhoushen1/MEASNet.
Volume 35, Issue 5, pp. 4619-4634.
Citations: 0
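The adaptive expert selection described above is, at its core, a gated mixture-of-experts routing step. The following is a minimal sketch of that idea under stated assumptions (it is not the authors' MEASNet): a lightweight gate scores a shared expert library from the image content and a task prompt, and the output is the weighted sum of the top-k selected experts.

```python
# Minimal sketch (illustrative only, not MEASNet): adaptive top-k expert
# selection conditioned on image content plus a task prompt embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiExpertSelect(nn.Module):
    def __init__(self, channels: int = 32, num_experts: int = 6, top_k: int = 2,
                 num_tasks: int = 4):
        super().__init__()
        self.top_k = top_k
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(channels, channels, 3, padding=1))
            for _ in range(num_experts))
        self.task_prompt = nn.Embedding(num_tasks, channels)
        self.gate = nn.Linear(2 * channels, num_experts)

    def forward(self, feat: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # gate on global image content + the current task prompt
        ctx = torch.cat([feat.mean(dim=(2, 3)), self.task_prompt(task_id)], dim=1)
        scores = self.gate(ctx)                              # (B, num_experts)
        topk_val, topk_idx = scores.topk(self.top_k, dim=1)
        weights = F.softmax(topk_val, dim=1)                 # renormalize over selected experts
        out = torch.zeros_like(feat)
        for b in range(feat.size(0)):                        # per-sample expert routing
            for w_, idx in zip(weights[b], topk_idx[b]):
                out[b] = out[b] + w_ * self.experts[int(idx)](feat[b:b + 1])[0]
        return feat + out

block = MultiExpertSelect()
x = torch.randn(2, 32, 64, 64)
print(block(x, task_id=torch.tensor([0, 3])).shape)  # torch.Size([2, 32, 64, 64])
```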
Toward Long Video Understanding via Fine-Detailed Video Story Generation
IF 8.3, Zone 1 (Engineering & Technology)
IEEE Transactions on Circuits and Systems for Video Technology. Pub Date: 2024-12-11. DOI: 10.1109/TCSVT.2024.3514820
Authors: Zeng You; Zhiquan Wen; Yaofo Chen; Xin Li; Runhao Zeng; Yaowei Wang; Mingkui Tan
Abstract: Long video understanding has become a critical task in computer vision, driving advancements across numerous applications from surveillance to content retrieval. Existing video understanding methods suffer from two challenges when dealing with long videos: intricate long-context relationship modeling and interference from redundancy. To tackle these challenges, we introduce Fine-Detailed Video Story generation (FDVS), which interprets long videos into detailed textual representations. Specifically, to achieve fine-grained modeling of long-temporal content, we propose a Bottom-up Video Interpretation Mechanism that progressively interprets video content from clips to the whole video. To avoid interference from redundant information in videos, we introduce a Semantic Redundancy Reduction mechanism that removes redundancy at both the visual and textual levels. Our method transforms long videos into hierarchical textual representations that contain multi-granularity information about the video. With these representations, FDVS is applicable to various tasks without any fine-tuning. We evaluate the proposed method across eight datasets spanning three tasks. The performance demonstrates the effectiveness and versatility of our method.
Volume 35, Issue 5, pp. 4592-4607.
Citations: 0
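A structural sketch of the bottom-up idea follows: caption short clips, drop near-duplicate captions, and fold the survivors into segment summaries and a video-level story. The `caption_clip` and `summarize` callables are placeholders for whatever captioning and summarization models one would plug in; they are not the paper's components, and the duplicate filter here is a crude textual stand-in for the paper's semantic redundancy reduction.

```python
# Minimal sketch (structure only; caption_clip and summarize are hypothetical
# stand-ins, NOT the paper's modules): bottom-up clip -> segment -> story.
from difflib import SequenceMatcher
from typing import Callable, List

def reduce_redundancy(captions: List[str], threshold: float = 0.85) -> List[str]:
    """Drop captions that are near-duplicates of an earlier caption."""
    kept: List[str] = []
    for cap in captions:
        if all(SequenceMatcher(None, cap, prev).ratio() < threshold for prev in kept):
            kept.append(cap)
    return kept

def build_video_story(clips: List[object],
                      caption_clip: Callable[[object], str],
                      summarize: Callable[[List[str]], str],
                      segment_size: int = 8) -> dict:
    """Bottom-up interpretation: clip captions -> segment summaries -> story."""
    clip_captions = reduce_redundancy([caption_clip(c) for c in clips])
    segments = [clip_captions[i:i + segment_size]
                for i in range(0, len(clip_captions), segment_size)]
    segment_summaries = [summarize(seg) for seg in segments]
    return {"clip_captions": clip_captions,
            "segment_summaries": segment_summaries,
            "story": summarize(segment_summaries)}

# toy usage with trivial stand-ins
story = build_video_story(list(range(20)),
                          caption_clip=lambda c: f"clip {c // 3}: a person walks",
                          summarize=lambda texts: " ".join(texts)[:200])
print(story["story"])
```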
Contour Knowledge-Aware Perception Learning for Semantic Segmentation
IF 8.3, Zone 1 (Engineering & Technology)
IEEE Transactions on Circuits and Systems for Video Technology. Pub Date: 2024-12-11. DOI: 10.1109/TCSVT.2024.3515088
Authors: Chao You; Licheng Jiao; Lingling Li; Xu Liu; Fang Liu; Wenping Ma; Shuyuan Yang
Abstract: The diversity of contextual information is of great importance for accurate semantic segmentation. However, most methods focus on a single type of spatial contextual information, which results in an overlap of the semantic content of categories and a loss of contour information of objects. In this article, we propose a novel contour knowledge-aware perception learning network (CKPL-Net) that captures diverse contextual information through a space-category aggregation module (SCAM) and a contour-aware calibration module (CACM). First, SCAM is introduced to enhance intra-class consistency and inter-class differentiation of features. By integrating space-aware and category-aware attention, SCAM reduces feature redundancy from a categorical perspective while maintaining the spatial correlation of pixels, substantially avoiding the overlap of semantic content across categories. Second, CACM is designed to maintain the integrity of objects by perceiving contour contextual information. It develops novel contour-aware knowledge and adaptively transforms the grid structure of convolutions for boundary pixels, which effectively calibrates the representation of features near boundaries. Finally, quantitative and qualitative analyses on three public datasets, the ISPRS Potsdam dataset, the ISPRS Vaihingen dataset, and the WHDLD dataset, demonstrate that the proposed CKPL-Net achieves superior performance compared with prevalent methods, indicating that diverse contextual information is beneficial for accurate segmentation.
Volume 35, Issue 5, pp. 4560-4575.
Citations: 0
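As a small, assumed illustration of what a contour prior for segmentation can look like (this is generic boundary extraction, not CKPL-Net's contour-aware knowledge), the sketch below derives a class-agnostic contour map from one-hot labels via a morphological gradient; such a map is the kind of target a boundary-aware term could be computed against.

```python
# Minimal sketch (an assumption, not CKPL-Net): soft contour map from a
# segmentation label using a morphological gradient (dilation minus erosion
# implemented with max-pooling).
import torch
import torch.nn.functional as F

def contour_map(label_onehot: torch.Tensor, width: int = 3) -> torch.Tensor:
    """label_onehot: (B, C, H, W) one-hot segmentation labels in {0, 1}."""
    pad = width // 2
    dilated = F.max_pool2d(label_onehot, width, stride=1, padding=pad)
    eroded = -F.max_pool2d(-label_onehot, width, stride=1, padding=pad)
    # per-class boundary, then collapse to a class-agnostic contour map
    return (dilated - eroded).clamp(0, 1).amax(dim=1, keepdim=True)

labels = torch.zeros(1, 2, 64, 64)
labels[:, 1, 16:48, 16:48] = 1.0
labels[:, 0] = 1.0 - labels[:, 1]
print(contour_map(labels).sum())  # non-zero only near the square's boundary
```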
Scalable and Robust Tensor Ring Decomposition for Large-Scale Data With Missing Data and Outliers
IF 8.3, Zone 1 (Engineering & Technology)
IEEE Transactions on Circuits and Systems for Video Technology. Pub Date: 2024-12-11. DOI: 10.1109/TCSVT.2024.3514614
Authors: Yicong He; George K. Atia
Abstract: Tensor ring (TR) decomposition demonstrates superior performance in handling high-order tensors. However, traditional TR-based decomposition algorithms face limitations in real-world applications due to large data sizes, missing entries, and outlier corruption. To address these challenges, we propose a scalable and robust TR decomposition algorithm for large-scale tensor data that effectively handles missing entries and gross corruptions. Our method introduces a novel auto-weighted scaled steepest descent approach that adaptively identifies outliers and completes missing entries during decomposition. Additionally, leveraging the tensor ring decomposition model, we develop a Fast Gram Matrix Computation (FGMC) technique and a Randomized Subtensor Sketching (RStS) strategy, significantly reducing storage and computational complexity. Experimental results demonstrate that the proposed method outperforms existing TR decomposition and tensor completion methods.
Volume 35, Issue 5, pp. 4493-4505.
Citations: 0
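For readers unfamiliar with the TR model, the sketch below reconstructs a tensor from its ring cores and takes one plain gradient step that fits the cores only to observed entries, i.e. the basic completion setting the paper builds on. Ranks, shapes, and the unweighted, unscaled gradient step are illustrative assumptions; the paper's auto-weighted scaled steepest descent, FGMC, and RStS are not reproduced here.

```python
# Minimal sketch (generic TR completion step, not the paper's algorithm).
import torch

def tr_reconstruct(cores, shape):
    """cores[k]: (r, n_k, r) tensor-ring cores with matching boundary ranks."""
    r0 = cores[0].shape[0]
    out = cores[0].reshape(r0, -1, cores[0].shape[-1])        # (r0, n_0, r)
    for core in cores[1:]:
        # (r0, N, r) x (r, n, r_next) -> (r0, N*n, r_next)
        out = torch.einsum("apb,bnc->apnc", out, core).reshape(r0, -1, core.shape[-1])
    # close the ring: trace over the two boundary ranks
    full = torch.diagonal(out, dim1=0, dim2=2).sum(-1)        # (prod(shape),)
    return full.reshape(shape)

shape, rank = (6, 7, 8), 3
cores = [torch.randn(rank, n, rank, requires_grad=True) for n in shape]
target = torch.randn(*shape)
mask = (torch.rand(*shape) > 0.5).float()                     # observed-entry mask

recon = tr_reconstruct(cores, shape)
loss = ((mask * (recon - target)) ** 2).sum() / mask.sum()    # fit observed entries only
loss.backward()
with torch.no_grad():
    for core in cores:
        core -= 0.1 * core.grad                               # plain gradient step
print(recon.shape, loss.item())
```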
A Novel Framework for Learning Bézier Decomposition From 3D Point Clouds
IF 8.3, Zone 1 (Engineering & Technology)
IEEE Transactions on Circuits and Systems for Video Technology. Pub Date: 2024-12-11. DOI: 10.1109/TCSVT.2024.3514740
Authors: Rao Fu; Qian Li; Cheng Wen; Ning An; Fulin Tang
Abstract: This paper proposes a fully differentiable, end-to-end framework for learning Bézier decomposition on 3D point clouds. The framework partitions input point clouds into multiple Bézier primitive patches through a learned Bézier decomposition process. Unlike previous approaches that handle different primitive types separately and are therefore limited to specific shape categories, our method seeks to achieve generalized primitive segmentation on point clouds. Drawing inspiration from Bézier decomposition on NURBS models, we adapt it to guide point cloud segmentation without relying on pre-defined primitive types. To achieve this, we introduce a joint optimization framework that simultaneously learns Bézier primitive segmentation and geometric fitting in a cascaded architecture. Additionally, we propose a soft voting regularizer to enhance primitive segmentation and an auto-weight embedding module to effectively cluster point features, making the network more robust and applicable to various scenarios. Furthermore, we incorporate a reconstruction module capable of processing multiple CAD models with different primitives simultaneously. Extensive experiments were conducted on both the synthetic ABC dataset and real-scan datasets to validate and compare our approach against several baseline methods. The results demonstrate that our method outperforms previous work in terms of segmentation accuracy while also exhibiting significantly faster inference speed.
Volume 35, Issue 5, pp. 4329-4340.
Citations: 0
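The geometric primitive being fitted here is the Bézier surface patch, S(u, v) = sum over i, j of B_i(u) B_j(v) P_ij with Bernstein basis functions B and control points P. The sketch below evaluates such a patch; the bicubic degree and random control points are illustrative assumptions and it does not reproduce the paper's differentiable segmentation-and-fitting pipeline.

```python
# Minimal sketch (illustrative, assumed degree/parameters; not the paper's
# pipeline): evaluating a Bézier surface patch via Bernstein polynomials.
import numpy as np
from math import comb

def bernstein(n: int, i: int, t: np.ndarray) -> np.ndarray:
    return comb(n, i) * t ** i * (1.0 - t) ** (n - i)

def bezier_patch(control: np.ndarray, u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """control: (n+1, m+1, 3) control points; u, v: (K,) parameters in [0, 1]."""
    n, m = control.shape[0] - 1, control.shape[1] - 1
    Bu = np.stack([bernstein(n, i, u) for i in range(n + 1)], axis=1)  # (K, n+1)
    Bv = np.stack([bernstein(m, j, v) for j in range(m + 1)], axis=1)  # (K, m+1)
    # S(u, v) = sum_ij B_i(u) B_j(v) P_ij
    return np.einsum("ki,kj,ijc->kc", Bu, Bv, control)                 # (K, 3)

ctrl = np.random.rand(4, 4, 3)                 # a bicubic patch
u = v = np.linspace(0.0, 1.0, 50)
pts = bezier_patch(ctrl, u, v)
print(pts.shape)                               # (50, 3) points on the patch
```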
SAM-COD+: SAM-Guided Unified Framework for Weakly-Supervised Camouflaged Object Detection
IF 8.3, Zone 1 (Engineering & Technology)
IEEE Transactions on Circuits and Systems for Video Technology. Pub Date: 2024-12-11. DOI: 10.1109/TCSVT.2024.3514713
Authors: Huafeng Chen; Pengxu Wei; Guangqian Guo; Shan Gao
Abstract: Most camouflaged object detection (COD) methods rely heavily on mask annotations, which are time-consuming and labor-intensive to acquire. Existing weakly-supervised COD approaches exhibit significantly inferior performance compared to fully-supervised methods and struggle to simultaneously support all existing types of camouflaged object labels, including scribbles, bounding boxes, and points. Even the Segment Anything Model (SAM) is problematic for weakly-supervised COD: it typically encounters challenges with prompt compatibility for scribble labels, extreme responses, semantically erroneous responses, and unstable feature representations, producing unsatisfactory results in camouflaged scenes. To mitigate these issues, we propose a unified COD framework in this paper, termed SAM-COD, which is capable of supporting arbitrary weakly-supervised labels. Our SAM-COD employs a prompt adapter to handle scribbles as prompts based on SAM. Meanwhile, we introduce response filter and semantic matcher modules to improve the quality of the masks obtained by SAM under COD prompts. To alleviate the negative impact of inaccurate mask predictions, a new strategy of prompt-adaptive knowledge distillation is utilized to ensure a reliable feature representation. To validate the effectiveness of our approach, we have conducted extensive empirical experiments on three mainstream COD benchmarks. The results demonstrate the superiority of our method against state-of-the-art weakly-supervised and even fully-supervised methods. Our source code and trained models will be publicly released.
Volume 35, Issue 5, pp. 4635-4647.
Citations: 0
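A rough sense of the scribble-to-SAM workflow, under explicit assumptions: the sketch below samples point prompts from a binary scribble mask and queries SAM through the `segment_anything` package's SamPredictor interface, keeping the highest-scoring proposal. This is an assumed baseline workflow, not the paper's prompt adapter, response filter, or semantic matcher; the checkpoint path and point budget are placeholders.

```python
# Minimal sketch (assumed workflow, not the paper's prompt adapter): scribble
# mask -> point prompts -> SAM mask proposals.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def scribble_to_points(scribble_mask: np.ndarray, num_points: int = 5) -> np.ndarray:
    """Sample (x, y) point prompts from a binary scribble mask."""
    ys, xs = np.nonzero(scribble_mask)
    idx = np.random.choice(len(xs), size=min(num_points, len(xs)), replace=False)
    return np.stack([xs[idx], ys[idx]], axis=1)        # SAM expects (x, y) order

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder path
predictor = SamPredictor(sam)

def segment_from_scribble(image_rgb: np.ndarray, scribble_mask: np.ndarray) -> np.ndarray:
    predictor.set_image(image_rgb)
    points = scribble_to_points(scribble_mask)
    masks, scores, _ = predictor.predict(point_coords=points,
                                         point_labels=np.ones(len(points), dtype=np.int32),
                                         multimask_output=True)
    return masks[np.argmax(scores)]                    # keep the highest-scoring proposal
```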
Unsupervised Salient Object Detection on Light Field With High-Quality Synthetic Labels
IF 8.3, Zone 1 (Engineering & Technology)
IEEE Transactions on Circuits and Systems for Video Technology. Pub Date: 2024-12-11. DOI: 10.1109/TCSVT.2024.3514754
Authors: Yanfeng Zheng; Zhong Luo; Ying Cao; Xiaosong Yang; Weiwei Xu; Zheng Lin; Nan Yin; Pengjie Wang
Abstract: Most current light field salient object detection (LFSOD) methods require full supervision with labor-intensive pixel-level annotations. Unsupervised light field salient object detection (ULFSOD) has gained attention because of this limitation. However, existing methods use traditional handcrafted techniques to generate noisy pseudo-labels, which degrades the performance of models trained on them. To mitigate this issue, we present a novel learning-based approach to synthesize labels for ULFSOD. We introduce a prominent focal stack identification module that utilizes light field information (focal stack, depth map, and RGB color image) to generate high-quality pixel-level pseudo-labels, aiding network training. Additionally, we propose a novel model architecture for LFSOD, combining a multi-scale spatial attention module for focal stack information with a cross fusion module for RGB and focal stack integration. Through extensive experiments, we demonstrate that our pseudo-label generation method significantly outperforms existing methods in label quality. Our proposed model, trained with our labels, shows significant improvement on ULFSOD, achieving new state-of-the-art scores across public benchmarks.
Volume 35, Issue 5, pp. 4608-4618.
Citations: 0
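To illustrate what attending over a focal stack can look like, here is a minimal sketch of slice-wise spatial attention: each focal slice receives per-pixel weights that compete across slices, and the weighted slices are fused into a single feature map before combination with the RGB stream. This is an assumed generic formulation, not the paper's modules; channel counts and layer choices are illustrative.

```python
# Minimal sketch (assumptions only, not the paper's modules): spatial
# attention across the slices of a focal stack.
import torch
import torch.nn as nn

class FocalStackAttention(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.score = nn.Sequential(nn.Conv2d(channels, channels // 2, 3, padding=1),
                                   nn.ReLU(),
                                   nn.Conv2d(channels // 2, 1, 1))

    def forward(self, focal_feats: torch.Tensor) -> torch.Tensor:
        """focal_feats: (B, S, C, H, W) features of S focal slices."""
        b, s, c, h, w = focal_feats.shape
        logits = self.score(focal_feats.flatten(0, 1)).view(b, s, 1, h, w)
        weights = logits.softmax(dim=1)               # slices compete at each pixel
        return (weights * focal_feats).sum(dim=1)     # (B, C, H, W) fused focal feature

fsa = FocalStackAttention()
print(fsa(torch.randn(2, 12, 32, 64, 64)).shape)      # torch.Size([2, 32, 64, 64])
```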
Single-Group Generalized RGB and RGB-D Co-Salient Object Detection
IF 8.3, Zone 1 (Engineering & Technology)
IEEE Transactions on Circuits and Systems for Video Technology. Pub Date: 2024-12-11. DOI: 10.1109/TCSVT.2024.3514872
Authors: Jie Wang; Nana Yu; Zihao Zhang; Yahong Han
Abstract: Co-salient object detection (CoSOD) aims to segment the co-occurring salient objects in a given group of relevant images. Existing methods typically rely on extensive group training data to enhance the model's CoSOD capabilities. However, fitting prior knowledge of the extensive groups results in a significant performance gap between seen and out-of-sample image groups. Relaxing such fitting with fewer prior groups may improve the generalization ability of CoSOD while alleviating annotation burdens. Hence, it is essential to explore the use of fewer groups during the training phase, such as using only a single group, to pursue a highly generalized CoSOD model. We term this new setting Sg-CoSOD; it aims to train a model using only a single group and effectively apply it to any unseen RGB and RGB-D CoSOD test groups. Toward Sg-CoSOD, it is important to ensure detection performance with limited data and to remove class dependency when only a single group is available. Thus, we present a method, cross-excitation between saliency and 'Co', which decouples the CoSOD task into two parallel branches: 'Co' To Saliency (CTS) and Saliency To 'Co' (STC). The CTS branch focuses on mining group consensus to guide image co-saliency predictions, while the STC branch is dedicated to using saliency priors to motivate group consensus mining. Furthermore, we propose a Class-Agnostic Triplet (CAT) loss to constrain intra-group consensus while suppressing the model from acquiring class prior knowledge. Extensive experiments on RGB and RGB-D CoSOD tasks with multiple unknown groups show that our model has higher generalization capability (e.g., for the large-scale datasets CoSOD3k and CoSal1k with multiple generalized groups, we obtain a gain of over 15% in $F_{m}$). Further experimental analyses also reveal that the proposed Sg-CoSOD paradigm has significant potential and promising prospects.
Volume 35, Issue 5, pp. 4521-4534.
Citations: 0
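To ground the idea of a class-agnostic triplet constraint, the sketch below pulls foreground (co-salient) embeddings of one image group toward their group consensus while pushing background embeddings away, using no class labels. This is an assumed formulation for illustration, not the paper's CAT loss; the cosine-distance choice and margin are placeholders.

```python
# Minimal sketch (assumed formulation, not the paper's CAT loss): a triplet
# constraint on group consensus without any class labels.
import torch
import torch.nn.functional as F

def class_agnostic_triplet(fg_embed: torch.Tensor, bg_embed: torch.Tensor,
                           margin: float = 0.3) -> torch.Tensor:
    """fg_embed: (N, D) foreground (co-salient) embeddings from one image group;
       bg_embed: (M, D) background embeddings from the same group."""
    fg = F.normalize(fg_embed, dim=1)
    bg = F.normalize(bg_embed, dim=1)
    anchor = fg.mean(dim=0, keepdim=True)            # group consensus vector
    pos = 1.0 - (fg @ anchor.t()).squeeze(1)         # cosine distance to consensus
    neg = 1.0 - (bg @ anchor.t()).squeeze(1)
    # every foreground sample should sit closer to the consensus than any background sample
    return F.relu(pos.unsqueeze(1) - neg.unsqueeze(0) + margin).mean()

loss = class_agnostic_triplet(torch.randn(6, 128), torch.randn(10, 128))
print(loss.item())
```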