IEEE Transactions on Circuits and Systems for Video Technology: Latest Publications

DFF-VIO: A General Dynamic Feature Fused Monocular Visual-Inertial Odometry
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2024-10-17 · DOI: 10.1109/TCSVT.2024.3482573
Nan Luo;Zhexuan Hu;Yuan Ding;Jiaxu Li;Hui Zhao;Gang Liu;Quan Wang
{"title":"DFF-VIO: A General Dynamic Feature Fused Monocular Visual-Inertial Odometry","authors":"Nan Luo;Zhexuan Hu;Yuan Ding;Jiaxu Li;Hui Zhao;Gang Liu;Quan Wang","doi":"10.1109/TCSVT.2024.3482573","DOIUrl":"https://doi.org/10.1109/TCSVT.2024.3482573","url":null,"abstract":"Integrating dynamic effects has shown its significance in enhancing the accuracy and robustness of Visual-Inertial Odometry (VIO) systems in dynamic scenarios. Existing methods either prune dynamic features or rely heavily on prior semantic knowledge or kinetic models, proved unfriendly to scenes with a multitude of dynamic elements. This work proposes a novel dynamic feature fusion method for monocular VIO, named DFF-VIO, which requires no prior models or scene preference. By combining IMU-predicted poses with visual clues, it initially identifies dynamic features during the tracking stage by constraints of consistency and degree of motion. Then, we innovatively design a Dynamic Transformation Operation (DTO) to separate the effect of dynamic features on multiple frames into pairwise effects and construct a Dynamic Feature Cell (DFC) to preserve the eligible information. Subsequently, we reformulate the VIO nonlinear optimization problem and construct dynamic feature residuals with the transformed DFC as a unit. Based on the proposed inter-frame model of moving features, a so-called motion compensation is developed to resolve the reprojection issue of dynamic features, allowing their effects to be incorporated into the VIO’s tight coupling optimization, thereby realizing robust positioning in dynamic scenarios. We conduct accuracy evaluations on ADVIO and VIODE, degradation tests on EuRoC dataset, as well as ablation studies to highlight the joint optimization of dynamic residuals. Results reveal that DFF-VIO outperforms state-of-the-art methods in pose accuracy and robustness across various dynamic environments.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 2","pages":"1758-1773"},"PeriodicalIF":8.3,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143403845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
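As context for the consistency check described above, here is a minimal numpy sketch of flagging dynamic features by comparing observed feature tracks against IMU-predicted reprojections. The pinhole model, the 2-pixel threshold, and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def project(K, T_cw, p_w):
    """Project a 3-D world point into the image under pose T_cw (world -> camera)."""
    p_c = T_cw[:3, :3] @ p_w + T_cw[:3, 3]
    uv = K @ (p_c / p_c[2])
    return uv[:2]

def flag_dynamic_features(K, T_pred, points_w, tracks_uv, thresh_px=2.0):
    """Mark features whose observed track disagrees with the IMU-predicted
    reprojection as dynamic (a toy stand-in for DFF-VIO's consistency and
    degree-of-motion constraints)."""
    flags = []
    for p_w, uv_obs in zip(points_w, tracks_uv):
        uv_pred = project(K, T_pred, p_w)
        flags.append(np.linalg.norm(uv_pred - uv_obs) > thresh_px)
    return np.array(flags)

# Toy usage: one static landmark and one that drifted 8 px off its prediction.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
T_pred = np.eye(4)  # IMU-predicted camera pose
points = [np.array([0.0, 0.0, 5.0]), np.array([1.0, 0.0, 5.0])]
obs = [project(K, T_pred, points[0]),
       project(K, T_pred, points[1]) + np.array([8.0, 0.0])]
print(flag_dynamic_features(K, T_pred, points, obs))  # [False  True]
```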
Neuromorphic Imaging With Super-Resolution
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2024-10-17 · DOI: 10.1109/TCSVT.2024.3482436
Pei Zhang;Shuo Zhu;Chutian Wang;Yaping Zhao;Edmund Y. Lam
{"title":"Neuromorphic Imaging With Super-Resolution","authors":"Pei Zhang;Shuo Zhu;Chutian Wang;Yaping Zhao;Edmund Y. Lam","doi":"10.1109/TCSVT.2024.3482436","DOIUrl":"https://doi.org/10.1109/TCSVT.2024.3482436","url":null,"abstract":"Neuromorphic imaging is an emerging technique that imitates the human retina to sense variations in dynamic scenes. It responds to pixel-level brightness changes by asynchronous streaming events and boasts microsecond temporal precision over a high dynamic range, yielding blur-free recordings under extreme illumination. Nevertheless, this modality falls short in spatial resolution and leads to a low level of visual richness and clarity. Pursuing hardware upgrades is expensive and might cause compromised performance due to more burdens on computational requirements. Another option is to harness offline, plug-in-play super-resolution solutions. However, existing ones, which demand substantial sample volumes for lengthy training on massive computing resources, are largely restricted by real data availability owing to the current imperfect high-resolution devices, as well as the randomness and variability of motion. To tackle these challenges, we introduce the first self-supervised neuromorphic super-resolution prototype. It can be self-adaptive to per input source from any low-resolution camera to estimate an optimal, high-resolution counterpart of any scale, without the need of side knowledge and prior training. Evaluated on downstream tasks, such a simple yet effective method can obtain competitive results against the state-of-the-arts, significantly promoting flexibility but not sacrificing accuracy. It also delivers enhancements for inferior natural images and optical micrographs acquired under non-ideal imaging conditions, breaking through the limitations that are challenging to overcome with frame-based techniques. In the current landscape where the use of high-resolution cameras for event-based sensing remains an open debate, our solution is a cost-efficient and practical alternative, paving the way for more intelligent imaging systems.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 2","pages":"1715-1727"},"PeriodicalIF":8.3,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143403935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Reversible Data Hiding-Based Local Contrast Enhancement With Nonuniform Superpixel Blocks for Medical Images
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2024-10-17 · DOI: 10.1109/TCSVT.2024.3482556
Guangyong Gao;Sitian Yang;Xiangyang Hu;Zhihua Xia;Yun-Qing Shi
{"title":"Reversible Data Hiding-Based Local Contrast Enhancement With Nonuniform Superpixel Blocks for Medical Images","authors":"Guangyong Gao;Sitian Yang;Xiangyang Hu;Zhihua Xia;Yun-Qing Shi","doi":"10.1109/TCSVT.2024.3482556","DOIUrl":"https://doi.org/10.1109/TCSVT.2024.3482556","url":null,"abstract":"Reversible data hiding-based contrast enhancement can be applied to medical images, which not only allows the storage of patient information through reversible embedding, but also achieves image contrast enhancement, thereby assisting doctors in accurately diagnosing patient diseases. In response to the existing problems of mainstream methods, a novel reversible data hiding-based local contrast enhancement method for medical images is proposed. This method utilizes superpixel segmentation to segment medical images into multiple pixel blocks, and performs reversible data embedding and contrast enhancement for the pixel blocks within the region of interest (ROI). Additionally, a new embedding strategy is proposed. According to the contrast and texture features of each pixel block, histogram expansion of different degrees is carried out to effectively enhance the pixel blocks with low contrast, while avoiding excessive enhancement of the pixel blocks with high contrast. Experimental results demonstrate that, compared with the state-of-the-art mainstream methods, the proposed method not only improves the contrast in the ROI but also ensures high visual quality of the medical images.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 2","pages":"1745-1757"},"PeriodicalIF":8.3,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143403839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
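As background for the histogram-expansion strategy above, here is a minimal sketch of classic single-peak histogram-shift reversible embedding on one pixel block. The paper's superpixel segmentation and per-block adaptive expansion degrees are not reproduced; this only illustrates why such embedding is exactly reversible.

```python
import numpy as np

def rdh_embed(block, bits):
    """Single-peak histogram-shift embedding (simplified; assumes no pixel
    overflow). Pixels above the peak bin shift up by 1 to empty the bin at
    peak+1; each peak-valued pixel then carries one payload bit."""
    vals, counts = np.unique(block, return_counts=True)
    peak = int(vals[np.argmax(counts)])
    out, k = block.astype(np.int64).copy(), 0
    out[out > peak] += 1                     # open an empty bin at peak+1
    for idx in np.ndindex(out.shape):        # fixed scan order
        if out[idx] == peak and k < len(bits):
            out[idx] += bits[k]              # bit 1 -> peak+1, bit 0 -> peak
            k += 1
    return out, peak

def rdh_extract(stego, peak, n_bits):
    """Recover the payload, then restore the original block exactly."""
    bits, out = [], stego.copy()
    for idx in np.ndindex(out.shape):        # same scan order as embedding
        if out[idx] in (peak, peak + 1) and len(bits) < n_bits:
            bits.append(int(out[idx] - peak))
    out[out > peak] -= 1                     # undo the histogram shift
    return bits, out

block = np.array([[100, 100, 101], [100, 102, 100]])
stego, peak = rdh_embed(block, [1, 0, 1])
bits, restored = rdh_extract(stego, peak, 3)
assert bits == [1, 0, 1] and np.array_equal(restored, block)
```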
ZeroPose: CAD-Prompted Zero-Shot Object 6D Pose Estimation in Cluttered Scenes
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2024-10-17 · DOI: 10.1109/TCSVT.2024.3482439
Jianqiu Chen;Zikun Zhou;Mingshan Sun;Rui Zhao;Liwei Wu;Tianpeng Bao;Zhenyu He
{"title":"ZeroPose: CAD-Prompted Zero-Shot Object 6D Pose Estimation in Cluttered Scenes","authors":"Jianqiu Chen;Zikun Zhou;Mingshan Sun;Rui Zhao;Liwei Wu;Tianpeng Bao;Zhenyu He","doi":"10.1109/TCSVT.2024.3482439","DOIUrl":"https://doi.org/10.1109/TCSVT.2024.3482439","url":null,"abstract":"Many robotics and industry applications have a high demand for the capability to estimate the 6D pose of novel objects from the cluttered scene. However, existing classic pose estimation methods are object-specific, which can only handle the specific objects seen during training. When applied to a novel object, these methods necessitate a cumbersome onboarding process, which involves extensive dataset preparation and model retraining. The extensive duration and resource consumption of onboarding limit their practicality in real-world applications In this paper, we introduce ZeroPose, a novel zero-shot framework that performs pose estimation following a Discovery-Orientation-Registration (DOR) inference pipeline. This framework generalizes to novel objects without requiring model retraining. Given the CAD model of a novel object, ZeroPose enables in seconds onboarding time to extract visual and geometric embeddings from the CAD model as a prompt. With the prompting of the above embeddings, DOR can discover all related instances and estimate their 6D poses without additional human interaction or presupposing scene conditions. Compared with existing zero-shot methods solved by the render-and-compare paradigm, the DOR pipeline formulates the object pose estimation into a feature-matching problem, which avoids time-consuming online rendering and improves efficiency. Experimental results on the seven datasets show that ZeroPose as a zero-shot method achieves comparable performance with object-specific training methods and outperforms the state-of-the-art zero-shot method with 50x inference speed improvement.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 2","pages":"1251-1264"},"PeriodicalIF":8.3,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143403899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
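A minimal sketch of the feature-matching idea mentioned above: scene proposal embeddings are compared to a CAD-prompt embedding by cosine similarity, with no per-hypothesis rendering. The function name, the 0.6 threshold, and the 128-dim embeddings are assumptions for illustration.

```python
import numpy as np

def match_instances(cad_emb, scene_embs, sim_thresh=0.6):
    """Match scene proposal embeddings against a CAD prompt embedding by
    cosine similarity -- a toy stand-in for the 'Discovery' step, which
    avoids per-hypothesis online rendering."""
    cad = cad_emb / np.linalg.norm(cad_emb)
    scene = scene_embs / np.linalg.norm(scene_embs, axis=1, keepdims=True)
    sims = scene @ cad                       # one dot product per proposal
    return np.nonzero(sims > sim_thresh)[0], sims

rng = np.random.default_rng(0)
cad_emb = rng.normal(size=128)                    # embedding of the CAD model
distractor = rng.normal(size=128)
scene_embs = np.stack([cad_emb + 0.1 * rng.normal(size=128),  # true instance
                       distractor])               # unrelated clutter
idx, sims = match_instances(cad_emb, scene_embs)
print(idx, np.round(sims, 2))   # the perturbed copy matches, clutter does not
```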
FDCE-Net: Underwater Image Enhancement With Embedding Frequency and Dual Color Encoder
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2024-10-17 · DOI: 10.1109/TCSVT.2024.3482548
Zheng Cheng;Guodong Fan;Jingchun Zhou;Min Gan;C. L. Philip Chen
{"title":"FDCE-Net: Underwater Image Enhancement With Embedding Frequency and Dual Color Encoder","authors":"Zheng Cheng;Guodong Fan;Jingchun Zhou;Min Gan;C. L. Philip Chen","doi":"10.1109/TCSVT.2024.3482548","DOIUrl":"https://doi.org/10.1109/TCSVT.2024.3482548","url":null,"abstract":"Underwater images often suffer from various issues such as low brightness, color shift, blurred details, and noise due to light absorption and scattering caused by water and suspended particles. Previous underwater image enhancement (UIE) methods have primarily focused on spatial domain enhancement, neglecting the frequency domain information inherent in the images. However, the degradation factors of underwater images are closely intertwined in the spatial domain. Although certain methods focus on enhancing images in the frequency domain, they overlook the inherent relationship between the image degradation factors and the information present in the frequency domain. As a result, these methods frequently enhance certain attributes of the improved image while inadequately addressing or even exacerbating other attributes. Moreover, many existing methods heavily rely on prior knowledge to address color shift problems in underwater images, limiting their flexibility and robustness. In order to overcome these limitations, we propose the Embedding Frequency and Dual Color Encoder Network (FDCE-Net) in our paper. The FDCE-Net consists of two main structures: 1) Frequency Spatial Network (FS-Net) aims to achieve initial enhancement by utilizing our designed Frequency Spatial Residual Block (FSRB) to decouple image degradation factors in the frequency domain and enhance different attributes separately; 2) To tackle the color shift issue, we introduce the Dual-Color Encoder (DCE). The DCE establishes correlations between color and semantic representations through cross-attention and leverages multi-scale image features to guide the optimization of adaptive color query. The final enhanced images are generated by combining the outputs of FS-Net and DCE through a fusion network. These images exhibit rich details, clear textures, low noise and natural colors. Extensive experiments demonstrate that our FDCE-Net outperforms state-of-the-art (SOTA) methods in terms of both visual quality and quantitative metrics. The code of our model is publicly available at: <uri>https://github.com/Alexande-rChan/FDCE-Net</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 2","pages":"1728-1744"},"PeriodicalIF":8.3,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143404027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
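The frequency-domain decoupling step above can be illustrated with a minimal numpy sketch: split an image into FFT amplitude and phase, process each component independently, then recombine. The learned FSRB branches are replaced here by placeholder functions; this is a skeleton of the idea, not the paper's network.

```python
import numpy as np

def decouple_enhance(img, amp_fn, phase_fn):
    """Split an image into FFT amplitude and phase spectra, enhance each
    component independently, then recombine -- the skeleton of
    frequency-domain decoupling; amp_fn/phase_fn stand in for learned branches."""
    spec = np.fft.fft2(img)
    amp, phase = np.abs(spec), np.angle(spec)
    new_spec = amp_fn(amp) * np.exp(1j * phase_fn(phase))
    return np.real(np.fft.ifft2(new_spec))

# Toy usage: brighten via the amplitude branch, leave structure (phase) alone.
img = np.random.default_rng(0).random((64, 64))
out = decouple_enhance(img, amp_fn=lambda a: 1.2 * a, phase_fn=lambda p: p)
print(img.mean(), out.mean())   # the mean scales with the amplitude gain
```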
Inter-Clip Feature Similarity Based Weakly Supervised Video Anomaly Detection via Multi-Scale Temporal MLP
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2024-10-17 · DOI: 10.1109/TCSVT.2024.3482414
Yuanhong Zhong;Ruyue Zhu;Ge Yan;Ping Gan;Xuerui Shen;Dong Zhu
{"title":"Inter-Clip Feature Similarity Based Weakly Supervised Video Anomaly Detection via Multi-Scale Temporal MLP","authors":"Yuanhong Zhong;Ruyue Zhu;Ge Yan;Ping Gan;Xuerui Shen;Dong Zhu","doi":"10.1109/TCSVT.2024.3482414","DOIUrl":"https://doi.org/10.1109/TCSVT.2024.3482414","url":null,"abstract":"The major paradigm of weakly supervised video anomaly detection (WSVAD) is treating it as a multiple instance learning (MIL) problem, with only video-level labels available for training. Due to the rarity and ambiguity of anomaly, the selection of potential abnormal training sample is the prime challenge for WSVAD. Considering the temporal relevance and length variation of anomaly events, how to integrate the temporal information is also a controversial topic in WSVAD area. To address forementioned problems, we propose a novel method named Inter-clip Feature Similarity based Video Anomaly Detection (IFS-VAD). In the proposed IFS-VAD, to make use of both the global and local temporal relation, a Multi-scale Temporal MLP (MT-MLP) is leveraged. To better capture the ambiguous abnormal instances in positive bags, we introduce a novel anomaly criterion based on the Inter-clip Feature Similarity (IFS). The proposed IFS criterion can assist in discerning anomaly, as an additional anomaly score in the prediction process of anomaly classifier. Extensive experiments show that IFS-VAD demonstrates state-of-the-art performance on ShanghaiTech with an AUC of 97.95%, UCF-Crime with an AUC of 86.57% and XD-Violence with an AP of 83.14%. Our code implementation is accessible at <uri>https://github.com/Ria5331/IFS-VAD</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 2","pages":"1961-1970"},"PeriodicalIF":8.3,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143404017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
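A minimal sketch of an inter-clip similarity criterion in the spirit described above: a clip scores high when its features break from their temporal context. The window size, the cosine measure, and the scoring formula are assumptions; the paper's exact criterion and its combination with the classifier score differ.

```python
import numpy as np

def ifs_scores(clip_feats, k=2):
    """Inter-clip feature similarity anomaly score: 1 minus the mean cosine
    similarity to the k temporal neighbours on each side. Clips whose
    features deviate from their temporal context score high (illustrative)."""
    f = clip_feats / np.linalg.norm(clip_feats, axis=1, keepdims=True)
    n = len(f)
    scores = np.zeros(n)
    for i in range(n):
        nbrs = [j for j in range(max(0, i - k), min(n, i + k + 1)) if j != i]
        scores[i] = 1.0 - float(np.mean(f[nbrs] @ f[i]))
    return scores

rng = np.random.default_rng(0)
normal = rng.normal(size=64)
feats = np.stack([normal + 0.05 * rng.normal(size=64) for _ in range(8)])
feats[4] = rng.normal(size=64)            # one clip deviates from its context
print(np.argmax(ifs_scores(feats)))       # -> 4
```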
Multiple Pedestrian Tracking Under Occlusion: A Survey and Outlook
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2024-10-16 · DOI: 10.1109/TCSVT.2024.3481425
Zhihong Sun;Guoheng Wei;Wei Fu;Mang Ye;Kui Jiang;Chao Liang;Tingting Zhu;Tao He;Mithun Mukherjee
{"title":"Multiple Pedestrian Tracking Under Occlusion: A Survey and Outlook","authors":"Zhihong Sun;Guoheng Wei;Wei Fu;Mang Ye;Kui Jiang;Chao Liang;Tingting Zhu;Tao He;Mithun Mukherjee","doi":"10.1109/TCSVT.2024.3481425","DOIUrl":"https://doi.org/10.1109/TCSVT.2024.3481425","url":null,"abstract":"As an intermediate task in computer vision, multiple pedestrian tracking (MPT) aiming at tracking the pedestrians from a given video, has attracted attention due to its potential academic and commercial value. However, pedestrians commonly suffer from occlusion due to diverse and complex scenarios, which increases the challenge of this task. This survey provides comprehensive review in terms of occlusion scenarios encountered during MPT, and investigates the model robustness of the existing methods in this scenarios. Firstly, this survey introduces the various and states of occlusion. Secondly, the related occlusion datasets are introduced. Subsequently, we categorize existing occlusion handling methods according to the tracking process and detail their pros and cons. In addition, occlusion handling precision (OHP) metric is proposed to evaluate the ability of a tracker in handling occlusion in this survey. Moreover, comprehensive analyzes and discussions in several public datasets are provided to verify the effectiveness of these methods. Finally, the existing issues and future directions for occlusion handling methods are discussed. In doing so, this work serves as a foundation for future research by providing researchers with information about the occlusion handling method of MPT.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 2","pages":"1009-1027"},"PeriodicalIF":8.3,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10720185","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143403815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Selectively Hard Negative Mining for Alleviating Gradient Vanishing in Image-Text Matching
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2024-10-15 · DOI: 10.1109/TCSVT.2024.3480949
Zheng Li;Caili Guo;Xin Wang;Zerun Feng;Zhongtian Du
{"title":"Selectively Hard Negative Mining for Alleviating Gradient Vanishing in Image-Text Matching","authors":"Zheng Li;Caili Guo;Xin Wang;Zerun Feng;Zhongtian Du","doi":"10.1109/TCSVT.2024.3480949","DOIUrl":"https://doi.org/10.1109/TCSVT.2024.3480949","url":null,"abstract":"Most Image-Text Matching (ITM) models adopt Triplet loss with Hard Negative mining (T-HN) as the optimization objective. T-HN mines the hardest negative samples in each batch for training and achieves impressive performance. However, we observe that these ITM models have bad training behaviors in the early phases of training. Model training is difficult to converge, and matching performance is slow to improve. In this paper, we find that the cause of bad training behavior is that the model suffers from gradient vanishing. Optimizing an ITM model using only the hardest negative samples can easily lead to gradient vanishing. Through gradient analysis, we first derive the condition under which the gradient vanishes during training. We explain why the gradient tends to zero under certain conditions. To alleviate gradient vanishing, we propose a Triplet loss with Selectively Hard Negative mining (T-SelHN), which decides whether to mine the hardest negative samples according to the gradient vanishing condition. T-SelHN can be applied to ITM models in a plug-and-play manner to improve their training behaviors. To further ensure the back-propagation of gradients, we construct a Residual Visual Semantic Embedding model with T-SelHN, denoted RVSE++, which has a simple network structure and efficient training and inference speeds. Extensive experiments on two ITM benchmarks demonstrate the strength of RVSE++, achieving state-of-the-art performance. The code is available at <uri>https://github.com/AAA-Zheng/RVSEPP</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 2","pages":"1921-1935"},"PeriodicalIF":8.3,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143403837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
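A structural sketch of the selective-mining idea above. The trigger used here (the hardest-negative hinge is active for every anchor, a symptom of the degenerate early-training regime) is a stand-in for the paper's analytically derived gradient-vanishing condition; the fallback of spreading the loss over all negatives is likewise an assumption for illustration.

```python
import numpy as np

def t_selhn(sim, margin=0.2):
    """Triplet loss with Selectively Hard Negative mining (structural sketch).
    sim: (B, B) image-text similarity matrix with positives on the diagonal.
    Normally uses only the hardest negative per anchor; under the stand-in
    trigger below it switches to averaging over all negatives so gradients
    do not stall."""
    B = sim.shape[0]
    pos = np.diag(sim)
    mask = ~np.eye(B, dtype=bool)
    negs = sim[mask].reshape(B, B - 1)            # negatives per anchor
    hard_hinge = np.maximum(0.0, margin - pos[:, None]
                            + negs.max(axis=1, keepdims=True))
    if np.all(hard_hinge > 0):                    # hardest mining would stall training
        hinge = np.maximum(0.0, margin - pos[:, None] + negs)
        return float(hinge.mean())                # spread gradient over all negatives
    return float(hard_hinge.mean())

sim = np.array([[0.9, 0.2, 0.8],
                [0.1, 0.7, 0.6],
                [0.3, 0.4, 0.5]])
print(t_selhn(sim))   # falls back to all-negative averaging on this toy batch
```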
Adaptive Ensemble Learning With Category-Aware Attention and Local Contrastive Loss
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2024-10-15 · DOI: 10.1109/TCSVT.2024.3479313
Hongrui Guo;Tianqi Sun;Hongzhi Liu;Zhonghai Wu
{"title":"Adaptive Ensemble Learning With Category-Aware Attention and Local Contrastive Loss","authors":"Hongrui Guo;Tianqi Sun;Hongzhi Liu;Zhonghai Wu","doi":"10.1109/TCSVT.2024.3479313","DOIUrl":"https://doi.org/10.1109/TCSVT.2024.3479313","url":null,"abstract":"Machine learning techniques can help us deal with many difficult problems in the real world. Proper ensemble of multiple learners can improve the predictive performance. Each base learner usually has different predictive ability on different instances or in different instance regions. However, existing ensemble methods often assume that base learners have the same predictive ability for all instances without consideration of the specificity of different instances or categories. To address these issues, we propose an adaptive ensemble learning framework with category-aware attention and local contrastive loss, which can adaptively adjust the ensemble weight of each base classifier according to the characteristics of each instance. Specifically, we design a category-aware attention mechanism to learn the predictive ability of each classifier on different categories. Furthermore, we design a local contrastive loss to capture local similarities between instances and further enhance the model’s ability to discern fine-grained patterns in the data. Extensive experiments on 20 public datasets demonstrate the effectiveness of the proposed model.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 2","pages":"1224-1236"},"PeriodicalIF":8.3,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143404023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
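A minimal sketch of per-instance, category-aware weighting as described above: each classifier's weight depends on its (learned) ability on the classes it currently favors. In the paper the ability scores are learned jointly with the attention; here the ability matrix is simply given, and all names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def category_aware_ensemble(probs, ability):
    """Combine base-classifier outputs with category-aware attention.
    probs:   (M, C) class probabilities from M base classifiers.
    ability: (M, C) per-category ability scores (assumed given here).
    The attention weight of each classifier adapts to the instance via the
    classes that classifier currently predicts."""
    conf = (probs * ability).sum(axis=1)      # expected ability under each prediction
    w = softmax(conf, axis=0)                 # attention over classifiers
    return w @ probs                          # (C,) ensembled distribution

probs = np.array([[0.7, 0.2, 0.1],    # classifier 0 votes class 0
                  [0.1, 0.8, 0.1]])   # classifier 1 votes class 1
ability = np.array([[2.0, 0.0, 0.0],  # classifier 0 is strong on class 0
                    [0.0, 0.5, 0.0]]) # classifier 1 is weaker on class 1
print(category_aware_ensemble(probs, ability))  # tilts toward classifier 0
```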
Frequency Decoupled Domain-Irrelevant Feature Learning for Pan-Sharpening
IF 8.3 · CAS Q1 · Engineering & Technology
IEEE Transactions on Circuits and Systems for Video Technology Pub Date: 2024-10-15 · DOI: 10.1109/TCSVT.2024.3480950
Jie Zhang;Ke Cao;Keyu Yan;Yunlong Lin;Xuanhua He;Yingying Wang;Rui Li;Chengjun Xie;Jun Zhang;Man Zhou
{"title":"Frequency Decoupled Domain-Irrelevant Feature Learning for Pan-Sharpening","authors":"Jie Zhang;Ke Cao;Keyu Yan;Yunlong Lin;Xuanhua He;Yingying Wang;Rui Li;Chengjun Xie;Jun Zhang;Man Zhou","doi":"10.1109/TCSVT.2024.3480950","DOIUrl":"https://doi.org/10.1109/TCSVT.2024.3480950","url":null,"abstract":"Pan-sharpening aims to generate high-detail multi-spectral images (HRMS) through the fusion of panchromatic (PAN) and multi-spectral (MS) images. However, existing pan-sharpening methods often suffer from significant performance degradation when dealing with out-of-distribution data, as they assume the training and test datasets are independent and identically distributed. To overcome this challenge, we propose a novel frequency domain-irrelevant feature learning framework that exhibits exceptional generalization capabilities. Our approach involves parallel extraction and processing of domain-irrelevant information from the amplitude and phase components of the input images. Specifically, we design a frequency information separation module to extract the amplitude and phase components of the paired images. The learnable high-pass filter is then employed to eliminate domain-specific information from the amplitude spectrums. After that, we devised two specialized sub-networks (AFL-Net and PFL-Net) to perform targeted learning of the frequency domain-irrelevant information. This allows our method to effectively capture the complementary domain-irrelevant information contained in the amplitude and phase spectra of the images. Finally, the information fusion and restoration module dynamically adjusts the feature channel weights, enabling the network to output high-quality HRMS images. Through this frequency domain-irrelevant feature learning framework, our method balances generalization capability and network performance on the distribution of training dataset. Extensive experiments conducted on various satellite datasets demonstrate the effectiveness of our method for generalized pan-sharpening. Our proposed network outperforms state-of-the-art methods in terms of both quantitative metrics and visual quality, showcasing its superior ability to handle diverse, out-of-distribution data.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 2","pages":"1237-1250"},"PeriodicalIF":8.3,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143403820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
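A minimal numpy sketch of high-pass filtering the amplitude spectrum while keeping the phase intact, as described above. The paper learns this filter; a fixed Gaussian high-pass mask (and the sigma value) stands in for it here.

```python
import numpy as np

def highpass_amplitude(img, sigma=0.1):
    """Suppress low-frequency amplitude (where sensor/domain style tends to
    concentrate) while preserving the phase spectrum. A fixed Gaussian
    high-pass mask replaces the paper's learnable filter."""
    h, w = img.shape
    spec = np.fft.fftshift(np.fft.fft2(img))          # DC at the centre
    amp, phase = np.abs(spec), np.angle(spec)
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    r2 = (yy / h) ** 2 + (xx / w) ** 2                # normalised radius^2
    mask = 1.0 - np.exp(-r2 / (2 * sigma ** 2))       # ~0 at DC, ~1 at high freq
    filtered = mask * amp * np.exp(1j * phase)
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))

img = np.random.default_rng(0).random((64, 64))
out = highpass_amplitude(img)
print(abs(out.mean()))   # near zero: the DC / low-frequency style is removed
```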