IET Computer Vision — Latest Publications

A comprehensive research on light field imaging: Theory and application
IF 1.3 · CAS Zone 4 · Computer Science
IET Computer Vision Pub Date: 2024-11-22 DOI: 10.1049/cvi2.12321
Fei Liu, Yunlong Wang, Qing Yang, Shubo Zhou, Kunbo Zhang
{"title":"A comprehensive research on light field imaging: Theory and application","authors":"Fei Liu,&nbsp;Yunlong Wang,&nbsp;Qing Yang,&nbsp;Shubo Zhou,&nbsp;Kunbo Zhang","doi":"10.1049/cvi2.12321","DOIUrl":"10.1049/cvi2.12321","url":null,"abstract":"<p>Computational photography is a combination of novel optical designs and processing methods to capture high-dimensional visual information. As an emerged promising technique, light field (LF) imaging measures the lighting, reflectance, focus, geometry and viewpoint in the free space, which has been widely explored for depth estimation, view synthesis, refocus, rendering, 3D displays, microscopy and other applications in computer vision in the past decades. In this paper, the authors present a comprehensive research survey on the LF imaging theory, technology and application. Firstly, the LF imaging process based on a MicroLens Array structure is derived, that is MLA-LF. Subsequently, the innovations of LF imaging technology are presented in terms of the imaging prototype, consumer LF camera and LF displays in Virtual Reality (VR) and Augmented Reality (AR). Finally the applications and challenges of LF imaging integrating with deep learning models are analysed, which consist of depth estimation, saliency detection, semantic segmentation, de-occlusion and defocus deblurring in recent years. It is believed that this paper will be a good reference for the future research on LF imaging technology in Artificial Intelligence era.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1269-1284"},"PeriodicalIF":1.3,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12321","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DEUFormer: High-precision semantic segmentation for urban remote sensing images
IF 1.3 · CAS Zone 4 · Computer Science
IET Computer Vision Pub Date: 2024-11-12 DOI: 10.1049/cvi2.12313
Xinqi Jia, Xiaoyong Song, Lei Rao, Guangyu Fan, Songlin Cheng, Niansheng Chen
{"title":"DEUFormer: High-precision semantic segmentation for urban remote sensing images","authors":"Xinqi Jia,&nbsp;Xiaoyong Song,&nbsp;Lei Rao,&nbsp;Guangyu Fan,&nbsp;Songlin Cheng,&nbsp;Niansheng Chen","doi":"10.1049/cvi2.12313","DOIUrl":"10.1049/cvi2.12313","url":null,"abstract":"<p>Urban remote sensing image semantic segmentation has a wide range of applications, such as urban planning, resource exploration, intelligent transportation, and other scenarios. Although UNetFormer performs well by introducing the self-attention mechanism of Transformer, it still faces challenges arising from relatively low segmentation accuracy and significant edge segmentation errors. To this end, this paper proposes DEUFormer by employing a special weighted sum method to fuse the features of the encoder and the decoder, thus capturing both local details and global context information. Moreover, an Enhanced Feature Refinement Head is designed to finely re-weight features on the channel dimension and narrow the semantic gap between shallow and deep features, thereby enhancing multi-scale feature extraction. Additionally, an Edge-Guided Context Module is introduced to enhance edge areas through effective edge detection, which can improve edge information extraction. Experimental results show that DEUFormer achieves an average Mean Intersection over Union (mIoU) of 53.8% on the LoveDA dataset and 69.1% on the UAVid dataset. Notably, the mIoU of buildings in the LoveDA dataset is 5.0% higher than that of UNetFormer. The proposed model outperforms methods such as UNetFormer on multiple datasets, which demonstrates its effectiveness.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1209-1222"},"PeriodicalIF":1.3,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12313","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient transformer tracking with adaptive attention
IF 1.3 · CAS Zone 4 · Computer Science
IET Computer Vision Pub Date: 2024-11-07 DOI: 10.1049/cvi2.12315
Dingkun Xiao, Zhenzhong Wei, Guangjun Zhang
{"title":"Efficient transformer tracking with adaptive attention","authors":"Dingkun Xiao,&nbsp;Zhenzhong Wei,&nbsp;Guangjun Zhang","doi":"10.1049/cvi2.12315","DOIUrl":"10.1049/cvi2.12315","url":null,"abstract":"<div>\u0000 \u0000 \u0000 <section>\u0000 \u0000 <p>Recently, several trackers utilising Transformer architecture have shown significant performance improvement. However, the high computational cost of multi-head attention, a core component in the Transformer, has limited real-time running speed, which is crucial for tracking tasks. Additionally, the global mechanism of multi-head attention makes it susceptible to distractors with similar semantic information to the target. To address these issues, the authors propose a novel adaptive attention that enhances features through the spatial sparse attention mechanism with less than 1/4 of the computational complexity of multi-head attention. Our adaptive attention sets a perception range around each element in the feature map based on the target scale in the previous tracking result and adaptively searches for the information of interest. This allows the module to focus on the target region rather than background distractors. Based on adaptive attention, the authors build an efficient transformer tracking framework. It can perform deep interaction between search and template features to activate target information and aggregate multi-level interaction features to enhance the representation ability. The evaluation results on seven benchmarks show that the authors’ tracker achieves outstanding performance with a speed of 43 fps and significant advantages in hard circumstances.</p>\u0000 </section>\u0000 </div>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1338-1350"},"PeriodicalIF":1.3,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12315","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143248922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-scale feature extraction for energy-efficient object detection in remote sensing images
IF 1.3 · CAS Zone 4 · Computer Science
IET Computer Vision Pub Date: 2024-10-30 DOI: 10.1049/cvi2.12317
Di Wu, Hongning Liu, Jiawei Xu, Fei Xie
{"title":"Multi-scale feature extraction for energy-efficient object detection in remote sensing images","authors":"Di Wu,&nbsp;Hongning Liu,&nbsp;Jiawei Xu,&nbsp;Fei Xie","doi":"10.1049/cvi2.12317","DOIUrl":"10.1049/cvi2.12317","url":null,"abstract":"<p>Object detection in remote sensing images aims to interpret images to obtain information on the category and location of potential targets, which is of great importance in traffic detection, marine supervision, and space reconnaissance. However, the complex backgrounds and large scale variations in remote sensing images present significant challenges. Traditional methods relied mainly on image filtering or feature descriptor methods to extract features, resulting in underperformance. Deep learning methods, especially one-stage detectors, for example, the Real-Time Object Detector (RTMDet) offers advanced solutions with efficient network architectures. Nevertheless, difficulty in feature extraction from complex backgrounds and target localisation in scale variations images limits detection accuracy. In this paper, an improved detector based on RTMDet, called the Multi-Scale Feature Extraction-assist RTMDet (MRTMDet), is proposed which address limitations through enhancement feature extraction and fusion networks. At the core of MRTMDet is a new backbone network MobileViT++ and a feature fusion network SFC-FPN, which enhances the model's ability to capture global and multi-scale features by carefully designing a hybrid feature processing unit of CNN and a transformer based on vision transformer (ViT) and poly-scale convolution (PSConv), respectively. The experiment in DIOR-R demonstrated that MRTMDet achieves competitive performance of 62.2% mAP, balancing precision with a lightweight design.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1223-1234"},"PeriodicalIF":1.3,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12317","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A survey on person and vehicle re-identification
IF 1.3 · CAS Zone 4 · Computer Science
IET Computer Vision Pub Date: 2024-10-28 DOI: 10.1049/cvi2.12316
Zhaofa Wang, Liyang Wang, Zhiping Shi, Miaomiao Zhang, Qichuan Geng, Na Jiang
{"title":"A survey on person and vehicle re-identification","authors":"Zhaofa Wang,&nbsp;Liyang Wang,&nbsp;Zhiping Shi,&nbsp;Miaomiao Zhang,&nbsp;Qichuan Geng,&nbsp;Na Jiang","doi":"10.1049/cvi2.12316","DOIUrl":"10.1049/cvi2.12316","url":null,"abstract":"<p>Person/vehicle re-identification aims to use technologies such as cross-camera retrieval to associate the same person (same vehicle) in the surveillance videos at different locations, different times, and images captured by different cameras so as to achieve cross-surveillance image matching, person retrieval and trajectory tracking. It plays an extremely important role in the fields of intelligent security, criminal investigation etc. In recent years, the rapid development of deep learning technology has significantly propelled the advancement of re-identification (Re-ID) technology. An increasing number of technical methods have emerged, aiming to enhance Re-ID performance. This paper summarises four popular research areas in the current field of re-identification, focusing on the current research hotspots. These areas include the multi-task learning domain, the generalisation learning domain, the cross-modality domain, and the optimisation learning domain. Specifically, the paper analyses various challenges faced within these domains and elaborates on different deep learning frameworks and networks that address these challenges. A comparative analysis of re-identification tasks from various classification perspectives is provided, introducing mainstream research directions and current achievements. Finally, insights into future development trends are presented.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1235-1268"},"PeriodicalIF":1.3,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12316","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Occluded object 6D pose estimation using foreground probability compensation
IF 1.3 · CAS Zone 4 · Computer Science
IET Computer Vision Pub Date: 2024-10-17 DOI: 10.1049/cvi2.12314
Meihui Ren, Junying Jia, Xin Lu
{"title":"Occluded object 6D pose estimation using foreground probability compensation","authors":"Meihui Ren,&nbsp;Junying Jia,&nbsp;Xin Lu","doi":"10.1049/cvi2.12314","DOIUrl":"10.1049/cvi2.12314","url":null,"abstract":"<div>\u0000 \u0000 \u0000 <section>\u0000 \u0000 <p>6D object pose estimation usually refers to acquiring the 6D pose information of 3D objects in the sensor coordinate system using computer vision techniques. However, the task faces numerous challenges due to the complexity of natural scenes. One of the most significant challenges is occlusion, which is an unavoidable situation in 3D scenes and poses a significant obstacle in real-world applications. To tackle this issue, we propose a novel 6D pose estimation algorithm based on RGB-D images, aiming for enhanced robustness in occluded environments. Our approach follows the basic architecture of keypoint-based pose estimation algorithms. To better leverage complementary information of RGB-D data, we introduce a novel foreground probability-guided sampling strategy at the network's input stage. This strategy mitigates the sampling ratio imbalance between foreground and background points due to smaller foreground objects in occluded environments. Moreover, considering the impact of occlusion on semantic segmentation networks, we introduce a new object segmentation module. This module utilises traditional image processing techniques to compensate for severe semantic segmentation errors of deep learning networks. We evaluate our algorithm using the Occlusion LineMOD public dataset. Experimental results demonstrate that our method is more robust in occlusion environments compared to existing state-of-the-art algorithms. It maintains stable performance even in scenarios with no or low occlusion.</p>\u0000 </section>\u0000 </div>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1325-1337"},"PeriodicalIF":1.3,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12314","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Real-time semantic segmentation network for crops and weeds based on multi-branch structure
IF 1.3 · CAS Zone 4 · Computer Science
IET Computer Vision Pub Date: 2024-10-01 DOI: 10.1049/cvi2.12311
Yufan Liu, Muhua Liu, Xuhui Zhao, Junlong Zhu, Lin Wang, Hao Ma, Mingchuan Zhang
{"title":"Real-time semantic segmentation network for crops and weeds based on multi-branch structure","authors":"Yufan Liu,&nbsp;Muhua Liu,&nbsp;Xuhui Zhao,&nbsp;Junlong Zhu,&nbsp;Lin Wang,&nbsp;Hao Ma,&nbsp;Mingchuan Zhang","doi":"10.1049/cvi2.12311","DOIUrl":"10.1049/cvi2.12311","url":null,"abstract":"<p>Weed recognition is an inevitable problem in smart agriculture, and to realise efficient weed recognition, complex background, insufficient feature information, varying target sizes and overlapping crops and weeds are the main problems to be solved. To address these problems, the authors propose a real-time semantic segmentation network based on a multi-branch structure for recognising crops and weeds. First, a new backbone network for capturing feature information between crops and weeds of different sizes is constructed. Second, the authors propose a weight refinement fusion (WRF) module to enhance the feature extraction ability of crops and weeds and reduce the interference caused by the complex background. Finally, a Semantic Guided Fusion is devised to enhance the interaction of information between crops and weeds and reduce the interference caused by overlapping goals. The experimental results demonstrate that the proposed network can balance speed and accuracy. Specifically, the 0.713 Mean IoU (MIoU), 0.802 MIoU, 0.746 MIoU and 0.906 MIoU can be achieved on the sugar beet (BoniRob) dataset, synthetic BoniRob dataset, CWFID dataset and self-labelled wheat dataset, respectively.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1313-1324"},"PeriodicalIF":1.3,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12311","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143248083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Leveraging modality-specific and shared features for RGB-T salient object detection
IF 1.3 · CAS Zone 4 · Computer Science
IET Computer Vision Pub Date: 2024-09-25 DOI: 10.1049/cvi2.12307
Shuo Wang, Gang Yang, Qiqi Xu, Xun Dai
{"title":"Leveraging modality-specific and shared features for RGB-T salient object detection","authors":"Shuo Wang,&nbsp;Gang Yang,&nbsp;Qiqi Xu,&nbsp;Xun Dai","doi":"10.1049/cvi2.12307","DOIUrl":"10.1049/cvi2.12307","url":null,"abstract":"<p>Most of the existing RGB-T salient object detection methods are usually based on dual-stream encoding single-stream decoding network architecture. These models always rely on the quality of fusion features, which often focus on modality-shared features and overlook modality-specific features, thus failing to fully utilise the rich information contained in multi-modality data. To this end, a modality separate tri-stream net (MSTNet), which consists of a tri-stream encoding (TSE) structure and a tri-stream decoding (TSD) structure is proposed. The TSE explicitly separates and extracts the modality-shared and modality-specific features to improve the utilisation of multi-modality data. In addition, based on the hybrid-attention and cross-attention mechanism, we design an enhanced complementary fusion module (ECF), which fully considers the complementarity between the features to be fused and realises high-quality feature fusion. Furthermore, in TSD, the quality of uni-modality features is ensured under the constraint of supervision. Finally, to make full use of the rich multi-level and multi-scale decoding features contained in TSD, the authors design the adaptive multi-scale decoding module and the multi-stream feature aggregation module to improve the decoding capability. Extensive experiments on three public datasets show that the MSTNet outperforms 14 state-of-the-art methods, demonstrating that this method can extract and utilise the multi-modality information more adequately and extract more complete and rich features, thus improving the model's performance. The code will be released at https://github.com/JOOOOKII/MSTNet.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1285-1299"},"PeriodicalIF":1.3,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12307","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SPANet: Spatial perceptual activation network for camouflaged object detection
IF 1.3 · CAS Zone 4 · Computer Science
IET Computer Vision Pub Date: 2024-09-18 DOI: 10.1049/cvi2.12310
Jianhao Zhang, Gang Yang, Xun Dai, Pengyu Yang
{"title":"SPANet: Spatial perceptual activation network for camouflaged object detection","authors":"Jianhao Zhang,&nbsp;Gang Yang,&nbsp;Xun Dai,&nbsp;Pengyu Yang","doi":"10.1049/cvi2.12310","DOIUrl":"10.1049/cvi2.12310","url":null,"abstract":"<p>Camouflaged object detection (COD) aims to segment objects embedded in the environment from the background. Most existing methods are easily affected by background interference in cluttered environments and cannot accurately locate camouflage areas, resulting in over-segmentation or incomplete segmentation structures. To effectively improve the performance of COD, we propose a spatial perceptual activation network (SPANet). SPANet extracts the spatial positional relationship between each object in the scene by activating spatial perception and uses it as global information to guide segmentation. It mainly consists of three modules: perceptual activation module (PAM), feature inference module (FIM), and interaction recovery module (IRM). Specifically, the authors design a PAM to model the positional relationship between the camouflaged object and the surrounding environment to obtain semantic correlation information. Then, a FIM that can effectively combine correlation information to suppress background interference and re-encode to generate multi-scale features is proposed. In addition, to further fuse multi-scale features, an IRM to mine the complementary information and differences between features at different scales is designed. Extensive experimental results on four widely used benchmark datasets (i.e. CAMO, CHAMELEON, COD10K, and NC4K) show that the authors’ method outperforms 13 state-of-the-art methods.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1300-1312"},"PeriodicalIF":1.3,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12310","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SRL-ProtoNet: Self-supervised representation learning for few-shot remote sensing scene classification
IF 1.3 · CAS Zone 4 · Computer Science
IET Computer Vision Pub Date: 2024-09-02 DOI: 10.1049/cvi2.12304
Bing Liu, Hongwei Zhao, Jiao Li, Yansheng Gao, Jianrong Zhang
{"title":"SRL-ProtoNet: Self-supervised representation learning for few-shot remote sensing scene classification","authors":"Bing Liu,&nbsp;Hongwei Zhao,&nbsp;Jiao Li,&nbsp;Yansheng Gao,&nbsp;Jianrong Zhang","doi":"10.1049/cvi2.12304","DOIUrl":"10.1049/cvi2.12304","url":null,"abstract":"<p>Using a deep learning method to classify a large amount of labelled remote sensing scene data produces good performance. However, it is challenging for deep learning based methods to generalise to classification tasks with limited data. Few-shot learning allows neural networks to classify unseen categories when confronted with a handful of labelled data. Currently, episodic tasks based on meta-learning can effectively complete few-shot classification, and training an encoder that can conduct representation learning has become an important component of few-shot learning. An end-to-end few-shot remote sensing scene classification model based on ProtoNet and self-supervised learning is proposed. The authors design the Pre-prototype for a more discrete feature space and better integration with self-supervised learning, and also propose the ProtoMixer for higher quality prototypes with a global receptive field. The authors’ method outperforms the existing state-of-the-art self-supervised based methods on three widely used benchmark datasets: UC-Merced, NWPU-RESISC45, and AID. Compare with previous state-of-the-art performance. For the one-shot setting, this method improves by 1.21%, 2.36%, and 0.84% in AID, UC-Merced, and NWPU-RESISC45, respectively. For the five-shot setting, this method surpasses by 0.85%, 2.79%, and 0.74% in the AID, UC-Merced, and NWPU-RESISC45, respectively.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 7","pages":"1034-1042"},"PeriodicalIF":1.3,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12304","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142563035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0