Displays最新文献

筛选
英文 中文
Review on SLAM algorithms for Augmented Reality 增强现实 SLAM 算法综述
IF 3.7 2区 工程技术
Displays Pub Date : 2024-07-31 DOI: 10.1016/j.displa.2024.102806
Xingdong Sheng , Shijie Mao , Yichao Yan , Xiaokang Yang
{"title":"Review on SLAM algorithms for Augmented Reality","authors":"Xingdong Sheng ,&nbsp;Shijie Mao ,&nbsp;Yichao Yan ,&nbsp;Xiaokang Yang","doi":"10.1016/j.displa.2024.102806","DOIUrl":"10.1016/j.displa.2024.102806","url":null,"abstract":"<div><p>Augmented Reality (AR) has gained significant attention in recent years as a technology that enhances the user’s perception and interaction with the real world by overlaying virtual objects. Simultaneous Localization and Mapping (SLAM) algorithm plays a crucial role in enabling AR applications by allowing the device to understand its position and orientation in the real world while mapping the environment. This paper first summarizes AR products and SLAM algorithms in recent years, and presents a comprehensive overview of SLAM algorithms including feature-based method, direct method, and deep learning-based method, highlighting their advantages and limitations. Then provides an in-depth exploration of classical SLAM algorithms for AR, with a focus on visual SLAM and visual-inertial SLAM. Lastly, sensor configuration, datasets, and performance evaluation for AR SLAM are also discussed. The review concludes with a summary of the current state of SLAM algorithms for AR and provides insights into future directions for research and development in this field. Overall, this review serves as a valuable resource for researchers and engineers who are interested in understanding the advancements and challenges in SLAM algorithms for AR.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102806"},"PeriodicalIF":3.7,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141940367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
High-resolution enhanced cross-subspace fusion network for light field image superresolution 用于光场图像超分辨率的高分辨率增强型跨子空间融合网络
IF 3.7 2区 工程技术
Displays Pub Date : 2024-07-29 DOI: 10.1016/j.displa.2024.102803
Shixu Ying , Shubo Zhou , Xue-Qin Jiang , Yongbin Gao , Feng Pan , Zhijun Fang
{"title":"High-resolution enhanced cross-subspace fusion network for light field image superresolution","authors":"Shixu Ying ,&nbsp;Shubo Zhou ,&nbsp;Xue-Qin Jiang ,&nbsp;Yongbin Gao ,&nbsp;Feng Pan ,&nbsp;Zhijun Fang","doi":"10.1016/j.displa.2024.102803","DOIUrl":"10.1016/j.displa.2024.102803","url":null,"abstract":"<div><p>Light field (LF) images offer abundant spatial and angular information, therefore, the combination of which is beneficial in the performance of LF image superresolution (LF image SR). Currently, existing methods often decompose the 4D LF data into low-dimensional subspaces for individual feature extraction and fusion for LF image SR. However, the performance of these methods is restricted because of lacking effective correlations between subspaces and missing out on crucial complementary information for capturing rich texture details. To address this, we propose a cross-subspace fusion network for LF spatial SR (i.e., CSFNet). Specifically, we design the progressive cross-subspace fusion module (PCSFM), which can progressively establish cross-subspace correlations based on a cross-attention mechanism to comprehensively enrich LF information. Additionally, we propose a high-resolution adaptive enhancement group (HR-AEG), which preserves the texture and edge details in the high resolution feature domain by employing a multibranch enhancement method and an adaptive weight strategy. The experimental results demonstrate that our approach achieves highly competitive performance on multiple LF datasets compared to state-of-the-art (SOTA) methods.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102803"},"PeriodicalIF":3.7,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141940368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A dense video caption dataset of student classroom behaviors and a baseline model with boundary semantic awareness 学生课堂行为的密集视频字幕数据集和具有边界语义意识的基线模型
IF 3.7 2区 工程技术
Displays Pub Date : 2024-07-26 DOI: 10.1016/j.displa.2024.102804
Yong Wu , Jinyu Tian , HuiJun Liu , Yuanyan Tang
{"title":"A dense video caption dataset of student classroom behaviors and a baseline model with boundary semantic awareness","authors":"Yong Wu ,&nbsp;Jinyu Tian ,&nbsp;HuiJun Liu ,&nbsp;Yuanyan Tang","doi":"10.1016/j.displa.2024.102804","DOIUrl":"10.1016/j.displa.2024.102804","url":null,"abstract":"<div><p>Dense video captioning automatically locates events in untrimmed videos and describes event contents through natural language. This task has many potential applications, including security, assisting people who are visually impaired, and video retrieval. The related datasets constitute an important foundation for research on data-driven methods. However, the existing models for building dense video caption datasets were designed for the universal domain, often ignoring the characteristics and requirements of a specific domain. In addition, the one-way dataset construction process cannot form a closed-loop iterative scheme to improve the quality of the dataset. Therefore, this paper proposes a novel dataset construction model that is suitable for classroom-specific scenarios. On this basis, the Dense Video Caption Dataset of Student Classroom Behaviors (SCB-DVC) is constructed. Additionally, the existing dense video captioning methods typically utilize only temporal event boundaries as direct supervisory information during localization and fail to consider semantic information. This results in a limited correlation between the localization and captioning stages. This defect makes it more difficult to locate events in videos with oversmooth boundaries (due to the excessive similarity between the foregrounds and backgrounds (temporal domains) of events). Therefore, we propose a fine-grained semantic-aware assisted boundary localization-based dense video captioning method. This method enhances the ability to effectively learn the differential features between the foreground and background of an event by introducing semantic-aware information. It can provide increased boundary perception and achieve more accurate captions. Experimental results show that the proposed method performs well on both the SCB-DVC dataset and public datasets (ActivityNet Captions, YouCook2 and TACoS). We will release the SCB-DVC dataset soon.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102804"},"PeriodicalIF":3.7,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141959363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CLIP2TF:Multimodal video–text retrieval for adolescent education CLIP2TF:面向青少年教育的多模态视频-文本检索
IF 3.7 2区 工程技术
Displays Pub Date : 2024-07-25 DOI: 10.1016/j.displa.2024.102801
Xiaoning Sun, Tao Fan, Hongxu Li, Guozhong Wang, Peien Ge, Xiwu Shang
{"title":"CLIP2TF:Multimodal video–text retrieval for adolescent education","authors":"Xiaoning Sun,&nbsp;Tao Fan,&nbsp;Hongxu Li,&nbsp;Guozhong Wang,&nbsp;Peien Ge,&nbsp;Xiwu Shang","doi":"10.1016/j.displa.2024.102801","DOIUrl":"10.1016/j.displa.2024.102801","url":null,"abstract":"<div><p>With the rapid advancement of artificial intelligence technology, particularly within the sphere of adolescent education, a continual emergence of new challenges and opportunities is observed. The current educational system increasingly requires the automation of teaching activities detection and evaluation, offering fresh perspectives for enhancing the quality of adolescent education. Although large-scale models are receiving significant attention in educational research, their high demand for computational resources and limitations in specific applications constrain their widespread use in analyzing educational video content, especially when handling multimodal data. Current multimodal contrastive learning methods, which integrate video, audio, and text information, have achieved certain successes in video–text retrieval tasks. However, these methods typically employ simpler weighted fusion strategies and fail to avoid noise and information redundancy. Therefore, our study proposes a novel network framework, CLIP2TF, which includes an efficient audio–visual fusion encoder. It aims to dynamically interact and integrate visual and audio features, further enhancing the visual features that may be missing or insufficient in specific teaching scenarios while effectively reducing redundant information transfer during the modality fusion process. Through ablation experiments on the MSRVTT and MSVD datasets, we first demonstrate the effectiveness of CLIP2TF in video–text retrieval tasks. Subsequent tests on teaching video datasets further proves the applicability of the proposed method. This research not only showcases the potential of artificial intelligence in the automated assessment of teaching quality but also provides new directions for research in related fields studies.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102801"},"PeriodicalIF":3.7,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141840191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
ICEAP: An advanced fine-grained image captioning network with enhanced attribute predictor ICEAP:带有增强型属性预测器的高级细粒度图像字幕网络
IF 3.7 2区 工程技术
Displays Pub Date : 2024-07-24 DOI: 10.1016/j.displa.2024.102798
Md. Bipul Hossen, Zhongfu Ye, Amr Abdussalam, Mohammad Alamgir Hossain
{"title":"ICEAP: An advanced fine-grained image captioning network with enhanced attribute predictor","authors":"Md. Bipul Hossen,&nbsp;Zhongfu Ye,&nbsp;Amr Abdussalam,&nbsp;Mohammad Alamgir Hossain","doi":"10.1016/j.displa.2024.102798","DOIUrl":"10.1016/j.displa.2024.102798","url":null,"abstract":"<div><p>Fine-grained image captioning is a focal point in the vision-to-language task and has attracted considerable attention for generating accurate and contextually relevant image captions. Effective attribute prediction and their utilization play a crucial role in enhancing image captioning performance. Despite progress in prior attribute-related methods, they either focus on predicting attributes related to the input image or concentrate on predicting linguistic context-related attributes at each time step in the language model. However, these approaches often overlook the importance of balancing visual and linguistic contexts, leading to ineffective exploitation of semantic information and a subsequent decline in performance. To address these issues, an Independent Attribute Predictor (IAP) is introduced to precisely predict attributes related to the input image by leveraging relationships between visual objects and attribute embeddings. Following this, an Enhanced Attribute Predictor (EAP) is proposed, initially predicting linguistic context-related attributes and then using prior probabilities from the IAP module to rebalance image and linguistic context-related attributes, thereby generating more robust and enhanced attribute probabilities. These refined attributes are then integrated into the language LSTM layer to ensure accurate word prediction at each time step. The integration of the IAP and EAP modules in our proposed image captioning with the enhanced attribute predictor (ICEAP) model effectively incorporates high-level semantic details, enhancing overall model performance. The ICEAP outperforms contemporary models, yielding significant average improvements of 10.62% in CIDEr-D scores for MS-COCO, 9.63% for Flickr30K and 7.74% for Flickr8K datasets using cross-entropy optimization, with qualitative analysis confirming its ability to generate fine-grained captions.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102798"},"PeriodicalIF":3.7,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141951620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
DBMKA-Net:Dual branch multi-perception kernel adaptation for underwater image enhancement DBMKA-Net:用于水下图像增强的双分支多感知内核适配
IF 3.7 2区 工程技术
Displays Pub Date : 2024-07-22 DOI: 10.1016/j.displa.2024.102797
Hongjian Wang, Suting Chen
{"title":"DBMKA-Net:Dual branch multi-perception kernel adaptation for underwater image enhancement","authors":"Hongjian Wang,&nbsp;Suting Chen","doi":"10.1016/j.displa.2024.102797","DOIUrl":"10.1016/j.displa.2024.102797","url":null,"abstract":"<div><p>In recent years, due to the dependence on wavelength-based light absorption and scattering, underwater photographs captured by devices often exhibit characteristics such as blurriness, faded color tones, and low contrast. To address these challenges, convolutional neural networks (CNNs) with their robust feature-capturing capabilities and adaptable structures have been employed for underwater image enhancement. However, most CNN-based studies on underwater image enhancement have not taken into account color space kernel convolution adaptability, which can significantly enhance the model’s expressive capacity. Building upon current academic research on adjusting the color space size for each perceptual field, this paper introduces a Double-Branch Multi-Perception Kernel Adaptive (DBMKA) model. A DBMKA module is constructed through two perceptual branches that adapt the kernels: channel features and local image entropy. Additionally, considering the pronounced attenuation of the red channel in underwater images, a Dependency-Capturing Feature Jump Connection module (DCFJC) has been designed to capture the red channel’s dependence on the blue and green channels for compensation. Its skip mechanism effectively preserves color contextual information. To better utilize the extracted features for enhancing underwater images, a Cross-Level Attention Feature Fusion (CLAFF) module has been designed. With the Double-Branch Multi-Perception Kernel Adaptive model, Dependency-Capturing Skip Connection module, and Cross-Level Adaptive Feature Fusion module, this network can effectively enhance various types of underwater images. Qualitative and quantitative evaluations were conducted on the UIEB and EUVP datasets. In the color correction comparison experiments, our method demonstrated a more uniform red channel distribution across all gray levels, maintaining color consistency and naturalness. Regarding image information entropy (IIE) and average gradient (AG), the data confirmed our method’s superiority in preserving image details. Furthermore, our proposed method showed performance improvements exceeding 10% on other metrics like MSE and UCIQE, further validating its effectiveness and accuracy.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102797"},"PeriodicalIF":3.7,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141847192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multi-threshold image segmentation using new strategies enhanced whale optimization for lupus nephritis pathological images 针对狼疮性肾炎病理图像的多阈值图像分割新策略--增强鲸鱼优化法
IF 3.7 2区 工程技术
Displays Pub Date : 2024-07-20 DOI: 10.1016/j.displa.2024.102799
Jinge Shi , Yi Chen , Chaofan Wang , Ali Asghar Heidari , Lei Liu , Huiling Chen , Xiaowei Chen , Li Sun
{"title":"Multi-threshold image segmentation using new strategies enhanced whale optimization for lupus nephritis pathological images","authors":"Jinge Shi ,&nbsp;Yi Chen ,&nbsp;Chaofan Wang ,&nbsp;Ali Asghar Heidari ,&nbsp;Lei Liu ,&nbsp;Huiling Chen ,&nbsp;Xiaowei Chen ,&nbsp;Li Sun","doi":"10.1016/j.displa.2024.102799","DOIUrl":"10.1016/j.displa.2024.102799","url":null,"abstract":"<div><p>Lupus Nephritis (LN) has been considered as the most prevalent form of systemic lupus erythematosus. Medical imaging plays an important role in diagnosing and treating LN, which can help doctors accurately assess the extent and extent of the lesion. However, relying solely on visual observation and judgment can introduce subjectivity and errors, especially for complex pathological images. Image segmentation techniques are used to differentiate various tissues and structures in medical images to assist doctors in diagnosis. Multi-threshold Image Segmentation (MIS) has gained widespread recognition for its direct and practical application. However, existing MIS methods still have some issues. Therefore, this study combines non-local means, 2D histogram, and 2D Renyi’s entropy to improve the performance of MIS methods. Additionally, this study introduces an improved variant of the Whale Optimization Algorithm (GTMWOA) to optimize the aforementioned MIS methods and reduce algorithm complexity. The GTMWOA fusions Gaussian Exploration (GE), Topology Mapping (TM), and Magnetic Liquid Climbing (MLC). The GE effectively amplifies the algorithm’s proficiency in local exploration and quickens the convergence rate. The TM facilitates the algorithm in escaping local optima, while the MLC mechanism emulates the physical phenomenon of MLC, refining the algorithm’s convergence precision. This study conducted an extensive series of tests using the IEEE CEC 2017 benchmark functions to demonstrate the superior performance of GTMWOA in addressing intricate optimization problems. Furthermore, this study executed an experiment using Berkeley images and LN images to verify the superiority of GTMWOA in MIS. The ultimate outcomes of the MIS experiments substantiate the algorithm’s advanced capabilities and robustness in handling complex optimization problems.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102799"},"PeriodicalIF":3.7,"publicationDate":"2024-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141849492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A unified architecture for super-resolution and segmentation of remote sensing images based on similarity feature fusion 基于相似性特征融合的遥感图像超分辨率和分割统一架构
IF 3.7 2区 工程技术
Displays Pub Date : 2024-07-20 DOI: 10.1016/j.displa.2024.102800
Lunqian Wang , Xinghua Wang , Weilin Liu , Hao Ding , Bo Xia , Zekai Zhang , Jinglin Zhang , Sen Xu
{"title":"A unified architecture for super-resolution and segmentation of remote sensing images based on similarity feature fusion","authors":"Lunqian Wang ,&nbsp;Xinghua Wang ,&nbsp;Weilin Liu ,&nbsp;Hao Ding ,&nbsp;Bo Xia ,&nbsp;Zekai Zhang ,&nbsp;Jinglin Zhang ,&nbsp;Sen Xu","doi":"10.1016/j.displa.2024.102800","DOIUrl":"10.1016/j.displa.2024.102800","url":null,"abstract":"<div><p>The resolution of the image has an important impact on the accuracy of segmentation. Integrating super-resolution (SR) techniques in the semantic segmentation of remote sensing images contributes to the improvement of precision and accuracy, especially when the images are blurred. In this paper, a novel and efficient SR semantic segmentation network (SRSEN) is designed by taking advantage of the similarity between SR and segmentation tasks in feature processing. SRSEN consists of the multi-scale feature encoder, the SR fusion decoder, and the multi-path feature refinement block, which adaptively establishes the feature associations between segmentation and SR tasks to improve the segmentation accuracy of blurred images. Experiments show that the proposed method achieves higher segmentation accuracy on fuzzy images compared to state-of-the-art models. Specifically, the mIoU of the proposed SRSEN is 3%–6% higher than other state-of-the-art models on low-resolution LoveDa, Vaihingen, and Potsdam datasets.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102800"},"PeriodicalIF":3.7,"publicationDate":"2024-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141849905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
BiF-DETR:Remote sensing object detection based on Bidirectional information fusion BiF-DETR:基于双向信息融合的遥感物体探测
IF 3.7 2区 工程技术
Displays Pub Date : 2024-07-19 DOI: 10.1016/j.displa.2024.102802
Zhijing Xu, Chao Wang, Kan Huang
{"title":"BiF-DETR:Remote sensing object detection based on Bidirectional information fusion","authors":"Zhijing Xu,&nbsp;Chao Wang,&nbsp;Kan Huang","doi":"10.1016/j.displa.2024.102802","DOIUrl":"10.1016/j.displa.2024.102802","url":null,"abstract":"<div><p>Remote Sensing Object Detection(RSOD) is a fundamental task in the field of remote sensing image processing. The complexity of the background, the diversity of object scales and the locality limitation of Convolutional Neural Network (CNN) present specific challenges for RSOD. In this paper, an innovative hybrid detector, Bidirectional Information Fusion DEtection TRansformer (BiF-DETR), is proposed to mitigate the above issues. Specifically, BiF-DETR takes anchor-free detection network, CenterNet, as the baseline, designs the feature extraction backbone in parallel, extracts the local feature details using CNNs, and obtains the global information and long-term dependencies using Transformer branch. A Bidirectional Information Fusion (BIF) module is elaborately designed to reduce the semantic differences between different styles of feature maps through multi-level iterative information interactions, fully utilizing the complementary advantages of different detectors. Additionally, Coordination Attention(CA), is introduced to enables the detection network to focus on the saliency information of small objects. To address diversity insufficiency of remote sensing images in the training stage, Cascade Mixture Data Augmentation (CMDA), is designed to improve the robustness and generalization ability of the model. Comparative experiments with other cutting-edge methods are conducted on the publicly available DOTA and NWPU VHR-10 datasets. The experimental results reveal that the performance of proposed method is state-of-the-art, with <em>m</em>AP reaching 77.43% and 94.75%, respectively, far exceeding the other 25 competitive methods.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102802"},"PeriodicalIF":3.7,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141938224001665/pdfft?md5=e3ed1b94823f012220f1a30a72ed7985&pid=1-s2.0-S0141938224001665-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141736394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
FSNet: A dual-domain network for few-shot image classification FSNet:用于少量图像分类的双域网络
IF 3.7 2区 工程技术
Displays Pub Date : 2024-07-14 DOI: 10.1016/j.displa.2024.102795
Xuewen Yan, Zhangjin Huang
{"title":"FSNet: A dual-domain network for few-shot image classification","authors":"Xuewen Yan,&nbsp;Zhangjin Huang","doi":"10.1016/j.displa.2024.102795","DOIUrl":"10.1016/j.displa.2024.102795","url":null,"abstract":"<div><p>Few-shot learning is a challenging task, that aims to learn and identify novel classes from a limited number of unseen labeled samples. Previous work has focused primarily on extracting features solely in the spatial domain of images. However, the compressed representation in the frequency domain which contains rich pattern information is a powerful tool in the field of signal processing. Combining the frequency and spatial domains to obtain richer information can effectively alleviate the overfitting problem. In this paper, we propose a dual-domain combined model called Frequency Space Net (FSNet), which preprocesses input images simultaneously in both the spatial and frequency domains, extracts spatial and frequency information through two feature extractors, and fuses them to a composite feature for image classification tasks. We start from a different view of frequency analysis, linking conventional average pooling to Discrete Cosine Transformation (DCT). We generalize the compression of the attention mechanism in the frequency domain. Consequently, we propose a novel Frequency Channel Spatial (FCS) attention mechanism. Extensive experiments demonstrate that frequency and spatial information are complementary in few-shot image classification, improving the performance of the model. Our method outperforms state-of-the-art approaches on miniImageNet and CUB.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102795"},"PeriodicalIF":3.7,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141636468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信