{"title":"ORB-SLAM3 and dense mapping algorithm based on improved feature matching","authors":"Delin Zhang , Guangxiang Yang , Guangling Yu , Baofeng Yang , Xiaoheng Wang","doi":"10.1016/j.image.2025.117322","DOIUrl":"10.1016/j.image.2025.117322","url":null,"abstract":"<div><div>ORB-SLAM3 is currently the mainstream visual SLAM system, which uses feature matching based on ORB keypoints. However, ORB-SLAM3 faces two main issues: Firstly, feature matching is time-consuming, and the insufficient number of feature point matches results in lower algorithmic localization accuracy. Secondly, it lacks the capability to construct dense point cloud maps, therefore limiting its applicability in high-demand scenarios such as path planning. To address these issues, this paper proposes an ORB-SLAM3 and dense mapping algorithm based on improved feature matching. In the feature matching process of ORB-SLAM3, motion smoothness constraints are introduced and the image is gridded. The feature points that are at the edge of the grid are divided into multiple adjacent grids to solve the problems, which are unable to correctly partition the feature points to the corresponding grid and algorithm time consumption. This reduces matched time and increases the number of matched pairs, improving the positioning accuracy of ORB-SLAM3. Moreover, a dense mapping construction thread has been added to construct dense point cloud maps in real-time using keyframes and corresponding poses filtered from the feature matching stage. Finally, simulation experiments were conducted using the TUM dataset for validation. The results demonstrate that the improved algorithm reduced feature matching time by 75.71 % compared to ORB-SLAM3, increased the number of feature point matches by 88.69 %, and improved localization accuracy by 9.44 %. Furthermore, the validation confirmed that the improved algorithm is capable of constructing dense maps in real-time. In conclusion, the improved algorithm demonstrates excellent performance in terms of localization accuracy and dense mapping.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117322"},"PeriodicalIF":3.4,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143829426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Camouflaged instance segmentation based on multi-scale feature contour fusion swin transformer","authors":"Yin-Fu Huang, Feng-Yen Jen","doi":"10.1016/j.image.2025.117328","DOIUrl":"10.1016/j.image.2025.117328","url":null,"abstract":"<div><div>Camouflaged instance segmentation is the latest detection issue for finding hidden objects in an image. Since camouflaged objects hide with similar background colors, it is difficult to detect objects' existence. In this paper, we proposed an instance segmentation model called Multi-scale Feature Contour Fusion Swin Transformer (MFCFSwinT) consisting of seven modules; i.e., Swin Transformer as the backbone for feature extraction, Pyramid of Kernel with Dilation (PKD) and Multi-Feature Fusion (MFF) for multi-scale features, Contour Branch and Contour Feature Fusion (CFF) for feature fusion, and Region Proposal Network (RPN) and Cascade Head for bounding boxes and masks detection. In the experiments, four datasets are used to evaluate the proposed model; i.e., COCO (Common Objects in Context), LVIS v1.0 (Large Vocabulary Instance Segmentation), COD10K (Camouflaged Object Detection), and NC4K. Finally, the experimental results show that MFCFSwinT can achieve better performances than most state-of-the-art models.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117328"},"PeriodicalIF":3.4,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143826394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-adaptive and learnable detail enhancement network for efficient image super resolution","authors":"Wenbo Zhang, Lulu Pan, Ke Xu, Guo Li, Yanheng Lv, Lingxiao Li, Le Lei","doi":"10.1016/j.image.2025.117319","DOIUrl":"10.1016/j.image.2025.117319","url":null,"abstract":"<div><div>In recent years, single image super-resolution (SISR) methods based on deep learning have advanced significantly. However, their high computational complexity and memory demands hinder deployment on resource-constrained devices. Although numerous lightweight super-resolution (SR) methods have been proposed to address this issue, most fail to distinguish between flat and detailed regions in images, treating them uniformly. This lack of targeted design for detailed regions, which are critical to SR performance, results in redundancy and inefficiency in existing lightweight methods. To address these challenges, we propose a simple yet effective network Self-adaptive and Learnable Detail Enhancement Network (LDEN) that specifically focuses on the reconstruction of detailed regions. Firstly, we present two designs for the reconstruction of detailed regions: (1) we design the Learnable Detail Extraction Block (LDEB), which can pay special attention to detailed regions and employ a larger convolution kernel in LDEB to obtain a larger receptive field; (2) we design a lightweight attention mechanism called Detail-oriented Spatial Attention (DSA) to enhance the network's ability to reconstruct detailed regions. Secondly, we design a hierarchical refinement mechanism named Efficient Hierarchical Refinement Block (EHRB) which can reduce the inadequate information extraction and integration caused by rough single-layer refinement. Extensive experiments demonstrate that LDEN achieves state-of-the-art performance on all benchmark datasets. Notably, for 4 × magnification tasks, LDEN outperforms BSRN - the champion of the model complexity track of NTIRE 2022 Efficient SR Challenge - by achieving gains of 0.11 dB and 0.12 dB while reducing parameters by nearly 10 %.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117319"},"PeriodicalIF":3.4,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143817018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal style aggregation network for art image classification","authors":"Quan Wang, Guorui Feng","doi":"10.1016/j.image.2025.117309","DOIUrl":"10.1016/j.image.2025.117309","url":null,"abstract":"<div><div>A large number of paintings are digitized, the automatic recognition and retrieval of artistic image styles become very meaningful. Because there is no standard definition and quantitative description of characteristics of artistic style, the representation of style is still a difficult problem. Recently, some work have used deep correlation features in neural style transfer to describe the texture characteristics of paintings and have achieved exciting results. Inspired by this, this paper proposes a multimodal style aggregation network that incorporates three modalities of texture, structure and color information of artistic images. Specifically, the group-wise Gram aggregation model is proposed to capture multi-level texture styles. The global average pooling (GAP) and histogram operation are employed to perform distillation of the high-level structural style and the low-level color style, respectively. Moreover, an improved deep correlation feature calculation method called learnable Gram (L-Gram) is proposed to enhance the ability to express style. Experiments show that our method outperforms several state-of-the-art methods in five style datasets.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117309"},"PeriodicalIF":3.4,"publicationDate":"2025-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143760469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Few-shot image generation based on meta-learning and generative adversarial network","authors":"Bowen Gu, Junhai Zhai","doi":"10.1016/j.image.2025.117307","DOIUrl":"10.1016/j.image.2025.117307","url":null,"abstract":"<div><div>Generative adversarial network (GAN) learns the latent distribution of samples through the adversarial training between discriminator and generator, then uses the learned probability distribution to generate realistic samples. Training a vanilla GAN requires a large number of samples and a significant amount of time. However, in practical applications, obtaining a large dataset and dedicating extensive time to model training can be very costly. Training a GAN with a small number of samples to generate high-quality images is a pressing research problem. Although this area has seen limited exploration, FAML (Fast Adaptive Meta-Learning) stands out as a notable approach. However, FAML has the following shortcomings: (1) The training time on complex datasets, such as VGGFaces and MiniImageNet, is excessively long. (2) It exhibits poor generalization performance and produces low-quality images across different datasets. (3) The generated samples lack diversity. To address the three shortcomings, we improved FAML in two key areas: model structure and loss function. The improved model effectively overcomes all three limitations of FAML. We conducted extensive experiments on four datasets to compare our model with the baseline FAML across seven evaluation metrics. The results demonstrate that our model is both more efficient and effective, particularly on the two complex datasets, VGGFaces and MiniImageNet. Our model outperforms FAML on six of the seven evaluation metrics, with only a slight underperformance on one metric. Our code is available at <span><span>https://github.com/BTGWS/FSML-GAN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117307"},"PeriodicalIF":3.4,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143760468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OTPL: A novel measurement method of structural parallelism based on orientation transformation and geometric constraints","authors":"Weili Ding , Zhiyu Wang , Shuo Hu","doi":"10.1016/j.image.2025.117310","DOIUrl":"10.1016/j.image.2025.117310","url":null,"abstract":"<div><div>Detecting parallel geometric structures from images is a significant step for computer vision tasks. In this paper, an algorithm called Orientation Transformation-based Parallelism Measurement (OTPL) is proposed in this paper to measure the parallelism of structures including both line structures and curve structures. The task is decomposed into measurements of parallel straight line and parallel curve structures due to the inherent geometric differences between them, where the parallelism between curve structures can be further transformed into a matching problem. For parallel straight lines, the angle constraints and the rate of overlapping projection are considered as the parallel relationship selection rules for the candidate lines. For the parallel curves, the approximate vertical growing (AVG) algorithm is proposed to accelerate the search of adjacent curves and each smooth curve is coded as a vector with different angle values. The matching pairs are extracted through cosine similarity transformation and convexity consistency. Finally, the parallel curves are extracted by a decision-making process. The proposed algorithm is evaluated in a comprehensive manner, encompassing both qualitative and quantitative approaches, with the objective of achieving a more robust assessment. The results demonstrate the algorithm's efficacy in identifying parallel structures in both synthetic and natural images.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117310"},"PeriodicalIF":3.4,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143760467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bidirectional interactive multi-scale network using Wave-Conv ViT for single image deraining","authors":"Siyan Fang, Bin Liu","doi":"10.1016/j.image.2025.117311","DOIUrl":"10.1016/j.image.2025.117311","url":null,"abstract":"<div><div>To address the limitations of high-frequency information capture by Vision Transformer (ViT) and the loss of fine details in existing image deraining methods, we introduce a Bidirectional Interactive Multi-Scale Network (BIMNet) that employs newly developed Wave-Conv ViT (WCV). The WCV utilizes a wavelet transform to enable self-attention in both low-frequency and high-frequency domains, significantly enhancing ViT's capacity for diverse frequency-domain feature modeling. Additionally, by incorporating convolutional operations, WCV enhances the extraction and integration of local features across various spatial windows. BIMNet injects rainy images into deep network layers, enabling bidirectional propagation with shallow layer features that enrich skip connections with detailed and complementary information, thus improving the fidelity of detail recovery. Moreover, we present the CORain1000 dataset, tailored for the dual challenges of image deraining and object detection, which offers more diversity in rain patterns, image sizes, and volumes than the commonly used COCO350 dataset. Extensive experiments demonstrate the superiority of BIMNet over advanced methods. The code and CORain1000 dataset are available at <span><span>https://github.com/fashyon/BIMNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117311"},"PeriodicalIF":3.4,"publicationDate":"2025-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143792164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Feature Extraction and Knowledge Distillation Based Deep Learning Model for Human Activity Recognition System","authors":"Hetal Shah , Mehfuza S. Holia","doi":"10.1016/j.image.2025.117308","DOIUrl":"10.1016/j.image.2025.117308","url":null,"abstract":"<div><div>This article introduces the Generative Adversarial Network (GAN) framework model, which uses offline knowledge distillation (KD) to move spatio-deep data from a large teacher to a smaller student model. To achieve this, the teacher model named EfficientNetB7 embedded with spatial attention (E2SA) and a multi-layer Gated Recurrent Unit (GRU) is used. A hybrid feature extraction method known as Completed Hybrid Local Binary Pattern (ChLBP) is employed prior to the prediction process. After feature extraction, the hybrid features are parallelly given as input to both teacher and student models. In the teacher model, E2SA extracts both deep and spatio attention activity features, and these features are then input to the multi-layer GRU, which learns the human activity frame sequences overall. The proposed model obtains 98.50 % recognition accuracy on the UCF101 dataset and 79.21 % recognition accuracy on the HMDB51 dataset, which is considerably better than the existing models.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117308"},"PeriodicalIF":3.4,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143826393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain-guided multi-frequency underwater image enhancement network","authors":"Qingzheng Wang, Bin Li, Ge Shi, Xinyu Wang, Yiliang Chen","doi":"10.1016/j.image.2025.117281","DOIUrl":"10.1016/j.image.2025.117281","url":null,"abstract":"<div><div>The distribution of underwater images exhibits diverse due to the varied scattering and absorption of light in different water types. However, most existing methods have significant limitations as they cannot distinguish the difference between different water types during enhancement processing, and do not propose clear solutions for the different frequency information. Therefore, the key challenge is to achieve consistency between learned features and water types while preserving multi-frequency information. Thus, we propose a domain-guided multi-frequency underwater image enhancement network (DGMF), which generate high quality images by learning water-type-related features and capturing multi-frequency information. Specifically, we introduce a domain-aware module equipped with a water type classifier, which can distinguish the impacts of different water types, and guide the update of the model towards the specific domain. In addition, we design a multi-frequency mixer that couples Multi-Group Convolution (MGC) and Global Sparse Attention (GSA) to more effectively captures local and global information. Extensive experiments demonstrate that our method outperforms most state-of-the-art methods in both visual perception and evaluation metrics. The code is publicly available at <span><span>https://github.com/liyoucai699/DGMF.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"136 ","pages":"Article 117281"},"PeriodicalIF":3.4,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143686524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DMCTDet: A density map-guided composite transformer network for object detection of UAV images","authors":"Junjie Li , Si Guo , Shi Yi , Runhua He , Yong Jia","doi":"10.1016/j.image.2025.117284","DOIUrl":"10.1016/j.image.2025.117284","url":null,"abstract":"<div><div>The application of unmanned aerial vehicles (UAVs) in urban scene object detection is a vital area of research in urban planning, intelligent monitoring, disaster prevention, and urban surveillance.e However, detecting objects in urban scenes captured by UAVs is a challenging task mainly due to the small size of the objects, the variability within the same class, and the diversity of objects. To design an object detection network that can be applied to complex urban scenes, this study proposes a novel composite transformer object detection network guided by a density map (DMCTDet) for urban scene detection in UAV images. The distributional a priori information of objects can be fully exploited by density maps. In the detection stage, a composite backbone feature extraction network is constructed by Swin Transformer combined with Vision Longformer, which can fully extract the scale-variation objects. Adaptive multiscale feature pyramid enhancement modules (AMFPEM) are inserted in the feature fusion stage between both Swin Transformer and Vision Longformer to learn the relationship between object scale variation and enhance the feature representation capacity of small objects. In this way, the accuracy of urban scene detection is significantly improved, and weak aggregated objects are successfully detected from UAV images. Extensive ablation experiments and comparison experiments. of the proposed network are conducted on publicly available urban scene detection datasets of UAV images. The experimental results demonstrate the effectiveness of the designed network structure and the superiority of the proposed network compared to state-of-the-art methods in terms of detection accuracy.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"136 ","pages":"Article 117284"},"PeriodicalIF":3.4,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143817208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}