{"title":"Camera calibration using property of asymptotes with application to sports scenes","authors":"Fengli Yang, Xuechun Wang, Yue Zhao","doi":"10.1016/j.image.2025.117331","DOIUrl":"10.1016/j.image.2025.117331","url":null,"abstract":"<div><div>Inspired by Ying's work on the calibration technique, this study proposes a new planar pattern (referred to as the phi-type model hereinafter), which includes a circle and diameter, as the calibration scene. In sports scenarios, such as a soccer match or basketball court, most existing methods require information of the scene points in a three-dimensional space. However, an interesting observation in the midfield is that the centre circle and the halfway line form a phi-type template. A new automatic method using the properties of asymptotes is proposed based on the images of the midfield. All intrinsic parameters of the camera can be determined without any assumptions such as zero skew or unitary aspect ratio. The main advantages of our technique are that it neither involves point or line matching nor does it require the metric information of the model plane. The feasibility and validity of the proposed algorithm were verified by testing the noise sensitivity and performing image metric rectification.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117331"},"PeriodicalIF":3.4,"publicationDate":"2025-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hidden dangerous object detection for terahertz body security check images based on adaptive multi-scale decomposition convolution","authors":"Zijie Guo , Heng Wu , Shaojuan Luo , Genping Zhao , Chunhua He , Tao Wang","doi":"10.1016/j.image.2025.117323","DOIUrl":"10.1016/j.image.2025.117323","url":null,"abstract":"<div><div>Recently, detecting hidden dangerous objects with the terahertz technique has attracted extensive attention. Many convolutional neural network-based object detection methods can achieve excellent results in common object detection. However, the existing object detection methods generally have low detection accuracy and large model parameter issues for hidden dangerous objects in terahertz body security check images due to the blurring and poor quality of terahertz images and ignoring the global context information. To address these issues, we propose an enhanced You Only Look Once network (YOLO-AMDC), which is integrated with an adaptive multi-scale large-kernel decomposition convolution (AMDC) module. Specifically, we design an AMDC module to enhance the feature expression ability of the YOLO framework. Moreover, we develop the Bi-Level Routing Attention (BRA) mechanism and a simple parameter-free attention module (SimAM) to integrate and utilize contextual information to improve the performance of dangerous object detection. Additionally, we adopt a model pruning approach to reduce the number of model parameters. The experimental results show that YOLO-AMDC outperforms other state-of-the-art methods. Compared with YOLOv8s, YOLO-AMDC reduces the parameters by 3.9 M and improves mAP@50 by 5 %. The detection performance is still competitive when the number of parameters is significantly reduced by model pruning.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117323"},"PeriodicalIF":3.4,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143834656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ORB-SLAM3 and dense mapping algorithm based on improved feature matching","authors":"Delin Zhang , Guangxiang Yang , Guangling Yu , Baofeng Yang , Xiaoheng Wang","doi":"10.1016/j.image.2025.117322","DOIUrl":"10.1016/j.image.2025.117322","url":null,"abstract":"<div><div>ORB-SLAM3 is currently the mainstream visual SLAM system, which uses feature matching based on ORB keypoints. However, ORB-SLAM3 faces two main issues: Firstly, feature matching is time-consuming, and the insufficient number of feature point matches results in lower algorithmic localization accuracy. Secondly, it lacks the capability to construct dense point cloud maps, therefore limiting its applicability in high-demand scenarios such as path planning. To address these issues, this paper proposes an ORB-SLAM3 and dense mapping algorithm based on improved feature matching. In the feature matching process of ORB-SLAM3, motion smoothness constraints are introduced and the image is gridded. The feature points that are at the edge of the grid are divided into multiple adjacent grids to solve the problems, which are unable to correctly partition the feature points to the corresponding grid and algorithm time consumption. This reduces matched time and increases the number of matched pairs, improving the positioning accuracy of ORB-SLAM3. Moreover, a dense mapping construction thread has been added to construct dense point cloud maps in real-time using keyframes and corresponding poses filtered from the feature matching stage. Finally, simulation experiments were conducted using the TUM dataset for validation. The results demonstrate that the improved algorithm reduced feature matching time by 75.71 % compared to ORB-SLAM3, increased the number of feature point matches by 88.69 %, and improved localization accuracy by 9.44 %. Furthermore, the validation confirmed that the improved algorithm is capable of constructing dense maps in real-time. In conclusion, the improved algorithm demonstrates excellent performance in terms of localization accuracy and dense mapping.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117322"},"PeriodicalIF":3.4,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143829426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Camouflaged instance segmentation based on multi-scale feature contour fusion swin transformer","authors":"Yin-Fu Huang, Feng-Yen Jen","doi":"10.1016/j.image.2025.117328","DOIUrl":"10.1016/j.image.2025.117328","url":null,"abstract":"<div><div>Camouflaged instance segmentation is the latest detection issue for finding hidden objects in an image. Since camouflaged objects hide with similar background colors, it is difficult to detect objects' existence. In this paper, we proposed an instance segmentation model called Multi-scale Feature Contour Fusion Swin Transformer (MFCFSwinT) consisting of seven modules; i.e., Swin Transformer as the backbone for feature extraction, Pyramid of Kernel with Dilation (PKD) and Multi-Feature Fusion (MFF) for multi-scale features, Contour Branch and Contour Feature Fusion (CFF) for feature fusion, and Region Proposal Network (RPN) and Cascade Head for bounding boxes and masks detection. In the experiments, four datasets are used to evaluate the proposed model; i.e., COCO (Common Objects in Context), LVIS v1.0 (Large Vocabulary Instance Segmentation), COD10K (Camouflaged Object Detection), and NC4K. Finally, the experimental results show that MFCFSwinT can achieve better performances than most state-of-the-art models.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117328"},"PeriodicalIF":3.4,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143826394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-adaptive and learnable detail enhancement network for efficient image super resolution","authors":"Wenbo Zhang, Lulu Pan, Ke Xu, Guo Li, Yanheng Lv, Lingxiao Li, Le Lei","doi":"10.1016/j.image.2025.117319","DOIUrl":"10.1016/j.image.2025.117319","url":null,"abstract":"<div><div>In recent years, single image super-resolution (SISR) methods based on deep learning have advanced significantly. However, their high computational complexity and memory demands hinder deployment on resource-constrained devices. Although numerous lightweight super-resolution (SR) methods have been proposed to address this issue, most fail to distinguish between flat and detailed regions in images, treating them uniformly. This lack of targeted design for detailed regions, which are critical to SR performance, results in redundancy and inefficiency in existing lightweight methods. To address these challenges, we propose a simple yet effective network Self-adaptive and Learnable Detail Enhancement Network (LDEN) that specifically focuses on the reconstruction of detailed regions. Firstly, we present two designs for the reconstruction of detailed regions: (1) we design the Learnable Detail Extraction Block (LDEB), which can pay special attention to detailed regions and employ a larger convolution kernel in LDEB to obtain a larger receptive field; (2) we design a lightweight attention mechanism called Detail-oriented Spatial Attention (DSA) to enhance the network's ability to reconstruct detailed regions. Secondly, we design a hierarchical refinement mechanism named Efficient Hierarchical Refinement Block (EHRB) which can reduce the inadequate information extraction and integration caused by rough single-layer refinement. Extensive experiments demonstrate that LDEN achieves state-of-the-art performance on all benchmark datasets. Notably, for 4 × magnification tasks, LDEN outperforms BSRN - the champion of the model complexity track of NTIRE 2022 Efficient SR Challenge - by achieving gains of 0.11 dB and 0.12 dB while reducing parameters by nearly 10 %.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117319"},"PeriodicalIF":3.4,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143817018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal style aggregation network for art image classification","authors":"Quan Wang, Guorui Feng","doi":"10.1016/j.image.2025.117309","DOIUrl":"10.1016/j.image.2025.117309","url":null,"abstract":"<div><div>A large number of paintings are digitized, the automatic recognition and retrieval of artistic image styles become very meaningful. Because there is no standard definition and quantitative description of characteristics of artistic style, the representation of style is still a difficult problem. Recently, some work have used deep correlation features in neural style transfer to describe the texture characteristics of paintings and have achieved exciting results. Inspired by this, this paper proposes a multimodal style aggregation network that incorporates three modalities of texture, structure and color information of artistic images. Specifically, the group-wise Gram aggregation model is proposed to capture multi-level texture styles. The global average pooling (GAP) and histogram operation are employed to perform distillation of the high-level structural style and the low-level color style, respectively. Moreover, an improved deep correlation feature calculation method called learnable Gram (L-Gram) is proposed to enhance the ability to express style. Experiments show that our method outperforms several state-of-the-art methods in five style datasets.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117309"},"PeriodicalIF":3.4,"publicationDate":"2025-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143760469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Few-shot image generation based on meta-learning and generative adversarial network","authors":"Bowen Gu, Junhai Zhai","doi":"10.1016/j.image.2025.117307","DOIUrl":"10.1016/j.image.2025.117307","url":null,"abstract":"<div><div>Generative adversarial network (GAN) learns the latent distribution of samples through the adversarial training between discriminator and generator, then uses the learned probability distribution to generate realistic samples. Training a vanilla GAN requires a large number of samples and a significant amount of time. However, in practical applications, obtaining a large dataset and dedicating extensive time to model training can be very costly. Training a GAN with a small number of samples to generate high-quality images is a pressing research problem. Although this area has seen limited exploration, FAML (Fast Adaptive Meta-Learning) stands out as a notable approach. However, FAML has the following shortcomings: (1) The training time on complex datasets, such as VGGFaces and MiniImageNet, is excessively long. (2) It exhibits poor generalization performance and produces low-quality images across different datasets. (3) The generated samples lack diversity. To address the three shortcomings, we improved FAML in two key areas: model structure and loss function. The improved model effectively overcomes all three limitations of FAML. We conducted extensive experiments on four datasets to compare our model with the baseline FAML across seven evaluation metrics. The results demonstrate that our model is both more efficient and effective, particularly on the two complex datasets, VGGFaces and MiniImageNet. Our model outperforms FAML on six of the seven evaluation metrics, with only a slight underperformance on one metric. Our code is available at <span><span>https://github.com/BTGWS/FSML-GAN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117307"},"PeriodicalIF":3.4,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143760468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OTPL: A novel measurement method of structural parallelism based on orientation transformation and geometric constraints","authors":"Weili Ding , Zhiyu Wang , Shuo Hu","doi":"10.1016/j.image.2025.117310","DOIUrl":"10.1016/j.image.2025.117310","url":null,"abstract":"<div><div>Detecting parallel geometric structures from images is a significant step for computer vision tasks. In this paper, an algorithm called Orientation Transformation-based Parallelism Measurement (OTPL) is proposed in this paper to measure the parallelism of structures including both line structures and curve structures. The task is decomposed into measurements of parallel straight line and parallel curve structures due to the inherent geometric differences between them, where the parallelism between curve structures can be further transformed into a matching problem. For parallel straight lines, the angle constraints and the rate of overlapping projection are considered as the parallel relationship selection rules for the candidate lines. For the parallel curves, the approximate vertical growing (AVG) algorithm is proposed to accelerate the search of adjacent curves and each smooth curve is coded as a vector with different angle values. The matching pairs are extracted through cosine similarity transformation and convexity consistency. Finally, the parallel curves are extracted by a decision-making process. The proposed algorithm is evaluated in a comprehensive manner, encompassing both qualitative and quantitative approaches, with the objective of achieving a more robust assessment. The results demonstrate the algorithm's efficacy in identifying parallel structures in both synthetic and natural images.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117310"},"PeriodicalIF":3.4,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143760467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bidirectional interactive multi-scale network using Wave-Conv ViT for single image deraining","authors":"Siyan Fang, Bin Liu","doi":"10.1016/j.image.2025.117311","DOIUrl":"10.1016/j.image.2025.117311","url":null,"abstract":"<div><div>To address the limitations of high-frequency information capture by Vision Transformer (ViT) and the loss of fine details in existing image deraining methods, we introduce a Bidirectional Interactive Multi-Scale Network (BIMNet) that employs newly developed Wave-Conv ViT (WCV). The WCV utilizes a wavelet transform to enable self-attention in both low-frequency and high-frequency domains, significantly enhancing ViT's capacity for diverse frequency-domain feature modeling. Additionally, by incorporating convolutional operations, WCV enhances the extraction and integration of local features across various spatial windows. BIMNet injects rainy images into deep network layers, enabling bidirectional propagation with shallow layer features that enrich skip connections with detailed and complementary information, thus improving the fidelity of detail recovery. Moreover, we present the CORain1000 dataset, tailored for the dual challenges of image deraining and object detection, which offers more diversity in rain patterns, image sizes, and volumes than the commonly used COCO350 dataset. Extensive experiments demonstrate the superiority of BIMNet over advanced methods. The code and CORain1000 dataset are available at <span><span>https://github.com/fashyon/BIMNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117311"},"PeriodicalIF":3.4,"publicationDate":"2025-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143792164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Feature Extraction and Knowledge Distillation Based Deep Learning Model for Human Activity Recognition System","authors":"Hetal Shah , Mehfuza S. Holia","doi":"10.1016/j.image.2025.117308","DOIUrl":"10.1016/j.image.2025.117308","url":null,"abstract":"<div><div>This article introduces the Generative Adversarial Network (GAN) framework model, which uses offline knowledge distillation (KD) to move spatio-deep data from a large teacher to a smaller student model. To achieve this, the teacher model named EfficientNetB7 embedded with spatial attention (E2SA) and a multi-layer Gated Recurrent Unit (GRU) is used. A hybrid feature extraction method known as Completed Hybrid Local Binary Pattern (ChLBP) is employed prior to the prediction process. After feature extraction, the hybrid features are parallelly given as input to both teacher and student models. In the teacher model, E2SA extracts both deep and spatio attention activity features, and these features are then input to the multi-layer GRU, which learns the human activity frame sequences overall. The proposed model obtains 98.50 % recognition accuracy on the UCF101 dataset and 79.21 % recognition accuracy on the HMDB51 dataset, which is considerably better than the existing models.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117308"},"PeriodicalIF":3.4,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143826393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}