Higher-order motion calibration and sparsity based outlier correction for video FRUC
Jiale He, Qunbing Xia, Gaobo Yang, Xiangling Ding
Signal Processing: Image Communication, vol. 138, Article 117327, 2025-04-17. DOI: 10.1016/j.image.2025.117327

Abstract: For frame rate up-conversion (FRUC), one of the key challenges is dealing with the irregular and large motions that are widespread in video scenes. Most existing FRUC methods assume constant brightness and linear motion, which easily leads to undesirable artifacts such as motion blur and frame flickering. In this work, we propose an improved FRUC method that uses a higher-order model for motion calibration and a sparse sampling strategy for outlier correction. Unidirectional motion estimation is used to accurately locate objects from the previous frame to the following frame in a coarse-to-fine pyramid structure. The object motion trajectory is then fine-tuned to approximate the real motion, and possible outlier regions are located and recorded. Moreover, image sparsity is exploited as prior knowledge for outlier correction, and the outlier index map is used to design the measurement matrix. Based on the theory of sparse sampling, the outlier regions are reconstructed to eliminate side effects such as overlapping, holes and blurring. Extensive experimental results demonstrate that the proposed approach outperforms state-of-the-art FRUC methods in terms of both the objective and subjective quality of the interpolated frames.
FANet: Feature attention network for semantic segmentation
Lin Zhu, Linxi Li, Mingwei Tang, Wenrui Niu, Jianhua Xie, Hongyun Mao
Signal Processing: Image Communication, vol. 138, Article 117330, 2025-04-17. DOI: 10.1016/j.image.2025.117330

Abstract: Semantic segmentation based on scene parsing assigns a category label to each pixel in an image. Existing neural network models are useful tools for understanding the objects in a scene, but they ignore the heterogeneity of the information carried by individual features, leading to pixel classification confusion and unclear boundaries. This paper therefore proposes a novel Feature Attention Network (FANet). First, an adjustment algorithm is presented to capture attention feature matrices that effectively select feature dependencies. Second, a hybrid extraction module (HEM) is constructed to aggregate long-range dependencies based on the proposed adjustment algorithm. Finally, an adaptive hierarchical fusion module (AHFM) is employed to aggregate multi-scale features by spatially filtering conflicting information, which improves the scale invariance of the features. Experimental results on popular benchmarks (PASCAL VOC 2012, Cityscapes and ADE20K) indicate that our algorithm outperforms competing algorithms.
Adaptive cross-channel transformation based on self-modulation for learned image compression
Wen Tan, Youneng Bao, Fanyang Meng, Chao Li, Yongsheng Liang
Signal Processing: Image Communication, vol. 138, Article 117325, 2025-04-17. DOI: 10.1016/j.image.2025.117325

Abstract: Learned image compression has recently achieved excellent rate–distortion performance, and nonlinear transformation has become a critical component for performance improvement. While Generalized Divisive Normalization (GDN) is a widely used method that exploits channel correlation for effective nonlinear representation, its use of cross-channel relationships for each element of the features remains limited. In this paper, we propose a novel cross-channel transformation based on self-modulation, named SMCCT. The SMCCT takes intermediate feature maps as input to capture cross-channel correlation and generates affine transformation parameters for element-wise feature modulation. The proposed transformation enables adaptive weighting and fine-grained control over the features, which helps to learn expressive features and further reduce redundancy. The SMCCT can be flexibly integrated into learned image compression models. Experimental results demonstrate that the proposed method achieves superior rate–distortion performance compared with existing learned image compression methods and outperforms traditional codecs under quality metrics such as PSNR and MS-SSIM. Specifically, under the PSNR metric, our method outperforms the latest codec VTM-12.1 by 5.47% and 10.25% in BD-rate on the Kodak and Tecnick datasets, respectively; under the MS-SSIM metric, it outperforms VTM-12.1 by 50.97% and 49.81% in BD-rate on the same datasets.
{"title":"Hierarchical contrastive learning for unsupervised 3D action recognition","authors":"Haoyuan Zhang , Qingquan Li","doi":"10.1016/j.image.2025.117329","DOIUrl":"10.1016/j.image.2025.117329","url":null,"abstract":"<div><div>Unsupervised contrastive 3D action representation learning has made great progress recently. However, most works rely on only the direct instance-level comparison with unreasonable positive/negative constraint, which degrades the learning performance. In this paper, we propose a Hierarchical Contrastive Scheme (HCS) for unsupervised skeleton 3D action representation learning, which takes advantage of multi-level contrast. Specifically, we keep the instance-level contrast to draw the different augmentations of the same instance close, targets to learn intra-instance consistency. Then we extend the contrastive objective from individual instances to clusters by enforcing consistency between cluster assignment from different instance of same category, aims at learning inter-instance consistency. Compared with previous methods, HCS enables intra/inter-instance consistency pursuit via multi-level contrast, without inflexible positive/negative constraint, which leads to a more discriminative feature space. Experimental results validate that the proposed framework outperforms the previous state-of-the-art methods on the challenging NTU RGB+D and PKU-MMD datasets.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117329"},"PeriodicalIF":3.4,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143855689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Camera calibration using property of asymptotes with application to sports scenes","authors":"Fengli Yang, Xuechun Wang, Yue Zhao","doi":"10.1016/j.image.2025.117331","DOIUrl":"10.1016/j.image.2025.117331","url":null,"abstract":"<div><div>Inspired by Ying's work on the calibration technique, this study proposes a new planar pattern (referred to as the phi-type model hereinafter), which includes a circle and diameter, as the calibration scene. In sports scenarios, such as a soccer match or basketball court, most existing methods require information of the scene points in a three-dimensional space. However, an interesting observation in the midfield is that the centre circle and the halfway line form a phi-type template. A new automatic method using the properties of asymptotes is proposed based on the images of the midfield. All intrinsic parameters of the camera can be determined without any assumptions such as zero skew or unitary aspect ratio. The main advantages of our technique are that it neither involves point or line matching nor does it require the metric information of the model plane. The feasibility and validity of the proposed algorithm were verified by testing the noise sensitivity and performing image metric rectification.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"138 ","pages":"Article 117331"},"PeriodicalIF":3.4,"publicationDate":"2025-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hidden dangerous object detection for terahertz body security check images based on adaptive multi-scale decomposition convolution
Zijie Guo, Heng Wu, Shaojuan Luo, Genping Zhao, Chunhua He, Tao Wang
Signal Processing: Image Communication, vol. 137, Article 117323, 2025-04-10. DOI: 10.1016/j.image.2025.117323

Abstract: Detecting hidden dangerous objects with terahertz imaging has recently attracted extensive attention. Many convolutional neural network-based detectors achieve excellent results on common objects. However, for hidden dangerous objects in terahertz body security check images, existing detectors generally suffer from low accuracy and large model sizes, because terahertz images are blurry and of poor quality and global context information is ignored. To address these issues, we propose an enhanced You Only Look Once network (YOLO-AMDC), which integrates an adaptive multi-scale large-kernel decomposition convolution (AMDC) module. Specifically, we design the AMDC module to enhance the feature expression ability of the YOLO framework. Moreover, we employ the Bi-Level Routing Attention (BRA) mechanism and a simple parameter-free attention module (SimAM) to exploit contextual information and improve the detection of dangerous objects. Additionally, we apply model pruning to reduce the number of model parameters. Experimental results show that YOLO-AMDC outperforms other state-of-the-art methods: compared with YOLOv8s, it reduces the parameter count by 3.9 M and improves mAP@50 by 5%. Detection performance remains competitive even when the number of parameters is significantly reduced by pruning.
ORB-SLAM3 and dense mapping algorithm based on improved feature matching
Delin Zhang, Guangxiang Yang, Guangling Yu, Baofeng Yang, Xiaoheng Wang
Signal Processing: Image Communication, vol. 137, Article 117322, 2025-04-10. DOI: 10.1016/j.image.2025.117322

Abstract: ORB-SLAM3 is currently the mainstream visual SLAM system and relies on feature matching based on ORB keypoints. However, it faces two main issues. First, feature matching is time-consuming, and an insufficient number of feature point matches lowers the localization accuracy of the algorithm. Second, it cannot construct dense point cloud maps, which limits its applicability in demanding scenarios such as path planning. To address these issues, this paper proposes an ORB-SLAM3 and dense mapping algorithm based on improved feature matching. In the feature matching stage of ORB-SLAM3, motion smoothness constraints are introduced and the image is divided into a grid. Feature points lying at the edge of a grid cell are assigned to multiple adjacent cells, which resolves both the incorrect assignment of feature points to grid cells and the associated time cost. This reduces matching time and increases the number of matched pairs, improving the positioning accuracy of ORB-SLAM3. Moreover, a dense mapping thread is added to construct dense point cloud maps in real time from keyframes and the corresponding poses filtered in the feature matching stage. Finally, validation experiments were conducted on the TUM dataset. The results show that the improved algorithm reduces feature matching time by 75.71% compared to ORB-SLAM3, increases the number of feature point matches by 88.69%, and improves localization accuracy by 9.44%. The validation also confirms that the improved algorithm can construct dense maps in real time. In conclusion, the improved algorithm delivers excellent performance in terms of localization accuracy and dense mapping.
{"title":"Camouflaged instance segmentation based on multi-scale feature contour fusion swin transformer","authors":"Yin-Fu Huang, Feng-Yen Jen","doi":"10.1016/j.image.2025.117328","DOIUrl":"10.1016/j.image.2025.117328","url":null,"abstract":"<div><div>Camouflaged instance segmentation is the latest detection issue for finding hidden objects in an image. Since camouflaged objects hide with similar background colors, it is difficult to detect objects' existence. In this paper, we proposed an instance segmentation model called Multi-scale Feature Contour Fusion Swin Transformer (MFCFSwinT) consisting of seven modules; i.e., Swin Transformer as the backbone for feature extraction, Pyramid of Kernel with Dilation (PKD) and Multi-Feature Fusion (MFF) for multi-scale features, Contour Branch and Contour Feature Fusion (CFF) for feature fusion, and Region Proposal Network (RPN) and Cascade Head for bounding boxes and masks detection. In the experiments, four datasets are used to evaluate the proposed model; i.e., COCO (Common Objects in Context), LVIS v1.0 (Large Vocabulary Instance Segmentation), COD10K (Camouflaged Object Detection), and NC4K. Finally, the experimental results show that MFCFSwinT can achieve better performances than most state-of-the-art models.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117328"},"PeriodicalIF":3.4,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143826394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-adaptive and learnable detail enhancement network for efficient image super resolution
Wenbo Zhang, Lulu Pan, Ke Xu, Guo Li, Yanheng Lv, Lingxiao Li, Le Lei
Signal Processing: Image Communication, vol. 137, Article 117319, 2025-04-04. DOI: 10.1016/j.image.2025.117319

Abstract: In recent years, single image super-resolution (SISR) methods based on deep learning have advanced significantly. However, their high computational complexity and memory demands hinder deployment on resource-constrained devices. Although numerous lightweight super-resolution (SR) methods have been proposed to address this issue, most fail to distinguish between flat and detailed regions in images and treat them uniformly. This lack of targeted design for detailed regions, which are critical to SR performance, makes existing lightweight methods redundant and inefficient. To address these challenges, we propose a simple yet effective Self-adaptive and Learnable Detail Enhancement Network (LDEN) that focuses specifically on the reconstruction of detailed regions. First, we present two designs for reconstructing detailed regions: (1) a Learnable Detail Extraction Block (LDEB), which pays special attention to detailed regions and employs a larger convolution kernel to obtain a larger receptive field; and (2) a lightweight attention mechanism called Detail-oriented Spatial Attention (DSA), which enhances the network's ability to reconstruct detailed regions. Second, we design a hierarchical refinement mechanism, the Efficient Hierarchical Refinement Block (EHRB), which reduces the inadequate information extraction and integration caused by coarse single-layer refinement. Extensive experiments demonstrate that LDEN achieves state-of-the-art performance on all benchmark datasets. Notably, for 4x magnification, LDEN outperforms BSRN (the winner of the model complexity track of the NTIRE 2022 Efficient SR Challenge) by 0.11 dB and 0.12 dB while using nearly 10% fewer parameters.
{"title":"Multimodal style aggregation network for art image classification","authors":"Quan Wang, Guorui Feng","doi":"10.1016/j.image.2025.117309","DOIUrl":"10.1016/j.image.2025.117309","url":null,"abstract":"<div><div>A large number of paintings are digitized, the automatic recognition and retrieval of artistic image styles become very meaningful. Because there is no standard definition and quantitative description of characteristics of artistic style, the representation of style is still a difficult problem. Recently, some work have used deep correlation features in neural style transfer to describe the texture characteristics of paintings and have achieved exciting results. Inspired by this, this paper proposes a multimodal style aggregation network that incorporates three modalities of texture, structure and color information of artistic images. Specifically, the group-wise Gram aggregation model is proposed to capture multi-level texture styles. The global average pooling (GAP) and histogram operation are employed to perform distillation of the high-level structural style and the low-level color style, respectively. Moreover, an improved deep correlation feature calculation method called learnable Gram (L-Gram) is proposed to enhance the ability to express style. Experiments show that our method outperforms several state-of-the-art methods in five style datasets.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"137 ","pages":"Article 117309"},"PeriodicalIF":3.4,"publicationDate":"2025-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143760469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}