{"title":"Learning Gabor layer for edge detection network","authors":"Haihua Ding , Sihan Huang , Chuan Lin","doi":"10.1016/j.dsp.2025.105438","DOIUrl":"10.1016/j.dsp.2025.105438","url":null,"abstract":"<div><div>In recent years, lightweight edge detection research has attracted much attention from scholars. This type of model has the advantages of low computational complexity and small parameter scale but sacrifices detection accuracy. The main reason is that the model does not fully extract the underlying and detailed features. We construct a learning Gabor layer as the front part of the edge detection network (“encoding network + decoding network”) to enhance the model's capability to extract low-level and detailed features. In addition, it can assist the network in extracting contextual semantic features and aggregating edges that continue in the same direction. The proposed learning Gabor layer only contains 150 parameters, and the total size of the model parameters is merely 0.49M. Applying the learning Gabor layer to our designed lightweight edge detection network (baseline) and validating it on four benchmark datasets (BSDS-VOC, NYUDv2, BIPED, Multicue) demonstrate the effectiveness of the learning Gabor layer.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105438"},"PeriodicalIF":2.9,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144571120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LSW-Net: A complex wilderness scenarios segmentation technique based on learnable spherical window multi-modal feature fusion","authors":"Hongdou He , Pei Miao , Yifang Huang , Peng Shi , Xiaobing Hao , Guoyan Huang , Bowen Zhao","doi":"10.1016/j.dsp.2025.105440","DOIUrl":"10.1016/j.dsp.2025.105440","url":null,"abstract":"<div><div>Wilderness scenarios are characterized by their unstructured and complex diversity, which makes segmentation in such environments more challenging. Current research mainly focuses on perception methods for structured environments (such as urban roads), with relatively less attention given to unstructured wilderness scenarios. Therefore, this paper investigates segmentation techniques for complex wilderness scenarios. Firstly, we propose a multi-channel point cloud mapping method specifically designed for wilderness environments, which extracts both geometric distribution features of terrain structures and ground texture characteristics directly from 3D point cloud data. Secondly, we propose a learnable spherical window mechanism for multi-modal feature fusion, enabling geometric-aware cross-modal interaction. By constructing point cloud spherical windows, the most relevant image context features to the point cloud features are filtered out, enabling the registration and fusion of complementary multi-modal features. Finally, a multi-head fusion classifier is employed to achieve effective segmentation of complex wilderness scenarios under multi-modal data fusion. An experimental platform for ground unmanned intelligent agent perception technology research was built, and the proposed model was subjected to simulation and experimental analysis. The results show that the model has high segmentation accuracy, with mIoU precision reaching 82.01% in simulated environments and 79.30% in real environments, representing an improvement of 7.63% and 5.99% respectively over traditional methods. 
This model is suitable for segmentation tasks in complex wilderness scenarios, providing a new solution to enhance the perception capabilities of wilderness scenarios.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105440"},"PeriodicalIF":2.9,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144557173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault diagnosis method for rolling bearing based on attention mechanism and BiTCN model","authors":"Jiqiang Cui , Junwei Gao , Rongxin Xing , Wenkai Wu , Mingyang Li , Yanping Yang","doi":"10.1016/j.dsp.2025.105454","DOIUrl":"10.1016/j.dsp.2025.105454","url":null,"abstract":"<div><div>Fault diagnosis of rolling bearings typically requires the extraction of a large number of features. However, due to the inherent limitations of unidirectional feature extraction, it is challenging to capture temporal dependencies in both directions for non-stationary signals, resulting in low fault recognition accuracy and poor model adaptability to diverse datasets. To address this issue, we propose a fault diagnosis model based on attention mechanisms and a bidirectional temporal convolutional network (BiTCN). Firstly, the vibration signal is preprocessed. Then, the processed signal is fed into Squeeze-and-Excitation Networks (SENet) to select diagnostically relevant features, reducing computational load. Next, the BiTCN processes the selected features to extract bidirectional temporal dependencies from vibration signals, unlike unidirectional TCN models. The multi-head attention mechanism (MA) dynamically reallocates weights to these features, which are then classified by a fully connected layer for fault diagnosis. The bearing fault datasets from Jiangnan University and Case Western Reserve University validate the fault diagnosis performance of our method. Experimental results show that the accuracy of the proposed model on the bearing fault dataset from Jiangnan University is 99.49%, and the accuracy of the model on the Case Western Reserve University dataset can reach 99%. 
These results demonstrate that the proposed model exhibits excellent bearing fault diagnosis performance, meets the requirements for fault diagnosis, and provides a novel approach for bearing fault diagnosis.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105454"},"PeriodicalIF":2.9,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144571119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A cross-domain embedding cost learning joint FFT for security steganography","authors":"Tao Wang , Huashu Zhan , Meng Li","doi":"10.1016/j.dsp.2025.105430","DOIUrl":"10.1016/j.dsp.2025.105430","url":null,"abstract":"<div><div>Recent advancements in image steganography demonstrate that reasonable probability maps generated by minimum embedding cost learning through adversarial training can effectively improve the security performance of steganography. Existing embedding cost learning based steganography methods primarily rely on the generator to extract structural features in the image spatial domain, neglecting high frequency information in the frequency domain, which restricts the performance of the model. To address this gap, we propose a minimum embedding cost learning network based on a cross-domain feature fusion, not only extracting the spatial domain information, but also identifying the features in frequency information, aiming to generate effective probability maps for steganography. To this end, we first design an F-UNet architecture that obtains high-frequency features by training complex parameters in the frequency domain of FFT-processed input images. And then, we present an S-UNet by integrating a spatial attention mechanism into the UNet architecture to enhance its capability of extracting spatial domain information from images. Finally, we propose a feature fusion module to integrate cross domain information, allowing for the extraction of richer and more comprehensive features. In this way, we can efficiently model a cross-domain embedding cost learning network at both spatial and frequency scales, enhancing its ability to resist steganalysis and resulting in more secure and robust steganography. 
Experimental results demonstrate that the proposed method exceeds current methods in steganalysis resistance.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105430"},"PeriodicalIF":2.9,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144549077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A dynamic cross-modal learning framework for joint text-to-audio grounding and acoustic scene classification in smart city environments","authors":"Yige Zhang, Menglong Wu, Xichang Cai","doi":"10.1016/j.dsp.2025.105444","DOIUrl":"10.1016/j.dsp.2025.105444","url":null,"abstract":"<div><div>As two fundamental components of smart city acoustic perception frameworks, Text-to-Audio Grounding (TAG) and Acoustic Scene Classification (ASC) demonstrate essential capabilities in enabling robust environmental monitoring and anomaly detection. However, existing methods typically treat these tasks independently, leading to increased system complexity and overlooking potential synergies between tasks. Although there has been progress in multi-task joint learning research, these methods are primarily limited to single audio modality and predefined event category libraries, lacking the ability to utilize multimodal information and struggling to meet the diversity requirements of complex acoustic scenes in open environments. This paper presents the first multimodal joint learning framework that integrates TAG with ASC, effectively addressing three significant challenges: cross-modal feature heterogeneity, global-local objective conflicts, and modal-task feature coupling, thereby achieving deep task collaboration. The core contributions of this work include designing an Adaptive Transformer with Scene-aware Fusion (ATSF) that optimizes audio-text cross-modal interaction through dual-modal feature decoupling and scene-adaptive recombination mechanisms; constructing a Multimodal Progressive Layered Expert Network (PLE) that suppresses negative transfer in multi-task learning through task-specific and shared knowledge separation strategies; and proposing a dynamic gradient-balanced joint optimization strategy to support efficient cross-modal multi-objective training. 
Experiments on the extended AudioGrounding dataset demonstrate that our framework significantly improves performance compared to single-task baseline models, with TAG task PSDS value increasing from 14.7 % to 36.83 % and ASC classification accuracy reaching 79.46 %. The proposed ATSF-PLE framework provides an efficient and precise solution for intelligent urban acoustic perception systems, demonstrating substantial application value in intelligent security, traffic management, and other scenarios.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105444"},"PeriodicalIF":2.9,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144557505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance limit and phase design for near-field tracking with reconfigurable intelligent surface","authors":"Bo Wang , Ning Zhang , Yanping Zhao , Zhiyuan Feng , Baohua Yao","doi":"10.1016/j.dsp.2025.105426","DOIUrl":"10.1016/j.dsp.2025.105426","url":null,"abstract":"<div><div>Localization and tracking is one of the key technologies in the field of signal processing. Traditional methods use range-based techniques for target localization and tracking, but their accuracy can degrade, or the methods can fail entirely, when the line of sight (LOS) link is obstructed. A reconfigurable intelligent surface (RIS), as low-cost and flexibly deployable hardware, can reflect incoming signals in a directional manner, providing additional virtual line of sight (VLOS) links to enhance the accuracy of target localization and tracking. In this paper, we investigate the near-field target tracking problem with RIS, while analyzing the performance limit for the scenario and designing the phase of the RIS. Specifically, we establish a near-field RIS-assisted tracking system model and derive the posterior Cramér-Rao lower bound (PCRLB) as the tracking performance metric. By analyzing the Fisher information matrix (FIM), we illustrate that the RIS-assisted target tracking system can effectively reduce the blind spot range of target velocity estimation compared to the traditional antenna array. Furthermore, we formulate an optimization problem by minimizing PCRLB to seek the optimal phase design under two scenarios. For the prior target scenario, we relax the original problem into a semidefinite program (SDP). In the unknown target scenario, we model the area of interest as an uncertainty set and ultimately solve the problem through robust alternating optimization. 
Finally, the simulation experiments prove the effectiveness of the RIS-assisted target tracking algorithm.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105426"},"PeriodicalIF":2.9,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144557203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Built-in self-scaling method for kernel-based estimation in the presence of nonlinear distortion","authors":"Ai Hui Tan","doi":"10.1016/j.dsp.2025.105452","DOIUrl":"10.1016/j.dsp.2025.105452","url":null,"abstract":"<div><div>This paper proposes the use of perturbation signals with harmonic suppression in combination with prior steady-state gain for impulse response estimation of linear systems corrupted with nonlinear distortion. The proposed method allows the effects of nonlinear distortion on the linear estimate to be eliminated or reduced and enables the prior information to be incorporated into the estimation by a direct extension of the standard kernel-based (KB) formulation into the built-in self-scaling (BS) method. Theoretical derivation proves that the BS method can preserve the property of harmonic suppression in perturbation signals. The bias and variance in the impulse response estimate are derived theoretically and analyzed in detail. The findings confirmed that the proposed approach leads to high estimation accuracy and low uncertainty, without increasing computational complexity or measurement time. Furthermore, the method can readily extend to multi-input multi-output systems. The feasibility of the proposed technique is illustrated through a real experiment on an electronic nose, where the response is important in the food industry process automation for increasing both efficiency and reliability of distinguishing volatile compounds. 
The proposed approach was shown to be superior to both the standard KB estimation and a competing method utilizing information on the prior steady-state gain.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105452"},"PeriodicalIF":2.9,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144581259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GRdepth: Enrich feature with global information and self-iterative regulation network for monocular depth estimation","authors":"Chenxing Xia , Aoqi Zhang , Xiuju Gao , Bin Ge , Kuan-Ching Li , Xianjin Fang , Xingzhu Liang , Yan Zhang","doi":"10.1016/j.dsp.2025.105434","DOIUrl":"10.1016/j.dsp.2025.105434","url":null,"abstract":"<div><div>Monocular depth estimation (MDE) seeks to infer pixel-wise dense depth maps from a single RGB image. Recent methodologies predominantly utilize the encoder-decoder architecture to effectively extract and analyze multi-scale features. However, they tend to ignore the important role that high-level features with rich global information play in MDE, resulting in a poor understanding of the overall structure of the scene by the model. Based on this, we propose a novel encoder-decoder framework called GRdepth, which includes a cross large scale feature enhancement (CLSE) module and an iterative regulation decoder (IRD). Specifically, the CLSE module is designed to use high-level features, enriched with global information extracted by a global information aggregation (GIA) unit, to guide the enhancement of multi-scale feature maps produced by the encoder. This enhancement is achieved through a cross large scale feature fusion (CLSF) unit built from channel attention and spatial attention to refine low-level features with high-level information. The IRD is tailored for MDE based on classification-regression which mainly utilizes a bin width self-regulation (SRbins) unit to adjust the width of the initial bins predicted with the bottleneck features. This adjustment is guided by bin width predicted by an iterative adaptive feature fusion (IAFF) unit at each level, effectively combining global information and local information for more accurate bin width and bin centers. 
Extensive experiments on the indoor dataset NYU-Depth-v2 and SUN-RGBD and on the outdoor dataset KITTI demonstrate that our method can achieve comparable state-of-the-art (SOTA) results.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105434"},"PeriodicalIF":2.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144534880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ACENet: Adaptive correlation-enhanced network for multivariate time series forecasting","authors":"Yupeng Wu , Muzhou Hou , Haokun Hu","doi":"10.1016/j.dsp.2025.105424","DOIUrl":"10.1016/j.dsp.2025.105424","url":null,"abstract":"<div><div>A multitude of practical applications necessitate the utilization of multivariate time series forecasting techniques, including the issuance of extreme weather warnings and the formulation of energy consumption plans. However, time series data frequently display intricate intra- and inter-series correlations, rendering modelling and forecasting particularly challenging due to these complex dependencies. The comprehension and representation of these multi-level interactions represent a fundamental research challenge, one that is also of paramount importance in numerous application domains. The extant literature has a restricted focus on capturing correlations within periodic time intervals at disparate time scales and between these intervals. To address these challenges, we propose the Adaptive Correlation-Enhanced Network (ACENet). The model begins by extracting multiple significant period lengths through Fast Fourier Transform (FFT) and segmenting the time series accordingly. At each temporal scale, three dedicated correlation matrices - capturing feature-wise correlations within periods, timestamp-wise correlations within periods, and cross-period correlations respectively - work in concert to enhance periodic pattern learning. The framework then employs an adaptive weighting mechanism to dynamically balance intra-period and inter-period correlations, ultimately generating the final prediction through this hierarchical integration of multi-scale temporal dependencies. 
Finally, experiments on several real-world datasets demonstrate the effectiveness of ACENet for multivariate time series forecasting.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105424"},"PeriodicalIF":2.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144549078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel color image encryption algorithm based on infinite collapse map and hierarchical strategy","authors":"Yonghui Huang, Qilin Zhang, Yongbiao Zhao","doi":"10.1016/j.dsp.2025.105428","DOIUrl":"10.1016/j.dsp.2025.105428","url":null,"abstract":"<div><div>Chaos-based image encryption algorithms are important for information security, but current chaotic systems and encryption algorithms still have optimization potential. This paper proposes a novel one-dimensional improved composite chaotic map (1D-ICCM) to enhance chaos performance and the efficiency of generating chaotic sequences. The dynamic characteristics of the 1D-ICCM are analyzed in depth, demonstrating favorable chaotic properties. Based on this, we introduce a hierarchical strategy image encryption algorithm (HS-IEA) that combines 1D-ICCM and the Logistic map to improve the robustness and security of image encryption algorithms. The algorithm begins by integrating the image at the pixel level and performing secondary diffusion on the integrated sequence. After restoring the color channel matrices, the image is encrypted at the pixel level using bidirectional dynamic scrambling. Then, bit-plane decomposition is applied. To handle large amounts of data at the bit level, high-order and low-order bit planes are processed separately: high-order planes are encrypted using bit-level diffusion, while low-order planes use bit-plane rotation. Finally, the encrypted bit-planes and color channel matrices are merged for two-layer encryption. 
Experimental results and security evaluations confirm that the HS-IEA significantly improves image encryption's robustness, security, and performance.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"167 ","pages":"Article 105428"},"PeriodicalIF":2.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144524005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}