{"title":"X-Enhanced ULite: Improving semantic segmentation for surface defects","authors":"Quwei Rao, Zhiwei Shi, Jing Ji","doi":"10.1016/j.dsp.2025.105635","DOIUrl":"10.1016/j.dsp.2025.105635","url":null,"abstract":"<div><div>In surface defect segmentation, achieving high accuracy while maintaining lightweight model characteristics is crucial for industrial applications. However, designing lightweight models that achieve high accuracy without compromising computational efficiency remains a significant challenge. In this work, we propose X-ULite, an enhanced version of the ULite framework, designed to improve both accuracy and efficiency in defect segmentation. The core innovation lies in the XConv module, which decouples main and anti-diagonal convolutions into independent depthwise branches, thereby preserving orientation-specific features that are crucial for representing cracks and scratches. Additionally, a new BottleNeck module integrates XConv with Axial Depthwise Convolution (AxialDW) and standard depthwise convolution, jointly modeling axial, diagonal, and local features to achieve a comprehensive perception of defect regions. Evaluated on industrial datasets, X-ULite achieves 85.98 % mIoU on steel surfaces (NEU-Seg), 75.07 % mIoU on the Magnetic Tile Defect dataset, and 91.44 % mIoU on the Mobile phone screen surface defect (MSD) dataset with only 0.97M parameters. 
The model maintains low computational complexity and parameters while demonstrating robust segmentation accuracy across diverse industrial scenarios.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105635"},"PeriodicalIF":3.0,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145264989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FFSNet: Adaptive features fusion of foundation models and self-supervised models for remote sensing image segmentation","authors":"Dunyou Liang, Feng Peng, Bing Wu, Xiaojun Cui, Haolin Zhuang, Guoyu Zhang","doi":"10.1016/j.dsp.2025.105634","DOIUrl":"10.1016/j.dsp.2025.105634","url":null,"abstract":"<div><div>Remote sensing image segmentation is essential for urban planning, environmental monitoring, and disaster assessment but is challenged by scarce pixel-level annotations, domain shifts, and the difficulty of segmenting spectrally similar land cover classes. Existing methods struggle to address these issues comprehensively. Supervised approaches like UNetFormer require extensive labeled data and have limited generalization. Foundation models like SAM enable zero-shot segmentation but are constrained by high inference overhead, limiting their practical use in remote sensing. Self-supervised models like DINO capture domain-specific features but lack the global priors and generalization capabilities of large-scale foundation models, reducing their effectiveness in complex remote sensing scenarios. To overcome these limitations, FFSNet is proposed as a novel framework that integrates a lightweight MobileSAM encoder with a DINOv2 self-supervised encoder pretrained on remote sensing data. Its core innovation, the adaptive feature fusion module, balances general visual priors and domain-specific representations using attention-based dynamic weighting. Additionally, a modified category mask decoder extends binary output to multi-class segmentation using learnable prototype vectors. Experiments on three benchmark datasets validate the effectiveness of FFSNet. It achieves a mIoU of 55.4 % on LoveDA, surpassing D2lS, a mF1 of 88.3 % on ISPRS Potsdam, outperforming AerialFormer, and a mF1 of 91.6 % on Vaihingen, while using only 44.7 M parameters—a 50 % reduction compared to D2lS. 
FFSNet establishes a new paradigm for efficient domain adaptation in foundation models, offering superior segmentation accuracy with reduced computational costs, making it highly practical for large-scale remote sensing applications.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105634"},"PeriodicalIF":3.0,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145264990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse reconstruction of overpressure field for aerial explosive shock wave based on weighted total variation combined group sparse regularization","authors":"Jian Li , Rui Liu , Chenli Guo , Mingyue Ni , Chuankun Li","doi":"10.1016/j.dsp.2025.105633","DOIUrl":"10.1016/j.dsp.2025.105633","url":null,"abstract":"<div><div>An ill-posed analysis matrix exhibits pathological sparsity owing to the limited number of shockwave test nodes. In this paper, we propose a weighted total variation combined group sparse regularization method to reconstruct an invertible wave overpressure field. In order to better preserve image edge information, a weighted total variation method is utilized to process the image gradients by setting learnable parameters associated with the structure of the data space. Subsequently, a group sparse representation method, which is based on low-rank constraints using block-matching, is employed to achieve similarity among the non-local sub-blocks of the shock wave data to preserve the subtle details of the image. Lastly, the proposed model is optimized using the alternating direction method of multipliers (ADMM) with alternating iterations. We also conduct simulations and field experiments to validate the proposed method, where the reconstruction error of the entire area is reduced to approximately 13.5 % compared with existing methods.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105633"},"PeriodicalIF":3.0,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145264986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A speech emotion recognition method based on DST-GDCC and text-to-speech data augmentation","authors":"Ning Li , Junjie Hou , Wenjiao Zhang , Yanan Zhuang , Qianqian Xu , Haohan Yong","doi":"10.1016/j.dsp.2025.105636","DOIUrl":"10.1016/j.dsp.2025.105636","url":null,"abstract":"<div><div>Speech Emotion Recognition (SER) is a critical component of human-machine interaction, yet it confronts two fundamental challenges: limited feature extraction capabilities and data scarcity. This paper proposes a unified framework that synergistically addresses both issues through the co-design of a novel SER model and a high-quality data augmentation strategy. At its core, the Deformable Speech Transformer (DST) and the Gated Dilation Causal Convolution (GDCC) are introduced, which are combined to form the DST-GDCC model for superior feature extraction. The DST component adaptively captures multi-granular acoustic features, while the GDCC module explicitly models the spatiotemporal causality of speech emotions. However, the full potential of such an advanced model is often constrained by scarce training data. To overcome this limitation, a Text-to-Speech (TTS) data augmentation method is incorporated, leveraging a pre-trained GPT-SoVITS model to synthesize high-fidelity, emotion-conditioned speech samples. Crucially, these two components form a virtuous cycle: the powerful discriminative ability of the DST-GDCC model is leveraged in a dual-stage screening mechanism to ensure the quality of the synthetic data, while the expanded, high-quality dataset, in turn, enables the model to realize its full potential. Experimental results demonstrate the framework's effectiveness. The DST-GDCC model itself achieves significant accuracy improvements over baselines (2.66% on IEMOCAP, 5.02% on MELD, 5.83% on CASIA). 
More importantly, the synergistic integration with TTS data augmentation yields further gains of 3.13% on IEMOCAP and 3.33% on CASIA, validating the framework's capability to systematically elevate SER performance.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105636"},"PeriodicalIF":3.0,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145264992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EFENet: Edge and feature enhancement network for stroke lesion segmentation","authors":"Rui Zhai , Daqi Li , Yan Li , Mingyang Liang , Yalin Song , Zhen Wang","doi":"10.1016/j.dsp.2025.105629","DOIUrl":"10.1016/j.dsp.2025.105629","url":null,"abstract":"<div><div>Automatic lesion segmentation holds significant clinical value for the diagnosis and rehabilitation of brain stroke. However, it faces challenges such as blurred lesion edges and variations of lesion morphology and size. To tackle these problems, we propose a model for stroke lesion segmentation based on edge and feature enhancement named EFENet. First, we propose an edge-aware decoder (EAD), which first predicts the overall lesion region and then extracts the predicted edges using a morphology-based method. An edge feature enhancement component is incorporated in the EAD to strengthen the lesion edge features in the image, alleviating the impact of edge blurring on segmentation performance. Second, local-enhanced Swin Transformer (LE-Swin) blocks are introduced in the encoders. A convolution-based local feature extraction branch is added to the window-based multi-head self-attention (W-MSA), enhancing the model’s ability to capture both global and local features. Finally, a Channel Attention Fusion module (CAF) is employed at skip connections to fuse the encoder’s global features and the decoder’s edge-enhanced features using channel attention, reducing the feature gap. Extensive experiments are conducted on two public datasets, ATLAS and ISLES2022. 
EFENet achieves Dice coefficients of 0.5906 and 0.7598, respectively.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105629"},"PeriodicalIF":3.0,"publicationDate":"2025-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145219352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MSANet: Multi-Stage attention network for anomalous sound detection in machine condition monitoring","authors":"Hao Zhou, Yi Zhou, Yin Liu, Hongqing Liu","doi":"10.1016/j.dsp.2025.105626","DOIUrl":"10.1016/j.dsp.2025.105626","url":null,"abstract":"<div><div>An Anomalous Sound Detection (ASD) system analyzes sound signals from sensors to detect anomalies in industrial machines. However, recent methods have failed to focus sufficiently on partial details and long-term dependency information in acoustic features, resulting in poor performance on certain machine types. To address this challenge, we propose a novel ASD model based on the multi-stage attention network (MSANet). The spectral-temporal concatenated spectrogram of the audio samples is used as the MSANet input and is modeled serially by the network. The fusion spectrogram attention network (FSAN) enhances inter-spectrogram correlation via directional pooling and attention weighting. A Convolutional Block Attention Module (CBAM) is used in the local attention network to focus on the channel and spatial information in acoustic vectors, thereby improving the capacity of the ASD system to model local information. In the global attention network, a gated recurrent unit (GRU) is applied to improve the feed-forward layer of the transformer, enhancing the model's ability to capture global correlation features and contextual information. Extensive experiments are conducted on the DCASE 2020 Challenge Task 2 dataset to evaluate the proposed model. 
Experimental results demonstrate that MSANet achieves an average AUC of 94.89 % and an average pAUC of 89.11 %, both of which surpass the performance of previous methods.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105626"},"PeriodicalIF":3.0,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145264991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DOA estimation method for sparse arrays based on deep convolutional autoencoder and deep convolutional neural network","authors":"Shuhan Guo , Qin Zhang , Xiaolong Fu , Guimei Zheng , Hao Zhou","doi":"10.1016/j.dsp.2025.105627","DOIUrl":"10.1016/j.dsp.2025.105627","url":null,"abstract":"<div><div>This paper proposes a Direction-of-Arrival (DOA) estimation method based on Deep Convolutional Autoencoder (DCAE). This method constructs a DCAE to map the covariance matrix of the received signals of a sparse array into a feature space and then reconstructs it into the covariance matrix of the received signals of a uniform linear array. Subsequently, the DOA estimation is performed in combination with the MUSIC algorithm, which effectively increases the degrees of freedom of the sparse array and better solves the DOA estimation problem under the underdetermined condition of the sparse array. To address the issues of low estimation accuracy and poor angular resolution in traditional algorithms for sparse arrays, a DOA estimation method based on Deep Convolutional Neural Network (DCNN) is proposed. This method extracts the mapping from the covariance matrix of the received signals of the physical elements of the sparse array to the angles of arrival, achieving higher accuracy and higher resolution DOA estimation.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105627"},"PeriodicalIF":3.0,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145219394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient and lightweight rotating target detection model for industrial scenarios","authors":"Mingyao Teng, Guoyang Wan, Shoujun Bai, Yunhao Zhu, Hanqi Li, Chengwen Wang","doi":"10.1016/j.dsp.2025.105631","DOIUrl":"10.1016/j.dsp.2025.105631","url":null,"abstract":"<div><div>Currently, target detection technology has largely matured in industrial scenarios, but most applications still rely on horizontal bounding boxes for target detection. For industrial parts with irregular shapes, varying sizes, and diverse categories, horizontal bounding box detection tends to introduce unnecessary background information, leading to false positives or missed detections. It also loses boundary direction information. To address these challenges, this paper proposes a novel YOLO11 model for detecting rotating targets (DAGP-YOLO). First, a rotating detection head is introduced to minimize the interference of redundant background information. Dynamic convolution is applied to expand the receptive field of the network, and the ADown module replaces the original downsampling method to improve detail extraction. Second, an orientation-aware attention mechanism (GCA) is designed to better focus on the directional features of rotating targets. Lastly, to meet the demands of industrial edge devices for small storage space and high detection accuracy, this paper adopts the L1 filter pruning strategy to compress the improved model. We performed experimental validation on a self-constructed dataset, the publicly available MVTec Screws dataset, and the UCAS AOD dataset in the aerial photography domain. The results demonstrate the superiority and effectiveness of our approach. 
Additionally, we developed a visualization system based on the C# WinForms framework, which allows for real-time detection and display of workpiece images, further showcasing the practical applicability of the improved method in industrial settings.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105631"},"PeriodicalIF":3.0,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145219392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the BLER performance of UAV-assisted dual-hop V2V MIMO-NOMA systems in finite blocklength regime","authors":"Van Son Nguyen , Phuong Nhung Do , Pham Xuan Khanh , Hai-Nam Le , Nguyen Thu Phuong , Pham Thanh Hiep","doi":"10.1016/j.dsp.2025.105625","DOIUrl":"10.1016/j.dsp.2025.105625","url":null,"abstract":"<div><div>This paper investigates the block error rate (BLER) performance of an Unmanned Aerial Vehicle (UAV)-assisted Multiple-Input Multiple-Output (MIMO) Non-Orthogonal Multiple Access (NOMA) Vehicle-to-Vehicle (V2V) communication system operating in the finite blocklength (FBL) regime over double Rayleigh fading channels. The system model incorporates 3D UAV geometry, Maximum Ratio Transmission (MRT)/Maximum Ratio Combining (MRC) beamforming, and NOMA with successive interference cancellation (SIC). Due to the analytical complexity under MIMO and FBL conditions, a single-link approximation, linearized Q-function, and Meijer G-function are used to derive closed-form and asymptotic expressions for the end-to-end BLER. The analysis captures the effects of key system parameters, including blocklength, signal-to-noise ratio, power allocation, and SIC efficiency. Additionally, the impact of UAV altitude and vehicle velocity on reliability is quantified, revealing trade-offs between line-of-sight probability, path loss, and time correlation. 
Asymptotic analysis provides insights into diversity order, aiding the design of robust UAV-assisted ultra-reliable low-latency communication (URLLC) systems under mobility and short-packet constraints.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105625"},"PeriodicalIF":3.0,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145265497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A DOA estimation algorithm based on the low computational complexity log-sum sparse recovery","authors":"Jihui Lv , Shuai Liu , Ming Jin , Feng-Gang Yan","doi":"10.1016/j.dsp.2025.105623","DOIUrl":"10.1016/j.dsp.2025.105623","url":null,"abstract":"<div><div>The super-resolution iterative reweighted (SURE-IR) algorithm and the prior-knowledge aided super-resolution iterative reweighted (KA-SURE-IR) algorithm provide an important reference for the research of log-sum sparse recovery. However, even when the matrix inversion lemma is used, SURE-IR and KA-SURE-IR still suffer from high computational complexity. Therefore, this paper designs a descent direction to achieve low-complexity log-sum sparse recovery and direction of arrival (DOA) estimation. First, the received signals are decomposed by singular value decomposition (SVD), and the corresponding log-sum sparse model is established. Then, the log-sum sparse model is relaxed to a convex model, the multiple signal classification (MUSIC) algorithm is used to provide prior information to promote sparse recovery, and the theoretical optimal value of the sparse signals in each iteration is solved. Next, a descent direction is designed according to the current value and the theoretical optimal value of the sparse signals in each iteration. Finally, the computational complexity of the proposed algorithm is reduced by selecting the regularization parameters as large as possible to reduce the influence of the residual value and by combining the matrix inversion lemma. 
The simulation results validate the effectiveness of the proposed algorithm in DOA estimation.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"168 ","pages":"Article 105623"},"PeriodicalIF":3.0,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145264987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}