{"title":"Blind source separation method based on blind compression transformation under impulsive noise","authors":"Zhiwei Zhang , Hongyuan Gao , Qinglin Zhu , Yufeng Wang , Jiayi Wang","doi":"10.1016/j.dsp.2025.105095","DOIUrl":"10.1016/j.dsp.2025.105095","url":null,"abstract":"<div><div>When strong impulsive noise exists in observed signals, existing blind source separation (BSS) methods are less accurate or even ineffective; moreover, existing noise suppression methods rely on prior knowledge to set their parameters, and thus cannot be applied to the BSS problem. To address these problems, this paper proposes a BSS method that still achieves effective signal separation under impulsive noise. A new compression transformation function that does not depend on any prior knowledge, named the blind compression transformation (BCT) function, is designed to process the observed signals. The received observed signals are processed with the proposed BCT, and the short-time Fourier transform (STFT) is then applied to the processed signals to complete the separation in the frequency domain. An adaptive energy correlation permutation algorithm based on frequency correction is designed to resolve the permutation ambiguity in the frequency domain, and the inverse short-time Fourier transform (ISTFT) is performed to recover the source signals. In general, the proposed method suppresses impulsive noise without any prior knowledge and resolves the permutation ambiguity without empirically set thresholds, achieving effective signal separation under impulsive noise. 
The superior performance of our proposed method is evaluated through numerical simulations for the considered scenarios.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"161 ","pages":"Article 105095"},"PeriodicalIF":2.9,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143480472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
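The abstract specifies the pipeline (compress, STFT, permutation correction, ISTFT) but not the closed form of the BCT. A parameter-free compression step can nonetheless be sketched: normalize by a robust scale estimated from the data itself, then apply a bounded nonlinearity so isolated impulses cannot dominate. The function below is an illustrative stand-in for such a transform, not the paper's BCT:

```python
import math

def blind_compress(x):
    """Illustrative parameter-free compression of an observed signal:
    samples are scaled by the median absolute value (a robust statistic,
    so no threshold has to be tuned) and squashed by the bounded map
    a -> a / (1 + a), which preserves sign and magnitude ordering but
    caps the influence of impulsive outliers."""
    med = sorted(abs(v) for v in x)[len(x) // 2] or 1.0
    out = []
    for v in x:
        a = abs(v) / med
        out.append(math.copysign(a / (1.0 + a), v))
    return out
```

Because the output is bounded in (-1, 1), second-order statistics of the subsequent STFT remain finite even under heavy-tailed (e.g. alpha-stable) noise, which is what makes the frequency-domain separation stage workable.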
{"title":"3D localization using lensless event sensors for fast-moving objects","authors":"Yue You , Yihong Wang , Yu Cai , Mingzhu Zhu , Bingwei He","doi":"10.1016/j.dsp.2025.105077","DOIUrl":"10.1016/j.dsp.2025.105077","url":null,"abstract":"<div><div>A novel event sensor-based object localization method is proposed in this paper. It addresses the accuracy limitations of event sensors caused by their limited spatial resolution and binary grayscale levels. The method uses flickering beacons and replaces the event camera's lens with a mask printed with a marker field. This configuration distributes location-coded events across the entire sensor instead of confining them to a small region, as in traditional methods. Major algorithms, including pattern simulation and optimized matching, are designed to achieve 3D localization and pose estimation. Experiments show a <strong>17.3%</strong> accuracy improvement over state-of-the-art event-based methods in average translation error, consistent across varying distances and angles. This demonstrates its suitability for <strong>surgical navigation</strong>, <strong>virtual reality</strong>, and other precise, real-time localization tasks.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"161 ","pages":"Article 105077"},"PeriodicalIF":2.9,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143480456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel particle filter with noisy input","authors":"Xinyu Zhang , Miao Gao , Tiancheng Li , Jiemin Duan , Yingmin Yi , Junli Liang","doi":"10.1016/j.dsp.2025.105086","DOIUrl":"10.1016/j.dsp.2025.105086","url":null,"abstract":"<div><div>In nonlinear systems, system inputs play a critical role in achieving control objectives, yet they are highly susceptible to noise during measurement and execution. Ignoring input noise can cause the standard particle filter (SPF) algorithm to produce biased estimates. To address this issue, this study begins by analyzing how input noise contributes to the deviation of the SPF. A novel particle filter (PF), designed to be robust against noisy inputs by incorporating information from both process noise and input noise, is then proposed. This approach constructs a new importance density. Drawing inspiration from Gibbs sampling, the method hierarchically and independently samples input and state variables from the new importance density, which accounts for both input and state randomness. The input random variable is eliminated through Monte Carlo independent resampling of the two variables, yielding the final state estimate. To validate the proposed method, three comparative experiments were conducted, evaluating the SPF, the combined particle filter (CPF), and the auxiliary particle filter (APF) algorithms. 
The results demonstrate that the new PF outperforms SPF in handling nonlinear, non-Gaussian systems with noisy inputs and effectively mitigates deviations caused by input noise.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"161 ","pages":"Article 105086"},"PeriodicalIF":2.9,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143511795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
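The core sampling idea — each particle draws its own realization of the input noise along with the process noise, so the importance density reflects both sources of randomness, and resampling then marginalizes the input variable out — can be sketched with a bootstrap-style filter on a toy scalar model. The model, noise scales, and multinomial resampling below are illustrative assumptions, not the paper's construction:

```python
import math
import random

def noisy_input_pf(y_seq, u_seq, n_particles=500, q=0.1, r=0.2, s=0.1, seed=0):
    """Bootstrap-style particle filter for the toy model
        x' = 0.8*x + u + w,   y = x + v,
    where the measured input u is itself noisy: each particle
    propagates with its own draw of the input noise (scale s) in
    addition to the process noise (scale q), so both randomness
    sources enter the importance density."""
    rng = random.Random(seed)
    parts = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y, u in zip(y_seq, u_seq):
        # propagate: sample input noise and process noise per particle
        parts = [0.8 * x + (u + rng.gauss(0.0, s)) + rng.gauss(0.0, q)
                 for x in parts]
        # weight by the Gaussian measurement likelihood (scale r)
        w = [math.exp(-0.5 * ((y - x) / r) ** 2) for x in parts]
        tot = sum(w) or 1.0
        w = [wi / tot for wi in w]
        estimates.append(sum(wi * x for wi, x in zip(w, parts)))
        # multinomial resampling eliminates the input random variable
        parts = rng.choices(parts, weights=w, k=n_particles)
    return estimates
```

With measurements held at the model's steady state (x = 0.8x + 0.5, i.e. x = 2.5), the filtered estimate settles near that value despite the per-step input perturbations.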
{"title":"AVERFormer: End-to-end audio-visual emotion recognition transformer framework with balanced modal contributions","authors":"Zijian Sun , Haoran Liu , Haibin Li , Yaqian Li , Wenming Zhang","doi":"10.1016/j.dsp.2025.105081","DOIUrl":"10.1016/j.dsp.2025.105081","url":null,"abstract":"<div><div>Audiovisual Emotion Recognition (AVER) plays a critical role in various domains, including mental health monitoring, educational interactions, and human-computer interaction. However, existing methods often encounter three main challenges: insufficient feature extraction for each modality, imbalanced modality contributions, and inadequate exploitation of multimodal complementarity. To address these issues, this paper proposes an end-to-end framework called AVERFormer. Specifically, AVERFormer consists of an audio encoder, a visual encoder, and an audiovisual fusion module, with several notable innovations. First, we design a dual-branch audio encoder that extracts multi-frequency emotional features from raw audio waveforms, spectrograms, and Mel-Frequency Cepstral Coefficients (MFCCs). This approach not only captures fine-grained local details but also efficiently processes long-duration audio to obtain global representations, thereby enabling effective interaction between global and local features. Second, unlike conventional methods that focus solely on facial expressions, our visual encoder takes raw frames as input to capture a broad range of bodily cues, thereby extending the scope of emotional signals encoded in the visual domain. Third, we employ a guided cross-modal attention mechanism in the feature-fusion stage to enhance the complementarity and synergy between audio and visual features. Finally, we develop a hybrid loss function—comprising audiovisual and unimodal (classification) losses as well as a combined divergence (metric) loss—to conduct end-to-end training. 
This design balances inter-modal similarity and disparity, thereby further optimizing the multimodal fusion process. Experimental results on the RAVDESS, CREMA-D, and CMU-MOSEI datasets demonstrate that AVERFormer achieves classification accuracies of 97.92%, 87.20%, and 79.40%, respectively—significantly outperforming current state-of-the-art approaches and showcasing its superior performance in audiovisual emotion recognition.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"161 ","pages":"Article 105081"},"PeriodicalIF":2.9,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143487877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
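The hybrid objective described above can be illustrated in miniature: unimodal and fused cross-entropy terms keep each branch discriminative, while a divergence term couples the two modal predictions. The symmetric-KL choice and the weighting below are assumptions for illustration, not AVERFormer's exact losses:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def hybrid_loss(audio_logits, visual_logits, fused_logits, label, lam=0.1):
    """Toy hybrid objective: cross-entropy on the fused (audiovisual)
    prediction plus unimodal cross-entropies, plus a symmetric-KL term
    that penalizes disagreement between the audio and visual
    distributions. lam balances classification against divergence."""
    def ce(logits):
        return -math.log(softmax(logits)[label])
    pa, pv = softmax(audio_logits), softmax(visual_logits)
    # symmetric KL (Jeffreys divergence) between the modal predictions
    div = sum(0.5 * (a - v) * (math.log(a) - math.log(v))
              for a, v in zip(pa, pv))
    return ce(fused_logits) + ce(audio_logits) + ce(visual_logits) + lam * div
```

When the two modal predictions agree, the divergence term vanishes and the loss reduces to the three classification terms; disagreement is penalized in proportion to lam.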
{"title":"CEMP-YOLO: An infrared overheat detection model for photovoltaic panels in UAVs","authors":"Yan Hong, Lei Wang, Jingming Su, Yun Li, Shikang Fang, Wen Li, Mushi Li, Hantao Wang","doi":"10.1016/j.dsp.2025.105072","DOIUrl":"10.1016/j.dsp.2025.105072","url":null,"abstract":"<div><div>Under the complex working conditions of actual PV power stations, the traditional PV panel inspection methods employed by operators still leave faults and safety risks. Within the framework of the YOLOv10n model, a CEMP-YOLOv10n-based infrared image detection algorithm for photovoltaic power plants is proposed. The improvements in CEMP-YOLOv10n comprise four main components. The ABCG_Block structure is designed, and the C2f structure within the Backbone component is optimized to enhance feature extraction capabilities. The ERepGFPN structure is used in the Neck component to retain semantic information and fuse features between high and low layers. The detection head is optimized with PConv convolution to minimize redundant information. Finally, MECA attention is added before the P3, P4, and P5 detection heads to enhance adaptive recognition and accuracy. Experimental validation using infrared UAV imagery of PV panels shows that the model's computational cost decreased to 4.7 GFLOPs, 72.3% of the original. Parameters and weights decreased by 25.99% and 24.13%, respectively, while accuracy and mean average precision (mAP) improved by 8.3% and 2%, reaching 86.6% and 87.3%. Compared to 13 detection algorithms, including DETR, YOLOv8n, YOLOv9-tiny, and YOLOv11n, the CEMP-YOLOv10n model demonstrates superior accuracy, parameter efficiency, and memory consumption. The CEMP-YOLOv10n model significantly improves defect recognition accuracy, reduces missed detections, and balances lightweight design with detection speed. 
This lays the foundation for future deployment on UAV inspection edge devices and for the creation of smart PV big data platforms.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"161 ","pages":"Article 105072"},"PeriodicalIF":2.9,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143487878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NLoS target localization in IRS-assisted FDA-MIMO radar: A tensor decomposition perspective","authors":"Weijia Yu , Jianhe Du , Yuanzhi Chen , Shufeng Li , Xingwang Li , Shahid Mumtaz","doi":"10.1016/j.dsp.2025.105093","DOIUrl":"10.1016/j.dsp.2025.105093","url":null,"abstract":"<div><div>Intelligent reflecting surfaces (IRSs) provide an innovative solution for frequency diverse array multiple-input multiple-output (FDA-MIMO) radar systems in the localization of non-line-of-sight (NLoS) traffic targets. In this paper, we consider an IRS-assisted FDA-MIMO radar system and propose an NLoS multi-target localization algorithm based on tensor decomposition. Specifically, the received signals are first constructed as a third-order tensor model. Then, a sequential minimum description length (MDL) method is employed to estimate the number of targets in advance. With tensor decomposition, the steering matrices containing angle and range information are obtained. In the estimated transmitting steering matrix, the directions-of-departure (DODs) and ranges are successfully decoupled after solving the phase ambiguity. In the estimated receiving steering matrix, a two-dimensional grid search method is applied to obtain the horizontal directions-of-arrival (DOAs) and vertical DOAs. Finally, the localization of NLoS targets is determined by utilizing the geometric relationships of these estimated parameters. Besides, the Cramér-Rao bound (CRB) for the angle and range estimates is derived as a performance benchmark. 
Simulation results demonstrate the effectiveness of the proposed algorithm in locating NLoS targets.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"161 ","pages":"Article 105093"},"PeriodicalIF":2.9,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143480468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised learning-based deep sparsifying transform network for joint CT metal artifact reduction and super-resolution reconstruction","authors":"Shengnan Yan , Yingshuai Zhao , Baoshun Shi , Yueming Su","doi":"10.1016/j.dsp.2025.105092","DOIUrl":"10.1016/j.dsp.2025.105092","url":null,"abstract":"<div><div>Due to the presence of metallic implants, metal artifacts degrade the quality of computed tomography (CT) images. Existing deep learning-based metal artifact reduction (DL-based MAR) methods rely on paired datasets, i.e., the ground truth CT image and its metal-artifact-corrupted version, for training. However, it is difficult to obtain paired datasets in clinical scenarios. Unsupervised learning-based MAR algorithms can use unpaired datasets to address the difficulty of obtaining paired data, but still face the following limitations: (<em>i</em>) most MAR network architectures lack interpretability and exhibit redundant learnable parameters, owing to their empirical design; (<em>ii</em>) the representation ability of existing encoder-decoder-based network architectures is limited, and they often ignore the image resolution. To overcome these limitations, we introduce an unsupervised learning-based deep sparsifying transform network, dubbed UnDeepST, which is designed for the reconstruction of CT images with both metal artifact reduction (MAR) and super-resolution (SR) capabilities. UnDeepST is model-interpretable and has fewer learnable parameters than previous unsupervised learning-based MAR methods, owing to less repeated use of encoders and decoders. Furthermore, we design a task fusion module to assist MAR with the help of SR to reconstruct high-quality and high-resolution CT images. To the best of our knowledge, we are the first to merge the MAR and SR tasks to achieve mutual learning of information across different tasks. 
By designing various loss functions, UnDeepST can be trained on unpaired datasets in an end-to-end manner. Experimental results demonstrate that UnDeepST achieves competitive recovery quality and resolution compared to benchmark algorithms.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"161 ","pages":"Article 105092"},"PeriodicalIF":2.9,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143480467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel joint learning framework combining fuzzy C-multiple-means clustering and spectral clustering for superpixel-based image segmentation","authors":"Chengmao Wu, Pengfei Gai","doi":"10.1016/j.dsp.2025.105083","DOIUrl":"10.1016/j.dsp.2025.105083","url":null,"abstract":"<div><div>In recent years, image segmentation algorithms based on superpixels have been continuously developed. However, superpixel-based algorithms consist of two independent stages: superpixel generation and superpixel segmentation. When the generation of superpixels is influenced by noise or complex backgrounds, the quality of the generated superpixel image can significantly decline, adversely affecting the subsequent segmentation results. Therefore, this paper proposes a robust multiple-means joint clustering algorithm based on superpixels, which integrates superpixel generation and superpixel image segmentation within a unified learning framework. This approach achieves multiple-means joint clustering by alternately optimizing and updating superpixel and sub-cluster centers. Compared with traditional superpixel segmentation algorithms, this method does not generate superpixels separately and demonstrates improved segmentation performance. Additionally, the algorithm incorporates spectral clustering to transform the superpixel image segmentation problem into a constrained Laplacian matrix rank optimization problem, ultimately achieving clustering based on bipartite graph connectivity, which further enhances the algorithm's robustness. 
Numerous experimental results indicate that the proposed algorithm yields superior segmentation outcomes compared with other existing superpixel segmentation algorithms and aligns more closely with real-world segmentation details.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"161 ","pages":"Article 105083"},"PeriodicalIF":2.9,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
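Once the rank-constrained Laplacian optimization has produced a bipartite graph between superpixels and sub-cluster means, reading off the segmentation amounts to finding the graph's connected components. A toy union-find version of that final step is sketched below; the graph itself is assumed given here, whereas in the paper it is learned:

```python
def bipartite_components(edges, n_left, n_right):
    """Cluster labels as connected components of a bipartite graph
    whose left nodes are superpixels and right nodes are sub-cluster
    means. Right node j is stored internally as index n_left + j;
    returns one compact cluster id per superpixel."""
    parent = list(range(n_left + n_right))

    def find(a):
        # path-halving find
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j in edges:
        parent[find(i)] = find(n_left + j)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n_left)]
```

Enforcing that the bipartite graph has exactly k connected components (via the Laplacian rank constraint) is what lets the final labels drop out without a separate post-hoc clustering step.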
{"title":"Chinese Character Recognition based on Swin Transformer-Encoder","authors":"Ziying Li , Haifeng Zhao , Hiromitsu Nishizaki , Chee Siang Leow , Xingfa Shen","doi":"10.1016/j.dsp.2025.105080","DOIUrl":"10.1016/j.dsp.2025.105080","url":null,"abstract":"<div><div>Optical Character Recognition (OCR) technology, which converts printed or handwritten text into machine-readable text, holds significant application and research value in document digitization, information automation, and multilingual support. However, existing methods predominantly focus on English text recognition and often struggle with addressing the complexities of Chinese characters. This study proposes a Chinese text recognition model based on the Swin Transformer encoder, demonstrating its remarkable adaptability to Chinese character recognition. In the image preprocessing stage, we introduced an overlapping segmentation technique that enables the encoder to effectively capture the complex structural relationships between individual strokes in lengthy Chinese texts. Additionally, by incorporating a mapping layer between the encoder and decoder, we enhanced the Swin Transformer's adaptability to small image scenarios, thereby improving its feasibility for Chinese text recognition tasks. 
Experimental results indicate that this model outperforms classical models such as CRNN and ASTER on handwritten and web-based datasets, validating its robustness and reliability.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"161 ","pages":"Article 105080"},"PeriodicalIF":2.9,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143480455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-supervised disentangled representation learning with distribution alignment for multi-view clustering","authors":"Zhenqiu Shu, Teng Sun, Zhengtao Yu","doi":"10.1016/j.dsp.2025.105078","DOIUrl":"10.1016/j.dsp.2025.105078","url":null,"abstract":"<div><div>Recently, multi-view clustering has attracted much attention due to its strong capability to fully explore complementary information between multiple views. In general, there may be differences in feature distribution between views from different data sources. However, most existing methods directly fuse different views, ignoring the differences in the contribution and importance of each view. This leads to mutual interference between the common representation and view-specific information. To address these issues, in this paper, we propose a novel method, called self-supervised disentangled representation learning with distribution alignment (S2DRL-DA), for multi-view clustering. Firstly, the proposed method uses adversarial learning and attention mechanisms to align potential feature distributions and focus on the most critical view. Then disentangled representation learning is used to separate the common and specific representations learned from each view to reduce redundancy in multi-view data. Finally, we adopt KL divergence to assess the quality of the clustering result of each view and guide the model optimization. Extensive experiments on different datasets demonstrate that our S2DRL-DA approach produces competitive performance in multi-view clustering applications. 
The source code for this work can be found at <span><span>https://github.com/szq0816/S2DRL-DA</span></span>.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"161 ","pages":"Article 105078"},"PeriodicalIF":2.9,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143474507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
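The KL-based quality check described in the abstract can be illustrated in miniature: score each view by how far its soft cluster assignments sit from a consensus target, so better-agreeing views can be weighted more heavily during optimization. The consensus construction and the averaging below are illustrative assumptions, not the paper's exact formulation:

```python
import math

def kl(p, q):
    """KL divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def view_quality(view_assignments, consensus):
    """Score each view by the average KL divergence of the consensus
    target from the view's soft cluster assignments (one distribution
    per sample). Lower scores mean the view agrees better with the
    consensus and can be trusted more during model optimization."""
    return [sum(kl(t, p) for p, t in zip(view, consensus)) / len(view)
            for view in view_assignments]
```

A view whose assignments match the consensus exactly scores zero; an uninformative (uniform) view scores strictly higher, so the scores can directly drive per-view weighting.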