{"title":"Integrated DNN-Based Parameter Estimation for Multichannel Speech Enhancement","authors":"Sein Cheong;Minseung Kim;Jong Won Shin","doi":"10.1109/LSP.2025.3599455","DOIUrl":"https://doi.org/10.1109/LSP.2025.3599455","url":null,"abstract":"One of the popular configurations for the statistical model-based multichannel speech enhancement (SE) is to apply a spatial filter such as the minimum-variance distortionless response beamformer followed by a single channel post-filter, and some of the deep neural network (DNN)-based approaches mimic it. While a number of DNN-based SE focused on direct estimation of clean speech features or the masks to estimate clean speech, some of the efforts were devoted to estimate the statistical parameters. DNN-based parameter estimation with two DNNs for a beamforming stage and a post-filtering stage has demonstrated impressive performance, but the parameter estimation for a beamformer and that for a post-filter operate separately, which may not be optimal in that the post-filter cannot utilize spatial information from multi-microphone signals. In this letter, we propose integrated DNN-based parameter estimation for multichannel SE based on both the beamformer output and multi-microphone signals. The speech presence probability and the power spectral densities for speech and noise estimated in the beamforming stage are utilized in the post-filtering stage for better parameter estimation. We also adopt the dual-path conformer structure with an encoder and decoders to enhance the performance. Experimental results show that the proposed method marked the best wideband perceptual evaluation of speech quality (PESQ) scores on the CHiME-4 dataset among all methods with comparable computational complexity.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3320-3324"},"PeriodicalIF":3.9,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144909220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Weakly Monotone Operators for Convergent Plug-and-Play PET Reconstruction","authors":"Marion Savanier;Claude Comtat;Florent Sureau","doi":"10.1109/LSP.2025.3598700","DOIUrl":"https://doi.org/10.1109/LSP.2025.3598700","url":null,"abstract":"This letter extends the capabilities of Plug-and-Play ADMM, a popular algorithm for solving inverse problems while leveraging deep learning priors. Convergence results on PnP ADMM often rely on the Douglas-Rachford (DR) splitting method and require a firmly nonexpansive constraint on the plugged network. Common convolutional architectures do not inherently verify this constraint, and many works are now trying to circumvent it. Building on recent advancements in the DR method for handling weakly monotone operators, we propose a modification of PnP ADMM for low-count Positron Emission Tomography reconstruction, allowing for networks trained on reconstruction-specific tasks with a more general averageness constraint. Our numerical experiments on simulated brain data demonstrate that this flexibility simplifies training and improves reconstruction quality.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3405-3409"},"PeriodicalIF":3.9,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144934526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TO-LF: A Texture and Occlusion-Oriented Benchmark Dataset for Light Field Disparity Estimation","authors":"Shubo Zhou;Yunlong Wang;Yingqian Wang;Fei Liu;Xue-qin Jiang","doi":"10.1109/LSP.2025.3598728","DOIUrl":"https://doi.org/10.1109/LSP.2025.3598728","url":null,"abstract":"Accurate disparity estimation in light field (LF) imaging remains challenging due to the narrow baseline between adjacent sub-aperture images (SAIs) and the occlusion effect. Existing learning-based methods suffer from degraded performance in complex scenarios owing to the scarcity of high-quality and diverse training data. To address this limitation, we propose a Texture and Occlusion-oriented Light Field dataset (TO-LF) containing 78 carefully curated images. Unlike the widely used HCI 4D LF benchmark, TO-LF not only provides more training samples but also introduces a more challenging test set with complex occlusions and significant textureless regions. Furthermore, we present a viewpoint-selective sub-pixel cost volume construction method (VS-Sub), which extends disparity labels to the subpixel level for denser cost volumes, and employs dynamic dilated convolutions to differentiate between occluded and non-occluded viewpoints. Comprehensive experiments demonstrate that our framework achieves state-of-the-art (SOTA) performance in disparity estimation.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3315-3319"},"PeriodicalIF":3.9,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144909224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large Language Models Can Achieve Explainable and Training-Free One-Shot HRRP ATR","authors":"Lingfeng Chen;Panhe Hu;Zhiliang Pan;Qi Liu;Shuanghui Zhang;Zhen Liu","doi":"10.1109/LSP.2025.3598220","DOIUrl":"https://doi.org/10.1109/LSP.2025.3598220","url":null,"abstract":"This letter introduces a pioneering, training-free and explainable framework for High-Resolution Range Profile (HRRP) automatic target recognition (ATR) utilizing large-scale pre-trained Large Language Models (LLMs). Diverging from conventional methods requiring extensive task-specific training or fine-tuning, our approach converts one-dimensional HRRP signals into textual scattering center representations. Prompts are designed to align LLMs’ semantic space for ATR via few-shot in-context learning, effectively leveraging its vast pre-existing knowledge without any parameter update.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3395-3399"},"PeriodicalIF":3.9,"publicationDate":"2025-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144926875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Technical Odyssey of Self-Supervised Representation Learning for Devanagari-Script-Based P300 Speller","authors":"Vibha Bhandari;Narendra D. Londhe;Ghanahshyam B. Kshirsagar","doi":"10.1109/LSP.2025.3597875","DOIUrl":"https://doi.org/10.1109/LSP.2025.3597875","url":null,"abstract":"Traditional supervised learning (SL) methods for P300 event-related potential (ERP) detection in P300 spellers require extensive labelled data and often struggle to generalize well across subjects and trials, especially with limited data. Previous efforts using transfer learning and knowledge distillation improved performance but still face high computational complexity and lack transparency. These issues highlight the need to explore new approaches to enhance transferability and reduce uncertainty. To address this, we investigated the effectiveness of representational learning through a self-supervised approach. Our self-supervised learning (SSL) framework, featuring a compact convolutional neural network (CNN) backbone and label-agnostic characteristics, improves the robustness of learned features to variations in ERPs encountered in P300 speller. Experiments on self-recorded data and ablation studies show that the learned representations are robust and effective. Achieving an accuracy of 84%, the downstream classifier trained on the SSL framework performed competitively with traditional supervised methods. Additionally, comparison between features learned with SL and SSL, using t-SNE visualization and correlation coefficient (r = -0.51) analysis, demonstrates that SSL features offer better discrimination between P300 and non-P300.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3420-3424"},"PeriodicalIF":3.9,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145057458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quaternion Wavelet-Driven Multi-Scale Feature Interaction Network for Color Image Denoising","authors":"Shan Gai;Yihao Wu;Shiguang Lu","doi":"10.1109/LSP.2025.3597878","DOIUrl":"https://doi.org/10.1109/LSP.2025.3597878","url":null,"abstract":"Real-valued wavelets have achieved great success in image denoising due to their sparse representation capability under multi-scale analysis. However, existing real-valued wavelets suffer from limited directional selectivity and translation sensitivity, which can lead to color distortion and loss of phase information. The quaternion wavelet transform (QWT) offers a new solution by extending each pair of complex filters in the dual-tree complex wavelet transform to quaternion-valued filter banks, generating quaternion high frequency subbands in three principal directions while retaining a low frequency approximation, thus achieving cross channel translation invariance and phase consistency. Based on this, we propose a QWT-driven multi-scale feature interaction network (QMFINet). QMFINet leverages QWT to extract cross channel structured phase features at the same spatial locations, precisely linking color and texture details; it further employs a three-path feature extraction module (TPFEM) to capture multi-scale representations. To effectively fuse features at different resolutions, we design a quaternion ordered channel attention subnet (QOCAS). Experimental results demonstrate that QMFINet outperforms several state-of-the-art color image denoising methods across a range of noise levels, and achieves the best performance at <inline-formula><tex-math>$sigma =75$</tex-math></inline-formula>, with an average PSNR improvement of approximately 0.3-0.4dB over the previous state-of-the-art method.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3425-3429"},"PeriodicalIF":3.9,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chroma Subsampling for Enhanced Geometry-Based Point Cloud Compression","authors":"Zehan Wang;Yuxuan Wei;Jongseok Lee;Hyejung Hur;Hui Yuan","doi":"10.1109/LSP.2025.3597874","DOIUrl":"https://doi.org/10.1109/LSP.2025.3597874","url":null,"abstract":"Due to the huge data volume of three dimensional point clouds, efficient point cloud compression (PCC) is very important and challenging under limited storage and bandwidth conditions. The Moving Picture Experts Group (MPEG) is actively developing the geometry-based point cloud compression (G-PCC) standard and plan to release the second edition of G-PCC, namely Enhanced G-PCC. In image and video compression, chroma components are typically encoded at a lower resolution than luma, with minimal perceptual quality loss. However, chroma subsampling has not yet been explored in PCC. We investigate the characteristics of points at different level of details, and propose a chroma subsampling that can be embedded with the codec of Enhanced G-PCC. Experimental results show that the proposed method outperforms the state-of-the-art Enhanced G-PCC reference software version29.0 in terms of coding efficiency and time complexity. Due to the excellent performance, the proposed method has been adopted by the MPEG and will be integrated into the upcoming version of the reference software of Enhanced G-PCC.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3255-3259"},"PeriodicalIF":3.9,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144904892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and Provable Low-Rank High-Order Tensor Completion via Scaled Gradient Descent","authors":"Tong Wu;Fang Zhang","doi":"10.1109/LSP.2025.3597829","DOIUrl":"https://doi.org/10.1109/LSP.2025.3597829","url":null,"abstract":"This work studies the low-rank high-order tensor completion (HOTC) problem, which aims to exactly recover a low-rank order-<inline-formula><tex-math>$d$</tex-math></inline-formula> (<inline-formula><tex-math>$d geq 4$</tex-math></inline-formula>) tensor from partially observed entries. Leveraging the low-rank structure under the tensor Singular Value Decomposition (t-SVD), instead of relying on the computationally expensive tensor nuclear norm (TNN), we propose an efficient algorithm, termed the HOTC-SGD, that directly estimates the high-order tensor factors—starting from a spectral initialization—via scaled gradient descent (SGD). Theoretically, we rigorously establish the recovery guarantee of HOTC-SGD under mild assumptions, demonstrating that it achieves linear convergence to the true low-rank tensor at a constant rate that is independent of the condition number. Numerical experiments on both synthetic and real-world data verify our results and demonstrate the superiority of our method.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3250-3254"},"PeriodicalIF":3.9,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144904630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Location-Aided Maximal Ratio Combining for an Acoustic Vector Sensor in Multipath Channels","authors":"Xinghao Qu;Zhigang Shang;Gang Qiao;Yiwen Zhou","doi":"10.1109/LSP.2025.3597557","DOIUrl":"https://doi.org/10.1109/LSP.2025.3597557","url":null,"abstract":"The multi-channel outputs of an acoustic vector sensor (AVS) provide diversity gain for communications, and developing an effective combining scheme becomes a critical issue. However, in underwater multipath channels, a single AVS struggles to estimate the spatial signatures of multipath signals due to its limited sensing capability, which compromises the design of optimal combining weights. To overcome this issue, we propose a location-aided maximal ratio combining (MRC) technique. Armed with a predictable end-to-end propagation model, we first develop a maximum-likelihood sensing framework with the help of the pilot subcarriers embedded in the OFDM signal. The required channel state information is inferred from the estimated propagation geometry. Then, the combining weight vector is determined according to the MRC principle. Simulations demonstrate that this integrated scheme enhances communication performance through comprehensive environmental sensing.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3300-3304"},"PeriodicalIF":3.9,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144909226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deformable Locality-Coordination Graph Motifs for 3D Skeleton Based Person Re-Identification","authors":"Haocong Rao;Chunyan Miao","doi":"10.1109/LSP.2025.3597562","DOIUrl":"https://doi.org/10.1109/LSP.2025.3597562","url":null,"abstract":"Existing 3D skeleton based person re-identification (re-ID) approaches typically model skeletons as graphs to capture body relations and motion. However, they often rely on <italic>fixed</i> joint’s connections such as adjacency for relation modeling, while lacking a flexible and specific focus on key body joints or parts of <italic>different levels</i> to capture various local relations (<italic>“locality”</i>) and limb relations (<italic>“coordination”</i>). In this letter, we propose Deformable Locality-Coordination graph Motifs (DL-CM) that can guide the body relation learning to particularly capture multi-order <italic>locality</i> and <italic>coordination</i> of key gait-specific body parts to enhance person re-ID performance. Specifically, we first devise Deformable Locality Motifs (DLM), which are applicable to deformed skeleton graphs at different levels, to simultaneously focus on different-order neighbors’ relations for body structure and pattern learning. Then, we propose Deformable Coordination Motifs (DCM) to concurrently capture local and global coordination of different-level limbs in deformed graphs, so as to facilitate learning discriminative gait patterns for person re-ID. Extensive experiments on four public benchmarks demonstrate the effectiveness of DL-CM on state-of-the-art models and different-level graph representations to improve person re-ID performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3655-3659"},"PeriodicalIF":3.9,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145210038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}