{"title":"LHCAF-Based Hyperparameter-Free Sparse Channel Estimator for Hybrid MIMO Millimeter Wave Communication Under Impulsive Noise","authors":"Deepjyoti Boro;Sandesh Jain;Rangeet Mitra;Pragya Swami","doi":"10.1109/LSP.2025.3604354","DOIUrl":"https://doi.org/10.1109/LSP.2025.3604354","url":null,"abstract":"Multiple-input multiple-output (MIMO) millimeter wave (mmWave) is a promising technology for beyond 5 G communication systems, which provides high data rates, ultra-low latency, massive connectivity, and high spectral efficiency. However, a typical mmWave communication link faces major challenges due to signal loss over distance and imperfections in the hardware, which cause distortions that don’t follow the usual Gaussian pattern. Furthermore, MIMO mmWave channel exhibits sparseness due to blockage and scattering. The conventional orthogonal matching pursuit (OMP), zero-attracting least mean square (ZA-LMS), and their variants deliver suboptimal performance for MIMO-mmWave systems impaired by impulsive noise. In this letter, we propose robust zero-attracting logarithmic hyperbolic cosine adaptive filtering (ZA-LHCAF) for MIMO-mmWave channel estimation. Furthermore, we derive a sampling rule for spread parameter of ZA-LHCAF, which renders the proposed estimator hyperparameter free. The simulation results show that the proposed algorithm outperforms existing state-of-the-art estimators. Lastly, convergence analysis is presented for the proposed ZA-LHCAF algorithm, which validated by realistic computer simulations.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3515-3519"},"PeriodicalIF":3.9,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145061907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interleaved Dynamic Fusion Network for Occluded Person Re-Identification","authors":"Yunzuo Zhang;Weiqi Lian;Yuehui Yang;Shuangshuang Wang;Jiawen Zhen","doi":"10.1109/LSP.2025.3603840","DOIUrl":"https://doi.org/10.1109/LSP.2025.3603840","url":null,"abstract":"Most existing occluded person re-identification methods use a part-based approach to extract pedestrian features. The extracted part features are isolated from each other, resulting in insufficient information exchange between part features. To address this issue, we propose an interleaved dynamic fusion network (IDFNet) for occluded person re-identification. Initially, an interleaved feature pyramid module (IFPM) was designed, which recursively transmits rich semantic information from high-level feature maps to the bottom layer through interleaved connections, achieving the extraction of multi-scale information. Secondly, a multi-scale feature dynamic fusion module (MDFM) to effectively integrate multi-scale information in IFPM and achieve cross-scale feature fusion. It allows the network to dynamically select the most suitable features for fusion based on pedestrian characteristics and size. Finally, the designed feature interaction module (FIM) uses different semantic part features as graph nodes, allowing information transfer between nodes, suppressing the transfer of meaningless feature information such as occlusion, promoting the transfer of semantic feature information, and effectively alleviating occlusion problems. Extensive experimental results on both occluded and holistic datasets demonstrate the efficacy of our approach.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3480-3484"},"PeriodicalIF":3.9,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145057459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Signal Calibration and Refinement for Multimodal Sentiment Analysis","authors":"Baojian Ren;Tao Cao;Zhengyang Zhang;Shuchen Bai;Na Liu","doi":"10.1109/LSP.2025.3603884","DOIUrl":"https://doi.org/10.1109/LSP.2025.3603884","url":null,"abstract":"To address the issues of noise amplification and feature incompatibility arising from modal heterogeneity in multimodal sentiment analysis, this paper proposes a hierarchical optimization framework. In the first stage, we introduce the Semantic-Guided Calibration Network (SGC-Net), which, through a Dynamic Balancing Regulator (DBR), leverages textual semantics to intelligently weight and calibrate the cross-modal interactions of audio and video, thereby suppressing noise while preserving key dynamics. In the second stage, the Synergistic Refinement Fusion Module (SRF-Module) performs a deep refinement of the fused multi-source features. This module employs a Saliency-Gated Complementor (SGC) to rigorously filter and exchange effective information across streams, ultimately achieving feature de-redundancy and strong complementarity. Extensive experiments on the CMU-MOSI and CMU-MOSEI datasets validate the effectiveness of our method, with the model achieving state-of-the-art performance on key metrics such as binary accuracy (Acc-2: 86.73% on MOSI, 86.52% on MOSEI) and seven-class accuracy (Acc-7: 48.35% on MOSI, 53.81% on MOSEI).","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3450-3454"},"PeriodicalIF":3.9,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145057460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BAN: A Boundary-Aware Network for Accurate Colorectal Polyp Segmentation","authors":"Nengxiang Zhang;Baojiang Zhong;Minghao Piao;Kai-Kuang Ma","doi":"10.1109/LSP.2025.3603828","DOIUrl":"https://doi.org/10.1109/LSP.2025.3603828","url":null,"abstract":"Colonoscopy images exhibit multi-frequency features, with polyp boundaries residing in a mid-frequency range, which are critical for accurate polyp segmentation. However, current deep learning models tend to prioritize low-frequency features, leading to reduced segmentation performance. To address this challenge, we propose a novel <italic>boundary-aware network</i> (BAN) that integrates trainable Gabor filters into the polyp segmentation process through a dedicated module called <italic>Gabor-driven feature extraction</i> (GFE). By developing and using a <italic>trajectory-directed frequency learning</i> approach, Gabor filters are trained along a <italic>damping sinusoidal</i> path, dynamically optimizing their frequency parameters within a proper mid-frequency range. This enhances boundary feature representation and significantly improves polyp segmentation accuracy. Extensive experiments demonstrate that our BAN outperforms existing state-of-the-art methods.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3460-3464"},"PeriodicalIF":3.9,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145057465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transformer Based Unsupervised Cross-Modal Hashing for Normal and Remote Sensing Retrieval","authors":"Weikang Gao;Zifan Liu;Yuan Cao;Zuojin Huang;Yaru Gao","doi":"10.1109/LSP.2025.3602637","DOIUrl":"https://doi.org/10.1109/LSP.2025.3602637","url":null,"abstract":"With the rapid expansion of online information, cross-modal retrieval has emerged as a crucial and dynamic research focus. Deep hashing has gained significant traction in this field due to its efficiency in storage and retrieval speed, making it particularly valuable for remote sensing multi-modal retrieval. However, existing deep cross-modal hashing techniques often rely on parallel network structures for processing different modalities, overlooking a unified representation that captures cross-modal visual information. To address this limitation, we introduce a novel unsupervised cross-modal hashing framework that incorporates two modality-specific encoders and a fusion module. This fusion module facilitates modality interaction, enabling the extraction of meaningful semantic relationships across different data types. To ensure comprehensive similarity preservation, we design an integrated objective function that incorporates inter-modal and intra-modal constraints, joint consistency, and binary alignment losses. Furthermore, instead of conventional convolutional networks, we adopt the Swin Transformer as the backbone to enhance the discriminative power of image features. Our approach achieves an average 2.3% improvement in mAP on remote sensing cross-modal retrieval tasks compared to existing methods. The implementation is available at <uri>https://github.com/caoyuan57/TUCH</uri>.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3540-3544"},"PeriodicalIF":3.9,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145090030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Eliminating Non-Overlapping Semantic Misalignment for Cross-Modal Medical Retrieval","authors":"Zeqiang Wei;Zeyi Hou;Xiuzhuang Zhou","doi":"10.1109/LSP.2025.3602392","DOIUrl":"https://doi.org/10.1109/LSP.2025.3602392","url":null,"abstract":"In recent years, increasing research has shown that fine-grained local alignment is crucial for the cross-modal medical image-report retrieval task. However, existing local alignment learning methods suffer from the misalignment of semantically non-overlapping features between different modalities, which in turn negatively affects the retrieval performance. To address this challenge, we propose a Global-Feature Guided Cross-modal Local Alignment (GFG-CMLA) method. Unlike prior methods that rely on explicit local attention or learned weighting mechanisms, our approach leverages global semantic features extracted from the cross-modal common semantic space to implicitly guide local alignment, adaptively focusing on semantically overlapping content while filtering out irrelevant local regions, thus mitigating misalignment interference without additional annotations or architectural complexity. We validated the effectiveness of the proposed method through ablation experiments on the MIMIC-CXR and CheXpert Plus dataset. Furthermore, comparisons with state-of-the-art local alignment methods indicate that our approach achieves superior cross-modal retrieval performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3510-3514"},"PeriodicalIF":3.9,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145061856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multi-View Attention Hypergraph Neural Network for Radar Emitter Signal Sorting","authors":"Hongzhuo Chen;Liangang Qi;Qiang Guo;Mykola Kaliuzhnyi","doi":"10.1109/LSP.2025.3602393","DOIUrl":"https://doi.org/10.1109/LSP.2025.3602393","url":null,"abstract":"In complex electromagnetic environments, the deinterleaving of dense, interwoven radar pulse signals poses a formidable challenge. To address the propensity of conventional graph models for confusion and misclassification in such scenarios, this letter proposes a novel method for radar emitter signal deinterleaving: the Multi-view Attention Hypergraph Neural Network (MVA-HGNN). This model maps radar pulses to nodes in a hypergraph, leveraging hyperedges to capture higher-order correlations among pulses. This overcomes the limitation of traditional graphs, which can only describe pairwise relationships. To fully exploit the heterogeneous information within Pulse Descriptor Words (PDWs), we construct two distinct hypergraph views: “spatial” and “intrinsic.” In the MVA - HGNN model, parallel hypergraph network branches learn node representations from different views. An advanced attention - based fusion mechanism is introduced to dynamically integrate these feature representations. Simulation results demonstrate that our method achieves superior performance, particularly in small data scenarios, showing great potential for engineering applications.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3754-3758"},"PeriodicalIF":3.9,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145255813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Domain Adaptation With Anatomical-Aware Self-Training for Optic Disc Segmentation in Abnormal Fundus Images","authors":"Xiao Wei;Bo Jiang;Yuye Ling;Peiyao Jin;Xinbing Wang","doi":"10.1109/LSP.2025.3602653","DOIUrl":"https://doi.org/10.1109/LSP.2025.3602653","url":null,"abstract":"Optic disc (OD) segmentation in abnormal fundus images is crucial for glaucoma screening, and different screening populations may alter the types and proportions of abnormalities. Since annotating all abnormal types or re-annotating for each screening scenario is costly, an alternative is to utilize existing annotated data. However, these datasets only contain limited abnormal types, leading to a domain shift issue. Unsupervised domain adaptation alleviates this issue through adversarial learning or self-training. Yet, adversarial learning methods tend to overemphasize brightness as a discriminative feature, which fails under pathological changes, while self-training approaches remain vulnerable to noisy pseudo-labels. Existing denoising methods assume noise lies near decision boundaries, but abnormalities can produce noise far from them. In this letter, we propose an unsupervised domain adaptation method integrating anatomical-aware self-training with adversarial learning for OD segmentation. By exploiting the OD’s convex shape and boundary consistency, we develop two pseudo-labeling strategies to suppress noise. Experiments on four fundus image datasets demonstrate the effectiveness of our method in diverse screening scenarios.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3475-3479"},"PeriodicalIF":3.9,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145057466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SRTE-Net: Spectral-Spatial Similarity Reduction and Reorganized Texture Encoding for Hyperspectral Video Tracking","authors":"Wenhao Jiang;Weixiang Zhong;Pattathal V. Arun;Pei Xiang;Dong Zhao","doi":"10.1109/LSP.2025.3602380","DOIUrl":"https://doi.org/10.1109/LSP.2025.3602380","url":null,"abstract":"Hyperspectral video tracking poses unique challenges due to the high dimensionality of spectral data and the limited capacity to capture discriminative texture information. To address this, we propose a novel tracking framework that integrates spectral-spatial similarity reduction with reorganized texture encoding for robust hyperspectral target tracking. Specifically, we introduce a dimensionality compression strategy that converts the multi-band hyperspectral input into a representative grayscale image, preserving key spectral-spatial cues. To enhance discriminative texture modeling, a 3D Gabor filter is applied to the search region, and the extracted responses are adaptively fused based on their local variance. The resulting texture representations are selectively masked to suppress background noise and are then passed into a correlation filter module for precise target localization. Furthermore, we design a template update mechanism that mitigates model drift and cumulative errors during tracking. Extensive experiments on public hyperspectral video benchmarks demonstrate that our method achieves competitive performance against state-of-the-art hyperspectral trackers, especially in scenarios with background clutter.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3390-3394"},"PeriodicalIF":3.9,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can Layer-Wise SSL Features Improve Zero-Shot ASR Performance for Children’s Speech?","authors":"Abhijit Sinha;Hemant Kumar Kathania;Sudarsana Reddy Kadiri;Shrikanth Narayanan","doi":"10.1109/LSP.2025.3602636","DOIUrl":"https://doi.org/10.1109/LSP.2025.3602636","url":null,"abstract":"Automatic Speech Recognition (ASR) systems often struggle to accurately process children’s speech dueto its distinct and highly variable acoustic and linguistic characteristics. While recent advancements in self-supervised learning (SSL) models have greatly enhanced the transcription of adult speech, accurately transcribing children’s speech remains a significant challenge. This study investigates the effectiveness of layer-wise features extracted from state-of-the-art SSL pre-trained models - specifically, Wav2Vec2, HuBERT, Data2Vec, and WavLM in improving the performance of ASR for children’s speech in zero-shot scenarios. A detailed analysis of features extracted from these models was conducted, integrating them into a simplified DNN-based ASR system using the Kaldi toolkit. The analysis identified the most effective layers for enhancing ASR performance on children’s speech in a zero-shot scenario, where WSJCAM0 adult speech was used for training and PFSTAR children speech for testing. Experimental results indicated that Layer 22 of the Wav2Vec2 model achieved the lowest Word Error Rate (WER) of 5.15%, representing a 51.64% relative improvement over the direct zero-shot decoding using Wav2Vec2 (WER of 10.65%). Additionally, age group-wise analysis demonstrated consistent performance improvements with increasing age, along with significant gains observed even in younger age groups using the SSL features. Further experiments on the CMU Kids dataset confirmed similar trends, highlighting the generalizability of the proposed approach.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"3759-3763"},"PeriodicalIF":3.9,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145255911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}