{"title":"Ambiguity-Free Broadband DOA Estimation Relying on Parameterized Time-Frequency Transform","authors":"Wei Wang;Shefeng Yan;Linlin Mao;Zeping Sui;Jirui Yang","doi":"10.1109/LSP.2025.3550002","DOIUrl":"https://doi.org/10.1109/LSP.2025.3550002","url":null,"abstract":"An ambiguity-free direction-of-arrival (DOA) estimation scheme is proposed for sparse uniform linear arrays under low signal-to-noise ratios (SNRs) and non-stationary broadband signals. First, for achieving better DOA estimation performance at low SNRs while using non-stationary signals compared to the conventional frequency-difference (FD) paradigms, we propose parameterized time-frequency transform-based FD processing. Then, the unambiguous compressive FD beamforming is conceived to compensate the resolution loss induced by difference operation. Finally, we further derive a coarse-to-fine histogram statistics scheme to alleviate the perturbation in compressive FD beamforming with good DOA estimation accuracy. Simulation results demonstrate the superior performance of our proposed algorithm regarding robustness, resolution, and DOA estimation accuracy.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1211-1215"},"PeriodicalIF":3.2,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143667646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contextual Direct Position Determination for Path Loss Informed Localization","authors":"Licheng Zhao;Wenqiang Pu;Rui Zhou;Ming-Yi You;Qingjiang Shi","doi":"10.1109/LSP.2025.3550047","DOIUrl":"https://doi.org/10.1109/LSP.2025.3550047","url":null,"abstract":"In this letter, we look into the emitter localization task within the Direct Position Determination (DPD) paradigm. This paradigm is by essence a largest eigenvalue problem which treats the channel attenuation variables as free parameters. We consider the channel fading physical rule on electromagnetic signal propagation and reformulate the traditional DPD problem with a channel contextual prior. Thereafter, we develop iterative optimization algorithms based on the majorization-minimization (MM) framework. Numerical results show that the proposed algorithms outperform the traditional DPD estimators with better localization performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1241-1245"},"PeriodicalIF":3.2,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143688135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Alignment and Time Aggregation Network for Speech-Visual Emotion Recognition","authors":"Lile Wu;Lei Bai;Wenhao Cheng;Zutian Cheng;Guanghui Chen","doi":"10.1109/LSP.2025.3550007","DOIUrl":"https://doi.org/10.1109/LSP.2025.3550007","url":null,"abstract":"Video-based speech-visual emotion recognition plays a crucial role in human-computer interaction applications. However, it faces several challenges, including: 1) the redundancy in the extracted speech-visual features caused by the heterogeneity between speech and visual modalities, and 2) the ineffective modeling of the time-varying characteristics of emotions. To this end, this paper proposes an adaptive alignment and time aggregation network (AataNet). Specifically, AataNet designs a low redundancy speech-visual adaptive alignment (LRSVAA) module to acquire the low-redundant aligned features of speech-visual modalities. Meanwhile, AataNet also designs a computationally efficient time-adaptive aggregation (CETAA) module to model the time-varying characteristics of emotions. Experiments on RAVDESS, BAUM-1 s and eNTERFACE05 datasets also demonstrate that the proposed AataNet achieves better results.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1181-1185"},"PeriodicalIF":3.2,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143667477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Feature Focus Enhanced Network for Small and Dense Object Detection in SAR Images","authors":"Cong Li;Lihu Xi;Yongqiang Hei;Wentao Li;Zhu Xiao","doi":"10.1109/LSP.2025.3548934","DOIUrl":"https://doi.org/10.1109/LSP.2025.3548934","url":null,"abstract":"Deep learning has demonstrated its potential capability in object detection of synthetic aperture radar (SAR) images. However, the low detection accuracy for small and dense objects remains a critical issue. To address this issue, in this work, a feature focus enhanced YOLO (FFE-YOLO) architecture is proposed. In FFE-YOLO, a channel feature enhanced (CFE) module is introduced to extract richer information and reduce time consumption by integrating it into the backbone. Additionally, a feature selection fusion network (FSFN) is designed to enhance the feature representation of small and dense objects by fully utilizing channel information. Numerical results demonstrate that FFE-YOLO outperforms baseline by 3.12% and 3.06% on datasets HRSID and LS-SSDD-v1.0, respectively, but with less inference time. These results demonstrate the effectiveness and superiority of the proposed strategy.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1306-1310"},"PeriodicalIF":3.2,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143716530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DUAL-GDFQ: A Dual-Generator, Dual-Phase Learning Approach for Data-Free Quantization","authors":"Sihan Wang;Zhi Han;Xiyao Liu","doi":"10.1109/LSP.2025.3549025","DOIUrl":"https://doi.org/10.1109/LSP.2025.3549025","url":null,"abstract":"Data-free quantization (DFQ) seeks to maximize the performance of quantized networks without requiring original training data. Conventional methods, which use synthetic samples from generators for network fine-tuning, often yield inferior results compared to training conducted with real data. To mitigate this problem, we introduce a dual-generator, dual-phase learning generative data-free quantization (DUAL-GDFQ) method, which utilizes two generators: a knowledge-matching generator and a knowledge-promoting generator for replicating the original data distribution as well as keeping samples informative. Additionally, inspired by meta-learning, the proposed novel dual-phase learning scheme can effectively utilize the capabilities of both generators by aligning their gradient descent directions. Theoretical analysis and extensive experiments demonstrate that our method successfully minimizes performance degradation in quantized networks and can achieve performance levels comparable to training with real data.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1600-1604"},"PeriodicalIF":3.2,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10916929","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143870995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Source-Free Domain Adaptation for Infrared Small Target Detection","authors":"Hongxu Jin;Baiyang Chen;Qianwen Lu;Qingchuan Tao;Yongxiang Li","doi":"10.1109/LSP.2025.3549000","DOIUrl":"https://doi.org/10.1109/LSP.2025.3549000","url":null,"abstract":"Existing infrared small target detection (IRSTD) methods mainly rely on the assumption that the training and testing data come from the same distribution, a premise that does not hold in many real-world scenarios. Additionally, the inability to access source domain data in numerous IRSTD tasks further complicates the domain adaptation process. To address these challenges, we propose a novel Source-Free Domain Adaptation (SFDA) framework for IRSTD, denoted as IRSTD-SFDA. This framework comprises two key components: Multi-expert Domain Adaptation (MDA) and Multi-scale Focused Learning (MFL). MDA leverages the source model to generate pseudo masks for the target domain, facilitating the transfer of knowledge from the source to the target domain. To account for the inherent diversity of small targets across domains, MDA refines these pseudo masks through a series of operations, including target localization, rolling guidance filtering, shape adaptation, and multi-expert decision, thereby mitigating morphological discrepancies between the source and target domains. Meanwhile, MFL employs a global-local fusion strategy to focus on critical regions, enhancing the model's ability to detect small infrared targets. Extensive experimental evaluations across various cross-domain scenarios demonstrate the effectiveness of the proposed framework.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1121-1125"},"PeriodicalIF":3.2,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneous Domain Remapping for Universal Detection of Generative Linguistic Steganography","authors":"Tong Xiao;Jingang Wang;Songbin Li","doi":"10.1109/LSP.2025.3549015","DOIUrl":"https://doi.org/10.1109/LSP.2025.3549015","url":null,"abstract":"Current researchers have proposed various steganalysis methods for detecting secret information within social media texts, which can achieve relatively optimal detection performance in specific steganographic domains. However, considering the practical application of social media, we can only obtain the text to be tested without prior knowledge of the steganographic domain it belongs to. Consequently, we are unable to prepare a supervised training dataset in advance. This places higher demands on steganalysis algorithms, necessitating their ability to generalize and detect any unknown steganography domain. To this end, we propose a universal detection method for generative linguistic steganography based on heterogeneous domain remapping. The core idea is to employ a neural structure composed of pre-trained embedding layers and capsule networks to extract steganography-sensitive correlation features. Subsequently, the concept of contrastive learning is utilized to remap the sensitive features from heterogeneous steganography domains into a unified domain. This process effectively extracts domain-invariant features, thereby enabling the detection of unknown steganographic domains. Experimental results demonstrate that the proposed method outperforms existing approaches by an average of over 2% across various steganography domains.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1281-1285"},"PeriodicalIF":3.2,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143716365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spectral Scaling-Based Augmentation for Corruption-Robust Image Classification","authors":"Zhuang Zhang;Lijun Zhang;Dejian Meng;Wei Tian;Jun Yan","doi":"10.1109/LSP.2025.3549014","DOIUrl":"https://doi.org/10.1109/LSP.2025.3549014","url":null,"abstract":"Image classifiers often degrade in performance when test images differ significantly from the training distribution due to real-world image corruptions. Frequency-based augmentations can be used to address this issue, but existing methods excel against corruptions caused by noise and blur while struggling with those caused by contrast and fog. To tackle these challenges, we propose a novel image augmentation method grounded in a new perspective of relative spectral differences. This perspective characterizes spectral variations introduced by common corruptions as changes in non-zero frequencies, providing a unified understanding of their effects on image spectra. Building on this insight, the proposed method incorporates two key modules: a random spectral scaling module that captures statistical properties of image spectra and a deep spectral scaling module that adaptively learns spectral adjustments through a neural network. Experiments demonstrate that the proposed method improves overall robustness across various corruptions, with notable gains of 6.3% and 6.4% on contrast and fog, respectively, where existing methods often fall short.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1206-1210"},"PeriodicalIF":3.2,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143667475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Bird Vocalization Recognition in Open-Set Cross-Corpus Scenarios With Semantic Feature Reconstruction and Dual Strategy Scoring","authors":"Jiangjian Xie;Yingqi Wang;Xinyuan Qian;Junguo Zhang;Björn W. Schuller","doi":"10.1109/LSP.2025.3549008","DOIUrl":"https://doi.org/10.1109/LSP.2025.3549008","url":null,"abstract":"Automated recognition of bird vocalizations (BVs) is essential for biodiversity monitoring through passive acoustic monitoring (PAM), yet deep learning (DL) models encounter substantial challenges in open environments. These include difficulties in detecting unknown classes, extracting species-specific features, and achieving robust cross-corpus recognition. To address these challenges, this letter presents a DL-based open-set cross-corpus recognition method for BVs that combines feature construction with open-set recognition (OSR) techniques. We introduce a three-channel spectrogram that integrates both amplitude and phase information to enhance feature representation. To improve the recognition accuracy of known classes across corpora, we employ a class-specific semantic reconstruction model to extract deep features. For unknown class discrimination, we propose a Dual Strategy Coupling Scoring (DSCS) mechanism, which synthesizes the log-likelihood ratio score (LLRS) and reconstruction error score (RES). Our method achieves the highest weighted accuracy among existing approaches on a public dataset, demonstrating its effectiveness for open-set cross-corpus bird vocalization recognition.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1515-1519"},"PeriodicalIF":3.2,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143845482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ProSegDiff: Prostate Segmentation Diffusion Network Based on Adaptive Adjustment of Injection Features","authors":"Jialong Zhong;Tingwei Liu;Yongri Piao;Weibing Sun;Huchuan Lu","doi":"10.1109/LSP.2025.3548422","DOIUrl":"https://doi.org/10.1109/LSP.2025.3548422","url":null,"abstract":"Recently, methods based on Diffusion Probability Models (DPM) have achieved notable success in the field of medical image segmentation. However, most of these methods do not perform well in segmenting ambiguous areas when dealing with prostate segmentation tasks due to the low distinguishability of prostate images and the high overlap of its boundary with adjacent organs. To address this issue, this paper introduces a diffusion-based framework named ProSegDiff, ProSegDiff employs an Adapter to dynamically adjust features from the conditional network to align with the denoising process of the denoising network. Furthermore, the denoising process is conducted in the latent space to minimize the consumption of computational resources, and a proposed selection strategy is employed to identify the better results from multiple inferences. Extensive comparative experiments on four benchmark datasets demonstrate the effectiveness of this method, which achieves superior performance across four evaluation metrics.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1236-1240"},"PeriodicalIF":3.2,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143688160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}