IEEE Signal Processing Letters: Latest Articles

SVD-Guided Diffusion for Training-Free Low-Light Image Enhancement
IF 3.9, Zone 2, Engineering & Technology
IEEE Signal Processing Letters, vol. 32, pp. 3245-3249 Pub Date: 2025-08-11 DOI: 10.1109/LSP.2025.3597558
Jingi Kim; Wonjun Kim
Abstract: Low-light image enhancement aims to improve the visibility and the contrast of images captured under poor lighting conditions while preserving contextual details. In this context, most previous methods have relied on paired training data, which often leads to overfitting to specific data distributions. Although recent approaches have adopted generative priors of the diffusion model to avoid such learning bias, the stochastic nature of the diffusion model restricts precise control over luminance-related features. To address these challenges, we propose a novel and training-free method that integrates the Singular Value Decomposition (SVD) with a pretrained diffusion model. Based on our observation that SVD tends to separate an image into luminance and structural components, we propose to leverage the decomposition capability of SVD and the generative prior of the diffusion model simultaneously. Specifically, our approach effectively guides the restoration process of lighting conditions by adaptively combining singular values of the intermediate result, which is obtained from each denoising step, with those of the low-light input. For this combination, we define a semantic-aware scaling scheme based on a vision-language model. Experimental results on benchmark datasets demonstrate that the proposed method efficiently improves the performance of low-light image enhancement compared to other training-free methods.
Citations: 0
OMR-Net+: A Frequency-Aware Feature Refinement and Entropy Modeling Method for Efficient Screen Content Image Compression
IF 3.9, Zone 2, Engineering & Technology
IEEE Signal Processing Letters, vol. 32, pp. 3290-3294 Pub Date: 2025-08-07 DOI: 10.1109/LSP.2025.3596872
Shiqi Jiang; Ting Ren; Hui Yuan; Junyan Huo; Xin Lu
Abstract: Screen content image (SCI) compression faces challenges due to distinct characteristics such as sharp edges and repetitive structures. Existing learned image compression methods encounter two key issues: 1) insufficient frequency-aware processing, and 2) suboptimal entropy modeling for mixed-frequency components. To this end, we propose OMR-Net+, a novel SCI compression method that incorporates frequency-aware feature characteristics, including a frequency-aware refinement network (FARN) and a frequency-aware entropy model (FAEM). The proposed FARN uses an invertible neural network to preserve critical high-frequency details and a transformer-based model to reduce redundancy in low-frequency features. Additionally, the proposed FAEM provides tailored conditional probability estimation based on a parallel context model for high- and low-frequency features, respectively, to improve both coding performance and computational efficiency. Experimental results on the SCID and SIQAD datasets show that OMR-Net+ significantly outperforms the previous OMR-Net and other state-of-the-art methods in rate-distortion performance, demonstrating its potential for efficient SCI compression.
Citations: 0
DCF-Net: Efficient Target Speaker Extraction by Leveraging Mixture and Enrollment Interactions
IF 3.9, Zone 2, Engineering & Technology
IEEE Signal Processing Letters, vol. 32, pp. 3240-3244 Pub Date: 2025-08-07 DOI: 10.1109/LSP.2025.3596846
Ke Xue; Rongfei Fan; Chang Sun; Puning Zhao; Jianping An
Abstract: Target speaker extraction (TSE) aims to isolate a specific speaker's voice from multi-talker environments using enrollment data. While current approaches primarily utilize speaker embeddings from enrollment, they often neglect contextual information and the dynamic interactions between the mixture and enrollment. To address this limitation, we propose a novel DualStream Contextual Fusion Network (DCF-Net) that operates in the time-frequency (T-F) domain. Our framework introduces a DualStream Fusion Block (DSFB) that: 1) captures contextual information, 2) models interactions between contextualized enrollment and mixture representations across spatial and channel dimensions, and 3) employs these enriched representations to guide the extraction process. Comprehensive experiments show that DCF-Net achieves state-of-the-art (SOTA) performance with a 21.6 dB improvement in scale-invariant signal-to-distortion ratio (SI-SDR) on benchmark datasets while demonstrating robustness in noisy and reverberant conditions. Notably, our model significantly reduces the wrong extraction rate to just 0.4% when tested on the target confusion problem (TCP), underscoring its practical applicability.
Citations: 0
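The DCF-Net abstract reports its headline result as an improvement in SI-SDR. As a minimal sketch of how that metric is commonly computed (not code from the letter), the snippet below projects the zero-mean estimate onto the zero-mean target and compares the energy of the projection with the energy of the residual. The function names `si_sdr` and `si_sdr_improvement`, the `eps` guard, and the mean-removal step are assumptions made for illustration.

```python
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    # Remove the mean of each signal (a common convention, assumed here).
    mean_e = sum(estimate) / len(estimate)
    mean_t = sum(target) / len(target)
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Optimal scaling of the target that best explains the estimate.
    alpha = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    projection = [alpha * b for b in t]
    residual = [a - p for a, p in zip(e, projection)]
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDR gain of the extracted signal over the unprocessed mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)
```

Because the target is rescaled before the comparison, multiplying the estimate by a constant gain leaves the score unchanged, which is the property the "scale-invariant" name refers to.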
Improved Specific Emitter Identification Based on Margin Disparity Discrepancy in Varying Modulation Scenarios
IF 3.9, Zone 2, Engineering & Technology
IEEE Signal Processing Letters, vol. 32, pp. 3375-3379 Pub Date: 2025-08-07 DOI: 10.1109/LSP.2025.3597100
Yezhuo Zhang; Zinan Zhou; Yichao Cao; Guangyu Li; Xuanpeng Li
Abstract: In Specific Emitter Identification (SEI), transmitters are typically distinguished through Radio Frequency Fingerprint (RFF) features. However, modulation schemes can be deliberately coupled to confound RFF information. This paper addresses modulation variation as a Domain Adaptation (DA) problem and proposes an SEI framework based on Margin Disparity Discrepancy (MDD) to enhance robustness in modulation-varying scenarios. Specifically, we first establish a theoretical tight upper bound for the discrepancy between modulation domains using MDD theory. Then, we design an adversarial network to align variable features to shorten the discrepancy between modulations. Finally, we experimented with complex modulated signals including digital and analog modulation. Numerical results indicate that our approach achieves an average improvement of over 20% in accuracy compared to classical SEI methods and outperforms traditional DA techniques.
Citations: 0
Token-Prediction-Based Post-Processing for Low-Bitrate Speech Coding
IF 3.9, Zone 2, Engineering & Technology
IEEE Signal Processing Letters, vol. 32, pp. 3235-3239 Pub Date: 2025-08-07 DOI: 10.1109/LSP.2025.3596826
Fei Liu; Yang Ai; Zhen-Hua Ling
Abstract: Low-bitrate speech coding plays an essential role in speech transmission and storage. However, speech quality degrades noticeably at low bitrates with current coding methods. Therefore, this letter proposes a novel Token-Prediction-based Post-Processing (T3P) model to improve the quality of low-bitrate coded speech. Unlike existing post-processing methods, T3P is a discrete-domain method centered on the prediction and classification of discrete tokens. Specifically, given low-bitrate coded speech features as condition, T3P initiates from a random token and sequentially predicts the token sequences produced by a residual vector quantization (RVQ) based neural codec, which is subsequently decoded to reconstruct the raw speech. Experiments confirm that T3P surpasses flow-matching-based and speech-enhancement-based baselines, achieving a better trade-off between speech quality and efficiency. Empowered by T3P, Encodec achieves performance at just 0.5 kbps that exceeds its original 4 kbps results for 16 kHz speech coding.
Citations: 0
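The T3P abstract refers to token sequences produced by a residual vector quantization (RVQ) based codec. The sketch below shows only what RVQ tokens are in general; it does not reproduce T3P's predictor or any real codec. Each stage picks the nearest codeword to the current residual and passes the remainder on to the next stage. The helper names and the tiny two-stage `CODEBOOKS` are hypothetical and chosen purely for illustration.

```python
def nearest(codebook, vec):
    """Index of the codeword with the smallest squared Euclidean distance to vec."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; returns one token (codeword index) per stage."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # The next stage only sees what this stage failed to represent.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(tokens, codebooks):
    """Reconstruct the vector as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-stage codebooks: a coarse stage followed by a fine stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
]

tokens = rvq_encode([1.2, 0.8], CODEBOOKS)
approx = rvq_decode(tokens, CODEBOOKS)
```

Using fewer stages gives a coarser reconstruction at a lower bitrate, which is why a model that can predict the missing finer-stage tokens could plausibly recover quality lost at low bitrates.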
Shape-Selective Splatting: Regularizing the Shape of Gaussian for Sparse-View Rendering
IF 3.9, Zone 2, Engineering & Technology
IEEE Signal Processing Letters, vol. 32, pp. 3172-3176 Pub Date: 2025-08-06 DOI: 10.1109/LSP.2025.3596225
Gun Ryu; Wonjun Kim
Abstract: In recent years, 3D Gaussian splatting (3DGS) has shown high-fidelity rendering results in real-time. However, 3DGS often encounters the overfitting problem under sparse-view conditions due to insufficient cross-view constraints. In this letter, to mitigate this limitation, we focus on the effect of Gaussian shapes on the scene reconstruction from sparse input views. The key idea is to allow each Gaussian to adaptively select its shape in accordance with the scene structure. Specifically, we propose to put a learnable parameter into Gaussian attributes, which indicates the probability of each shape. This indicator is optimized with other attributes while making each Gaussian change its shape to 1D, 2D, and 3D for representing edges, planar surfaces, and volumetric regions, respectively. Based on a geometrically accurate representation, the proposed method consequently alleviates the model from overfitting to a limited set of training views. Furthermore, we apply a depth regularization scheme within a set of selected pixels to precisely constrain positions of Gaussians. Experimental results on benchmark datasets show that the proposed method effectively improves the performance of novel view synthesis under sparse input views.
Citations: 0
GRM($m$): An Efficient Face Recognition Descriptor
IF 3.9, Zone 2, Engineering & Technology
IEEE Signal Processing Letters, vol. 32, pp. 3215-3219 Pub Date: 2025-08-06 DOI: 10.1109/LSP.2025.3596463
Chaorong Li; Libin Cui
Abstract: This paper presents GRM($m$), a Gabor wavelet-based face recognition descriptor tailored to tackle the challenges posed by small-sample conditions in computer vision tasks. Traditional deep learning models, such as ResNet and Transformer architectures, often struggle to generalize with sparse training data, particularly for near-frontal face images. To overcome this limitation, we propose a novel representation framework that leverages Gaussian Riemannian Manifolds (GRM) to capture both geometric structures and statistical dependencies of facial features. The GRM($m$) descriptor encodes multi-scale local features into a Riemannian manifold space, enhancing the discriminative capability of face representations even with minimal samples. Combined with deep neural networks, GRM($m$) efficiently fuses handcrafted geometric features with high-level semantic embeddings, significantly improving recognition accuracy. Extensive experiments on benchmark datasets demonstrate that GRM($m$) outperforms state-of-the-art methods in few-shot learning scenarios, especially under challenging variations in expression, lighting, and accessories. The proposed approach provides a robust and scalable solution for real-world face recognition applications with constrained training samples.
Citations: 0
Characterization of OFDM-Based Secure Data Transmission Over Voice Channels
IF 3.9, Zone 2, Engineering & Technology
IEEE Signal Processing Letters, vol. 32, pp. 3230-3234 Pub Date: 2025-08-06 DOI: 10.1109/LSP.2025.3596526
Zvezdana Kuzmanović; Sara Čubrilović; Marija Punt; Desimir Vučić; Branko Kovačević
Abstract: This letter proposes a novel approach to speech-like signal design for reliable real-time transmission of encrypted data over voice channels. The structure of the digitally encrypted information-carrying segment relies on the known concept of the OFDM/QPSK modulation. The segment is extended with time-synchronization sequences, amplitude variations and silence intervals, all necessary for successful passing through realistic channels. Phase distortion has been identified as a significant source of error and inspired the proposal of phase-shift compensation and time-and-phase fine-tuning algorithms. Finally, this letter outlines a complete secure real-time communication system. Transmission quality of the resulting speech-like scheme is first evaluated on the AMR codec at various compression rates, followed by the analysis over realistic voice channels. The proposed secure DoV system achieves a mean BER of $2.45\cdot 10^{-2}$, $1.11\cdot 10^{-2}$ and $4.83\cdot 10^{-3}$ over Signal, Telegram and 3G cellular network voice channels, respectively.
Citations: 0
Semantic Hierarchy-Aware Hyperbolic Representations for Multi-Label Classification With Single Positive Labels
IF 3.9, Zone 2, Engineering & Technology
IEEE Signal Processing Letters, vol. 32, pp. 3340-3344 Pub Date: 2025-08-06 DOI: 10.1109/LSP.2025.3596465
Tongtong Liu; Guoqiang Chen; Ying Wang; Wenhui Li
Abstract: Single positive multi-label learning (SPML) aims to recognize multiple categories with limited supervision from one positive label in an image. With the emergence of pre-trained visual-language models such as CLIP, recent studies focused on capturing label-to-label dependencies. However, hierarchies with deeper layers of labels or more branches in label-to-label relationships cannot be well expressed in Euclidean space. To address the challenge, we introduce a semantic hierarchy-aware hyperbolic representations framework for single positive multi-label learning. Specifically, drawing inspiration from semantic hierarchical information, we introduce a label relation prior strategy to map single labels to other labels. The semantic chain of labels is extracted along the hierarchical path from the child node to the parent node. Furthermore, hyperbolic entailment constraints are adopted to enforce the semantic similarity between image-text pairs and the hierarchical consistency among labels in hyperbolic space. Experimental results conducted on four SPML benchmark datasets demonstrate that our SHHNet achieves state-of-the-art performance.
Citations: 0
Particle Swarm Optimization Enabled Parametric Mapping for Channel Model Substitution
IF 3.9, Zone 2, Engineering & Technology
IEEE Signal Processing Letters, vol. 32, pp. 3220-3224 Pub Date: 2025-08-06 DOI: 10.1109/LSP.2025.3596509
Zhongli Wang; Shuping Dang; Haiqiang Chen; Chengzhong Li
Abstract: Channel model substitution (CMS) is a technique that aims to replace a computationally challenging channel model with a simpler substitute. This technique is powerful for rapid adaptive signal processing and closed-form performance analytics. The parametric mapping between an original channel model and its substitute determines the utility of CMS. In the past decades, the moment matching criterion has dominated for conducting parametric mapping, which, however, is heuristic and has been proven non-optimal. In this letter, we propose to utilize particle swarm optimization (PSO) to obtain optimal parametric mapping relations for a general CMS problem, regardless of the distributional forms of the original channel model and the substitute. Taking the CMS techniques for the lognormal shadowed channel model as examples, simulation results show that the PSO enabled parametric mapping approach is capable of converging to the global optima under diverse system configurations, making CMS computationally feasible.
Citations: 0
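The abstract above describes using PSO to map the parameters of an original channel model onto a simpler substitute. A toy sketch of that idea follows, under stated assumptions: the substitute is taken to be a gamma distribution, the objective is the summed squared difference between the two PDFs on a fixed grid, and the swarm hyperparameters and search bounds are arbitrary. None of these choices come from the letter, whose actual objective and settings are not given in the abstract.

```python
import math
import random


def lognormal_pdf(x, mu, sigma):
    """PDF of a lognormal variable with log-mean mu and log-std sigma."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def gamma_pdf(x, k, theta):
    """PDF of a gamma variable with shape k and scale theta (log domain for stability)."""
    return math.exp((k - 1.0) * math.log(x) - x / theta
                    - math.lgamma(k) - k * math.log(theta))


def mapping_loss(params, mu=0.0, sigma=0.5, grid=None):
    """Assumed objective: squared PDF mismatch between original and substitute."""
    k, theta = params
    grid = grid or [0.05 * i for i in range(1, 121)]
    return sum((lognormal_pdf(x, mu, sigma) - gamma_pdf(x, k, theta)) ** 2
               for x in grid)


def pso(loss, bounds, n_particles=30, n_iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO with position clamping; returns (best_position, best_loss)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = loss(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Search over gamma shape k and scale theta within assumed bounds.
best_params, best_loss = pso(mapping_loss, [(0.1, 20.0), (0.01, 5.0)])
```

Because the loss is passed in as a plain function, the same `pso` routine could be pointed at a different original/substitute pair or a different mismatch measure without changes, which mirrors the abstract's point that the approach does not depend on the distributional forms involved. This basic variant carries no guarantee of reaching a global optimum.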