IEEE Signal Processing Letters: Latest Articles, Page 10

SVD-Guided Diffusion for Training-Free Low-Light Image Enhancement

IF 3.9, Zone 2 (Engineering and Technology)

IEEE Signal Processing Letters Pub Date : 2025-08-11 DOI: 10.1109/LSP.2025.3597558

Jingi Kim;Wonjun Kim

Abstract: Low-light image enhancement aims to improve the visibility and the contrast of images captured under poor lighting conditions while preserving contextual details. In this context, most previous methods have relied on paired training data, which often leads to overfitting to specific data distributions. Although recent approaches have adopted generative priors of the diffusion model to avoid such learning bias, the stochastic nature of the diffusion model restricts precise control over luminance-related features. To address these challenges, we propose a novel and training-free method that integrates the Singular Value Decomposition (SVD) with a pretrained diffusion model. Based on our observation that SVD tends to separate an image into luminance and structural components, we propose to leverage the decomposition capability of SVD and the generative prior of the diffusion model simultaneously. Specifically, our approach effectively guides the restoration process of lighting conditions by adaptively combining singular values of the intermediate result, which is obtained from each denoising step, with those of the low-light input. For this combination, we define a semantic-aware scaling scheme based on a vision-language model. Experimental results on benchmark datasets demonstrate that the proposed method efficiently improves the performance of low-light image enhancement compared to other training-free methods.

Volume 32, pages 3245-3249.
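The abstract describes combining the singular values of each denoising step's intermediate result with those of the low-light input. The sketch below illustrates only that blending step on a single 2D channel. It assumes a fixed, hypothetical weight `alpha` in place of the letter's vision-language-model-based semantic-aware scaling, which is not reproduced here, and the function name `blend_singular_values` is likewise hypothetical; it is not the authors' implementation.

```python
import numpy as np


def blend_singular_values(intermediate: np.ndarray, low_light: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical illustration of singular-value blending.

    Keeps the singular vectors (structure) of `intermediate` and replaces its
    singular values with a convex combination of both images' singular values.
    `alpha` is a fixed scalar here, an assumption standing in for the paper's
    adaptive, semantic-aware scaling.
    """
    if intermediate.shape != low_light.shape:
        raise ValueError("inputs must have the same shape")
    # Thin SVD of the intermediate denoising result: U, sigma, V^T.
    u, s_mid, vt = np.linalg.svd(intermediate, full_matrices=False)
    # Only the singular values of the low-light input are needed.
    s_low = np.linalg.svd(low_light, compute_uv=False)
    s_mix = alpha * s_mid + (1.0 - alpha) * s_low
    # U @ diag(s_mix) @ V^T, written with broadcasting over U's columns.
    return (u * s_mix) @ vt
```

Because multiplying an image by a constant multiplies all of its singular values by that constant, blending toward a uniformly brightened copy (`alpha=0.0` with `2.0 * a`) reproduces that brightened copy, which hints at why singular values carry much of the global luminance.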

Citations: 0

OMR-Net+: A Frequency-Aware Feature Refinement and Entropy Modeling Method for Efficient Screen Content Image Compression

IF 3.9, Zone 2 (Engineering and Technology)

IEEE Signal Processing Letters Pub Date : 2025-08-07 DOI: 10.1109/LSP.2025.3596872

Shiqi Jiang;Ting Ren;Hui Yuan;Junyan Huo;Xin Lu

Abstract: Screen content image (SCI) compression faces challenges due to distinct characteristics such as sharp edges and repetitive structures. Existing learned image compression methods encounter two key issues: 1) insufficient frequency-aware processing, and 2) suboptimal entropy modeling for mixed-frequency components. To this end, we propose OMR-Net+, a novel SCI compression method that incorporates frequency-aware feature characteristics, including a frequency-aware refinement network (FARN) and a frequency-aware entropy model (FAEM). The proposed FARN uses an invertible neural network to preserve critical high-frequency details and a transformer-based model to reduce redundancy in low-frequency features. Additionally, the proposed FAEM provides tailored conditional probability estimation based on a parallel context model for high- and low-frequency features, respectively, to improve both coding performance and computational efficiency. Experimental results on the SCID and SIQAD datasets show that OMR-Net+ significantly outperforms the previous OMR-Net and other state-of-the-art methods in rate-distortion performance, demonstrating its potential for efficient SCI compression.

Volume 32, pages 3290-3294.

Citations: 0

DCF-Net: Efficient Target Speaker Extraction by Leveraging Mixture and Enrollment Interactions

IF 3.9, Zone 2 (Engineering and Technology)

IEEE Signal Processing Letters Pub Date : 2025-08-07 DOI: 10.1109/LSP.2025.3596846

Ke Xue;Rongfei Fan;Chang Sun;Puning Zhao;Jianping An

Abstract: Target speaker extraction (TSE) aims to isolate a specific speaker’s voice from multi-talker environments using enrollment data. While current approaches primarily utilize speaker embeddings from enrollment, they often neglect contextual information and the dynamic interactions between the mixture and enrollment. To address this limitation, we propose a novel DualStream Contextual Fusion Network (DCF-Net) that operates in the time-frequency (T-F) domain. Our framework introduces a DualStream Fusion Block (DSFB) that: 1) captures contextual information, 2) models interactions between contextualized enrollment and mixture representations across spatial and channel dimensions, and 3) employs these enriched representations to guide the extraction process. Comprehensive experiments show that DCF-Net achieves state-of-the-art (SOTA) performance with a 21.6 dB improvement in scale-invariant signal-to-distortion ratio (SI-SDR) on benchmark datasets while demonstrating robustness in noisy and reverberant conditions. Notably, our model significantly reduces the wrong extraction rate to just 0.4% when tested on the target confusion problem (TCP), underscoring its practical applicability.

Volume 32, pages 3240-3244.
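The abstract reports a 21.6 dB improvement in SI-SDR. For readers checking such figures, the sketch below computes SI-SDR with its standard definition (project the estimate onto the reference, then compare the energy of that projection with the energy of the residual) and the improvement (SI-SDRi) as the score of the extracted signal minus the score of the unprocessed mixture. Zero-mean normalization and the small `eps` guard are common conventions assumed here; the function names are hypothetical, and this is not the letter's evaluation code.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-12) -> float:
    """Scale-invariant SDR in dB; both signals are made zero-mean first (assumed convention)."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Optimal scale of the reference: projection coefficient of the estimate onto it.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(d * d for d in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_improvement(
    estimate: Sequence[float], mixture: Sequence[float], reference: Sequence[float]
) -> float:
    """SI-SDRi: gain of the extracted signal over the unprocessed mixture, in dB."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

Because the reference is rescaled by the projection coefficient, multiplying the estimate by any nonzero constant leaves the score unchanged, which is the "scale-invariant" part of the name.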

Citations: 0

Improved Specific Emitter Identification Based on Margin Disparity Discrepancy in Varying Modulation Scenarios

IF 3.9, Zone 2 (Engineering and Technology)

IEEE Signal Processing Letters Pub Date : 2025-08-07 DOI: 10.1109/LSP.2025.3597100

Yezhuo Zhang;Zinan Zhou;Yichao Cao;Guangyu Li;Xuanpeng Li

Citations: 0

Token-Prediction-Based Post-Processing for Low-Bitrate Speech Coding

IF 3.9, Zone 2 (Engineering and Technology)

IEEE Signal Processing Letters Pub Date : 2025-08-07 DOI: 10.1109/LSP.2025.3596826

Fei Liu;Yang Ai;Zhen-Hua Ling

Citations: 0

Shape-Selective Splatting: Regularizing the Shape of Gaussian for Sparse-View Rendering

IF 3.9, Zone 2 (Engineering and Technology)

IEEE Signal Processing Letters Pub Date : 2025-08-06 DOI: 10.1109/LSP.2025.3596225

Gun Ryu;Wonjun Kim

Abstract: In recent years, 3D Gaussian splatting (3DGS) has shown high-fidelity rendering results in real time. However, 3DGS often encounters the overfitting problem under sparse-view conditions due to insufficient cross-view constraints. In this letter, to mitigate this limitation, we focus on the effect of Gaussian shapes on scene reconstruction from sparse input views. The key idea is to allow each Gaussian to adaptively select its shape in accordance with the scene structure. Specifically, we propose to add a learnable parameter to the Gaussian attributes, which indicates the probability of each shape. This indicator is optimized together with the other attributes, making each Gaussian change its shape to 1D, 2D, or 3D to represent edges, planar surfaces, and volumetric regions, respectively. Based on this geometrically accurate representation, the proposed method consequently alleviates overfitting of the model to a limited set of training views. Furthermore, we apply a depth regularization scheme within a set of selected pixels to precisely constrain the positions of Gaussians. Experimental results on benchmark datasets show that the proposed method effectively improves the performance of novel view synthesis under sparse input views.

Volume 32, pages 3172-3176.

Citations: 0

GRM($m$): An Efficient Face Recognition Descriptor

IF 3.9, Zone 2 (Engineering and Technology)

IEEE Signal Processing Letters Pub Date : 2025-08-06 DOI: 10.1109/LSP.2025.3596463

Chaorong Li;Libin Cui

Abstract: This paper presents GRM($m$), a Gabor wavelet-based face recognition descriptor tailored to tackle the challenges posed by small-sample conditions in computer vision tasks. Traditional deep learning models, such as ResNet and Transformer architectures, often struggle to generalize with sparse training data, particularly for near-frontal face images. To overcome this limitation, we propose a novel representation framework that leverages Gaussian Riemannian Manifolds (GRM) to capture both geometric structures and statistical dependencies of facial features. The GRM($m$) descriptor encodes multi-scale local features into a Riemannian manifold space, enhancing the discriminative capability of face representations even with minimal samples. Combined with deep neural networks, GRM($m$) efficiently fuses handcrafted geometric features with high-level semantic embeddings, significantly improving recognition accuracy. Extensive experiments on benchmark datasets demonstrate that GRM($m$) outperforms state-of-the-art methods in few-shot learning scenarios, especially under challenging variations in expression, lighting, and accessories. The proposed approach provides a robust and scalable solution for real-world face recognition applications with constrained training samples.

Volume 32, pages 3215-3219.
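GRM($m$) is described as Gabor wavelet-based and as encoding multi-scale local features. As a sketch of what such a first stage commonly looks like, the code below builds a real-valued 2D Gabor filter bank over several scales and orientations. The kernel parameterization (Gaussian envelope with aspect ratio `gamma`, cosine carrier with wavelength `wavelength`), all parameter values, and the function names are assumptions for illustration; the letter's Gaussian Riemannian manifold encoding is not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def gabor_kernel(
    size: int, sigma: float, theta: float, wavelength: float, gamma: float = 0.5, psi: float = 0.0
) -> List[List[float]]:
    """Real 2D Gabor kernel: rotated Gaussian envelope times a cosine carrier."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def gabor_bank(
    size: int, scales: Sequence[Tuple[float, float]], orientations: int
) -> List[List[List[float]]]:
    """One kernel per (sigma, wavelength) scale and per evenly spaced orientation in [0, pi)."""
    return [
        gabor_kernel(size, sigma, math.pi * k / orientations, wavelength)
        for sigma, wavelength in scales
        for k in range(orientations)
    ]
```

Convolving a face image with every kernel in the bank yields the multi-scale, multi-orientation responses from which a descriptor of this kind could then be built.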

Citations: 0

Characterization of OFDM-Based Secure Data Transmission Over Voice Channels

IF 3.9, Zone 2 (Engineering and Technology)

IEEE Signal Processing Letters Pub Date : 2025-08-06 DOI: 10.1109/LSP.2025.3596526

Zvezdana Kuzmanović;Sara Čubrilović;Marija Punt;Desimir Vučić;Branko Kovačević

Abstract: This letter proposes a novel approach to speech-like signal design for reliable real-time transmission of encrypted data over voice channels. The structure of the digitally encrypted information-carrying segment relies on the known concept of OFDM/QPSK modulation. The segment is extended with time-synchronization sequences, amplitude variations, and silence intervals, all necessary for successfully passing through realistic channels. Phase distortion has been identified as a significant source of error and motivated the proposal of a phase-shift compensation algorithm and a time-and-phase fine-tuning algorithm. Finally, this letter outlines a complete secure real-time communication system. Transmission quality of the resulting speech-like scheme is first evaluated on the AMR codec at various compression rates, followed by an analysis over realistic voice channels. The proposed secure DoV system achieves a mean BER of $2.45 \cdot 10^{-2}$, $1.11 \cdot 10^{-2}$, and $4.83 \cdot 10^{-3}$ over Signal, Telegram, and 3G cellular network voice channels, respectively.

Volume 32, pages 3230-3234.
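The abstract builds the encrypted segment on OFDM/QPSK and names phase distortion as a major error source. The sketch below shows only a baseband OFDM/QPSK round trip: Gray-mapped QPSK symbols placed one per subcarrier, a naive inverse DFT to produce time samples, and hard decisions after the forward DFT. It also shows how a constant phase rotation raises the BER and how undoing a known rotation removes the errors. There is no cyclic prefix, synchronization sequence, speech-like shaping, or phase estimation; all names and parameters are hypothetical, and this is not the authors' DoV system or their compensation algorithm.

```python
import cmath
import math
from typing import Dict, List, Sequence, Tuple

# Gray-coded QPSK (before 1/sqrt(2) normalization): neighboring quadrants differ in one bit.
QPSK: Dict[Tuple[int, int], complex] = {
    (0, 0): complex(1, 1),
    (0, 1): complex(-1, 1),
    (1, 1): complex(-1, -1),
    (1, 0): complex(1, -1),
}


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(N^2) DFT; the inverse transform includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def ofdm_modulate(bits: Sequence[int]) -> List[complex]:
    """Map bit pairs to unit-energy QPSK symbols (one per subcarrier), then inverse DFT."""
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    symbols = [QPSK[(bits[i], bits[i + 1])] / math.sqrt(2) for i in range(0, len(bits), 2)]
    return dft(symbols, inverse=True)


def ofdm_demodulate(samples: Sequence[complex]) -> List[int]:
    """Forward DFT back to subcarriers, then hard QPSK decisions inverting the Gray map."""
    bits: List[int] = []
    for s in dft(samples):
        bits.append(1 if s.imag < 0 else 0)
        bits.append(1 if s.real < 0 else 0)
    return bits


def rotate(samples: Sequence[complex], phi: float) -> List[complex]:
    """Constant phase rotation: a crude stand-in for channel phase distortion."""
    r = cmath.exp(1j * phi)
    return [s * r for s in samples]


def bit_error_rate(sent: Sequence[int], received: Sequence[int]) -> float:
    return sum(a != b for a, b in zip(sent, received)) / len(sent)
```

With a 0.3 rad rotation every QPSK decision still lands in the correct quadrant, whereas a quarter-turn moves every symbol into a neighboring quadrant; with Gray mapping that is exactly one wrong bit per symbol, i.e. a BER of 0.5, until the rotation is undone.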

Citations: 0

Semantic Hierarchy-Aware Hyperbolic Representations for Multi-Label Classification With Single Positive Labels

IF 3.9, Zone 2 (Engineering and Technology)

IEEE Signal Processing Letters Pub Date : 2025-08-06 DOI: 10.1109/LSP.2025.3596465

Tongtong Liu;Guoqiang Chen;Ying Wang;Wenhui Li

Abstract: Single positive multi-label learning (SPML) aims to recognize multiple categories with limited supervision from one positive label in an image. With the emergence of pre-trained visual-language models such as CLIP, recent studies have focused on capturing label-to-label dependencies. However, hierarchies with deeper layers of labels or more branches in label-to-label relationships cannot be well expressed in Euclidean space. To address this challenge, we introduce a semantic hierarchy-aware hyperbolic representation framework for single positive multi-label learning. Specifically, drawing inspiration from semantic hierarchical information, we introduce a label relation prior strategy to map single labels to other labels. The semantic chain of labels is extracted along the hierarchical path from the child node to the parent node. Furthermore, hyperbolic entailment constraints are adopted to enforce the semantic similarity between image-text pairs and the hierarchical consistency among labels in hyperbolic space. Experiments on four SPML benchmark datasets demonstrate that our SHHNet achieves state-of-the-art performance.

Volume 32, pages 3340-3344.
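The abstract argues that deep or highly branched label hierarchies are poorly expressed in Euclidean space. The sketch below makes that concrete with the geodesic distance of the Poincaré ball (curvature $-1$): the same Euclidean gap costs far more near the boundary than near the origin, which leaves room for many well-separated leaves. The abstract does not state which hyperbolic model SHHNet uses, so the Poincaré ball here is an assumption, and the function name is hypothetical; the entailment constraints themselves are not reproduced.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance in the Poincare ball model of curvature -1.

    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    if len(u) != len(v):
        raise ValueError("points must have the same dimension")
    nu = sum(x * x for x in u)
    nv = sum(x * x for x in v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))
```

For example, two points 0.1 apart near the origin (where a root label might sit) are about 0.2 apart in hyperbolic distance, while two points 0.1 apart at radius 0.9 (where leaf labels might sit) are about 1.03 apart.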

Citations: 0

Particle Swarm Optimization Enabled Parametric Mapping for Channel Model Substitution

IF 3.9, Zone 2 (Engineering and Technology)

IEEE Signal Processing Letters Pub Date : 2025-08-06 DOI: 10.1109/LSP.2025.3596509

Zhongli Wang;Shuping Dang;Haiqiang Chen;Chengzhong Li

Citations: 0