{"title":"Blind Light Field Image Quality Assessment via Frequency Domain Analysis and Auxiliary Learning","authors":"Rui Zhou;Gangyi Jiang;Linwei Zhu;Yueli Cui;Ting Luo","doi":"10.1109/LSP.2025.3531209","DOIUrl":"https://doi.org/10.1109/LSP.2025.3531209","url":null,"abstract":"Due to the distortions occurring at various stages from acquisition to visualization, light field image quality assessment (LFIQA) is crucial for guiding the processing of light field images (LFIs). In this letter, we propose a new blind LFIQA metric via frequency domain analysis and auxiliary learning, termed as FABLFQA. First, spatial-angular patches are extracted from LFIs and further processed through discrete cosine transform to obtain light field frequency maps. Subsequently, a concise and efficient frequency-aware deep learning network is designed to extract frequency features, including the frequency descriptor, 3D ConvBlock, and frequency transformer. Finally, a distortion type discrimination auxiliary task is employed to facilitate the learning of the main quality assessment task. Experimental results on three representative LFI datasets show that the proposed metric outperforms the state-of-the-art metrics.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"711-715"},"PeriodicalIF":3.2,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Noise Representation Learning for Robust Speaker Recognition","authors":"Sunyoung Cho;Kyungchul Wee","doi":"10.1109/LSP.2025.3530879","DOIUrl":"https://doi.org/10.1109/LSP.2025.3530879","url":null,"abstract":"Speaker recognition in noisy environments remains a challenging issue due to highly variable noise, which hinders convergence to an optimal solution. To address the information discrepancies caused by noise variability during the training process, we explore a multi-modal learning scheme by treating different types of noise as distinct modalities. We propose a multi-noise representation learning method to extract embeddings that encode discriminative characteristics for each noise type, along with integrated commonalities from various types of noise. Specifically, the multi-noise learning network is jointly trained with an embedding extractor to continuously incorporate refined features under noisy conditions into the speaker embeddings. Experiments on VoxCeleb1 demonstrate that the proposed method is effective when used in conjunction with embedding extractors, outperforming state-of-the-art methods in noisy conditions.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"681-685"},"PeriodicalIF":3.2,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight Efficient Rate-Adaptive Network for Compression-Aware Image Rescaling","authors":"Dingyi Li;Yang Zhang;Yu Liu","doi":"10.1109/LSP.2025.3530853","DOIUrl":"https://doi.org/10.1109/LSP.2025.3530853","url":null,"abstract":"Compression-aware image rescaling approaches convert high-resolution images to compressed low-resolution ones to fit various display devices or save bandwidth/storage. Inverse upscaling is successively performed to enlarge the low-resolution images to the original sizes with rich details. However, previous compression-aware image rescaling methods lack adaptivity to diverse compression rates, or require multiple large models with huge computational cost for adjusting. To overcome these challenges, we propose a lightweight efficient rate-adaptive network (LERAN) for compression-aware image rescaling. We design a non-invertible framework based on quality factor-driven feature modulation modules and an expandable training strategy, to achieve the adaptivity to various compression rates with only one light and efficient model. Moreover, alternative recursive blocks are presented for lighter weights with very small performance drop. During training, we also introduce a sparse low-resolution residual feature loss which promotes easier convergence of the model without adding further computational burden. Extensive experimental results demonstrate that our method significantly outperforms state-of-the-art compression-aware image rescaling approaches for different compression rates on popular benchmarks, with an all-in-one lightweight model and much faster speed.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"691-695"},"PeriodicalIF":3.2,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DehazeGS: 3D Gaussian Splatting for Multi-Image Haze Removal","authors":"Chenjun Ma;Jieyu Zhao;Jian Chen","doi":"10.1109/LSP.2025.3530852","DOIUrl":"https://doi.org/10.1109/LSP.2025.3530852","url":null,"abstract":"Neural Radiance Fields (NeRF) have advanced 3D reconstruction by learning implicit representations of scenes from multi-view images, yet their effectiveness is limited in environments with scattering medium. Existing methods that incorporate scattering models into NeRF frameworks face issues with slow training speeds and high memory demands. This paper presents DehazeGS, a novel haze removal and reconstruction method based on 3D Gaussian Splatting (3DGS). Our approach integrates the Koschmieder scattering model into the 3DGS framework, enabling effective separation of objects and scattering medium. This method leverages a point-based representation to achieve high-quality scene reconstruction while significantly reducing computational and memory overhead. Experimental results on both synthetic and real datasets demonstrate that our method outperforms existing approaches in terms of dehazing quality and reconstruction performance, effectively synthesizing clear images from foggy scenes. Our findings suggest that integrating scattering models with 3DGS offers a promising solution for applications in adverse weather conditions.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"736-740"},"PeriodicalIF":3.2,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143388543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Least Lncosh Based Fetal Electrocardiography Extraction in Alpha-Stable Noise","authors":"Mengjia Wang;Bo Ni;Jiacheng Zhang;Shengyang Luan;Tao Liu","doi":"10.1109/LSP.2025.3529626","DOIUrl":"https://doi.org/10.1109/LSP.2025.3529626","url":null,"abstract":"Fetal electrocardiography (FECG) presents an important avenue for continuous fetal monitoring. Nonetheless, effectively extracting FECG from maternal electrocardiogram (MECG) is one considerable challenge due to its weaker amplitude compared to MECG and the non-Gaussian nature of background noise. In this letter, we introduce alpha-stable noise to model the realistic interference due to its high scalability. To improve the accuracy of FECG extraction under impulsive noise (alpha-stable noise with strong impulses), we introduce the least Lncosh algorithm (Llncosh) and the improved Llncosh algorithm (ILL) is proposed based on the Amplitude Hyperbolic Tangent Transformation (AHTT) to optimize the preset parameter. Moreover, Monte Carlo experiments are carried out to investigate the capabilities of LMS-like algorithms and the ILL algorithm for FECG extraction. The results demonstrate that the ILL algorithm outperforms the LMS-like ones with carefully selected parameters, particularly showcasing superior robustness against impulsive noise. This work holds significance both in the theoretical research of adaptive filtering and in its practical application for FECG extraction.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"676-680"},"PeriodicalIF":3.2,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SegINR: Segment-Wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech","authors":"Minchan Kim;Myeonghun Jeong;Joun Yeop Lee;Nam Soo Kim","doi":"10.1109/LSP.2025.3528858","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528858","url":null,"abstract":"We present SegINR, a novel approach to neural Text-to-Speech (TTS) that eliminates the need for either an auxiliary duration predictor or autoregressive (AR) sequence modeling for alignment. SegINR simplifies the TTS process by directly converting text sequences into frame-level features. Encoded text embeddings are transformed into segments of frame-level features with length regulation using a conditional implicit neural representation (INR). This method, termed Segment-wise INR (SegINR), captures temporal dynamics within each segment while autonomously defining segment boundaries, resulting in lower computational costs. Integrated into a two-stage TTS framework, SegINR is employed for semantic token prediction. Experiments in zero-shot adaptive TTS scenarios show that SegINR outperforms conventional methods in speech quality with computational efficiency.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"646-650"},"PeriodicalIF":3.2,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DPO: Discrete Prompt Optimization for Vision-Language Models","authors":"Nanhao Liang;Yong Liu","doi":"10.1109/LSP.2025.3528362","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528362","url":null,"abstract":"In recent years, the emergence of large vision-language models (VLMs) has catalyzed the development of prompt learning, where networks are trained to enhance VLM performance by learning continuous prompts. However, traditional continuous prompt learning often struggles with challenges like overfitting to Base classes and a lack of interpretability due to the nature of prompt parameterization. To overcome these limitations, we introduce Discrete Prompt Optimization (DPO), a method that optimizes text prompts in discrete word-space. During training, scores are assigned to token embeddings, which are then used to select the most effective token sequence for the downstream task. DPO was tested across 11 diverse datasets, consistently outperforming baseline methods like CLIP and CoOp on Novel classes in most cases. This discrete approach not only reduces overfitting but also enhances transparency and model interpretability, enabling the learning of dataset-specific text prompts that are easily understandable.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"671-675"},"PeriodicalIF":3.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Speed and Range Estimation Using Dual Linear Frequency Modulated Signals","authors":"Teodoro Aguilera;Fernando J. Álvarez","doi":"10.1109/LSP.2025.3528859","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528859","url":null,"abstract":"This work establishes the theoretical foundations to optimally design Dual Linear Frequency Modulated (DLFM) signals intended to measure range and speed. Expressions are derived to obtain the optimal DLFM signal bandwidth and sampling frequency to perform accurate estimations with a minimum computational load, for a given maximum signal frequency and expected speed. In addition, a software simulator has been developed to validate the theoretical predictions by recreating the conditions of emission and reception of DLFM signals with different bandwidths, subject to the Doppler effect at different speeds. Simulations are performed for speeds between 1 and 5 m/s and signal bandwidths ranging from 100 Hz to 2000 Hz. These results show that the relative errors obtained for range and speed are below 2% for the optimal bandwidth and sampling frequency values predicted in this work.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"661-665"},"PeriodicalIF":3.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10839567","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Modal Hybrid Encoding Approach Based on Information Bottleneck for Brain Tumor Grading","authors":"Luyue Yu;Chengyuan Liu;Aixi Qu;Qiang Wu;Ju Liu","doi":"10.1109/LSP.2025.3528861","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528861","url":null,"abstract":"Grade classification of gliomas is critical in clinical diagnosis and treatment decisions. Although histological images are commonly used for grading and as an important factor in prognostic prediction, their results are prone to inter-observer variability. Recent advancements in molecular genetics have significantly improved tumor classification, but challenges persist in effective feature selection and multi-modal data fusion. This letter proposes a multi-modal hybrid encoding method based on information bottleneck (MHEIB), combining histological images and genetic data to enhance glioma grading. MHEIB effectively fuses multi-modal features through the information bottleneck module and the self-attention mechanism, which compresses and filters the key features and dynamically adjusts the weights of multi-modal features to improve the classification accuracy. Experimental results on The Cancer Genome Atlas (TCGA) glioma dataset demonstrate that MHEIB outperforms several fusion methods in terms of F1-score, AUC, and AP. In particular, MHEIB significantly improved the classification AUC to 89.3% and 83.7% for similar categories of Grades II and III respectively. Overall, the MHEIB method provides an efficient multi-modal data fusion solution for glioma grading.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"651-655"},"PeriodicalIF":3.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explicit Bandwidth Learning for FOREX Trading Using Deep Reinforcement Learning","authors":"Angelos Nalmpantis;Nikolaos Passalis;Anastasios Tefas","doi":"10.1109/LSP.2025.3528365","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528365","url":null,"abstract":"Financial time series are sequences of price observations related to financial assets collected over time. Deep Learning (DL) is currently standing as the predominant approach for addressing various time series tasks, including problems in finance, such as the development of trading agents using Deep Reinforcement Learning (DRL). However, the noisy and temporal nature of such data as well as their non-stationarity pose substantial challenges to current methodologies. DL models suffer from overfitting noise, frequently arising from the absence of strong priors. In this paper, we address the instability of trading DRL agents due to noise by proposing an end-to-end hybrid trainable filtering and feature extraction approach. The proposed method employs Gaussian filters as priors and can be attached at the beginning of any DL architecture forming a hybrid model-based and data-driven model that can directly process the raw input data. The bandwidth of the filters is determined through the learning process, ultimately allowing the agent to autonomously determine the optimal bandwidth for the task and data at hand, without requiring any additional supervision. Moreover, the proposed method leverages high-order derivatives to address the non-stationarity of financial data and provides multiple views of the input signal efficiently utilized by the subsequent model. We conduct experiments with a plethora of financial assets from the Foreign Exchange Market (FOREX) and demonstrate the method's efficiency when compared to alternative processing pipelines.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"686-690"},"PeriodicalIF":3.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}