Mengjia Wang;Bo Ni;Jiacheng Zhang;Shengyang Luan;Tao Liu
{"title":"Improved Least Lncosh Based Fetal Electrocardiography Extraction in Alpha-Stable Noise","authors":"Mengjia Wang;Bo Ni;Jiacheng Zhang;Shengyang Luan;Tao Liu","doi":"10.1109/LSP.2025.3529626","DOIUrl":"https://doi.org/10.1109/LSP.2025.3529626","url":null,"abstract":"Fetal electrocardiography (FECG) presents an important avenue for continuous fetal monitoring. Nonetheless, effectively extracting FECG from maternal electrocardiogram (MECG) is one considerable challenge due to its weaker amplitude compared to MECG and the non-Gaussian nature of background noise. In this letter, we introduce alpha-stable noise to model the realistic interference due to its high scalability. To improve the accuracy of FECG extraction under impulsive noise (alpha-stable noise with strong impulses), we introduce the least Lncosh algorithm (Llncosh) and the improved Llncosh algorithm (ILL) is proposed based on the Amplitude Hyperbolic Tangent Transformation (AHTT) to optimize the preset parameter. Moreover, Monte Carlo experiments are carried out to investigate the capabilities of LMS-like algorithms and the ILL algorithm for FECG extraction. The results demonstrate that the ILL algorithm outperforms the LMS-like ones with carefully selected parameters, particularly showcasing superior robustness against impulsive noise. This work holds significance both in the theoretical research of adaptive filtering and in its practical application for FECG extraction.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"676-680"},"PeriodicalIF":3.2,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Minchan Kim;Myeonghun Jeong;Joun Yeop Lee;Nam Soo Kim
{"title":"SegINR: Segment-Wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech","authors":"Minchan Kim;Myeonghun Jeong;Joun Yeop Lee;Nam Soo Kim","doi":"10.1109/LSP.2025.3528858","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528858","url":null,"abstract":"We present SegINR, a novel approach to neural Text-to-Speech (TTS) that eliminates the need for either an auxiliary duration predictor or autoregressive (AR) sequence modeling for alignment. SegINR simplifies the TTS process by directly converting text sequences into frame-level features. Encoded text embeddings are transformed into segments of frame-level features with length regulation using a conditional implicit neural representation (INR). This method, termed Segment-wise INR (SegINR), captures temporal dynamics within each segment while autonomously defining segment boundaries, resulting in lower computational costs. Integrated into a two-stage TTS framework, SegINR is employed for semantic token prediction. Experiments in zero-shot adaptive TTS scenarios show that SegINR outperforms conventional methods in speech quality with computational efficiency.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"646-650"},"PeriodicalIF":3.2,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Perry Lam;Huayun Zhang;Nancy F. Chen;Berrak Sisman;Dorien Herremans
{"title":"PRESENT: Zero-Shot Text-to-Prosody Control","authors":"Perry Lam;Huayun Zhang;Nancy F. Chen;Berrak Sisman;Dorien Herremans","doi":"10.1109/LSP.2025.3528359","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528359","url":null,"abstract":"Current strategies for achieving fine-grained prosody control in speech synthesis entail extracting additional style embeddings or adopting more complex architectures. To enable zero-shot application of pretrained text-to-speech (TTS) models, we present PRESENT (PRosody Editing without Style Embeddings or New Training), which exploits explicit prosody prediction in FastSpeech2-based models by modifying the inference process directly. We apply our text-to-prosody framework to zero-shot language transfer using a JETS model exclusively trained on English LJSpeech data. We obtain character error rates (CER) of 12.8%, 18.7% and 5.9% for German, Hungarian and Spanish respectively, beating the previous state-of-the-art CER by over 2× for all three languages. Furthermore, we allow subphoneme-level control, a first in this field. To evaluate its effectiveness, we show that PRESENT can improve the prosody of questions, and use it to generate Mandarin, a tonal language where vowel pitch varies at subphoneme level. We attain 25.3% hanzi CER and 13.0% pinyin CER with the JETS model.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"776-780"},"PeriodicalIF":3.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143446296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chenglizhao Chen;Chaoying Bai;Jia Song;Xu Yu;Shanchen Pang
{"title":"Omni-Directional View Person Re-Identification Through 3D Human Reconstruction","authors":"Chenglizhao Chen;Chaoying Bai;Jia Song;Xu Yu;Shanchen Pang","doi":"10.1109/LSP.2025.3529619","DOIUrl":"https://doi.org/10.1109/LSP.2025.3529619","url":null,"abstract":"Person re-identification (ReID) aims to identify the same individual across different cameras. Most existing researches focus on horizontal perspectives, where cameras and individuals are positioned at similar heights. However, in real-world applications, cameras are usually mounted at varying heights (e.g., either high-view or low-view) to achieve a broader field of view. Hence, some studies have explored high-view ReID, yet these rely heavily on manually annotating large datasets, which is extremely time-consuming and not publicly available. To improve, we propose a “controllable” data generation protocol that automatically generates omni-directional view data. This protocol can extend any common ReID dataset into an extensive omni-directional view one. By upgrading existing ReID SOTAs with the enhanced data, they can be made to handle ReID tasks with varying camera angles. B.t.w., to verify the effectiveness, we still need “real” data for testing. Thus, we constructed a small testing dataset containing diverse camera angles. Extensive quantitative results demonstrate that our solution is generic and can be applied to any SOTA ReID to achieve extensive performance promotions, e.g., 3%–12% improvement in mAP.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"796-800"},"PeriodicalIF":3.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143446339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DPO: Discrete Prompt Optimization for Vision-Language Models","authors":"Nanhao Liang;Yong Liu","doi":"10.1109/LSP.2025.3528362","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528362","url":null,"abstract":"In recent years, the emergence of large vision-language models (VLMs) has catalyzed the development of prompt learning, where networks are trained to enhance VLM performance by learning continuous prompts. However, traditional continuous prompt learning often struggles with challenges like overfitting to Base classes and a lack of interpretability due to the nature of prompt parameterization. To overcome these limitations, we introduce Discrete Prompt Optimization (DPO), a method that optimizes text prompts in discrete word-space. During training, scores are assigned to token embeddings, which are then used to select the most effective token sequence for the downstream task. DPO was tested across 11 diverse datasets, consistently outperforming baseline methods like CLIP and CoOp on Novel classes in most cases. This discrete approach not only reduces overfitting but also enhances transparency and model interpretability, enabling the learning of dataset-specific text prompts that are easily understandable.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"671-675"},"PeriodicalIF":3.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Speed and Range Estimation Using Dual Linear Frequency Modulated Signals","authors":"Teodoro Aguilera;Fernando J. Álvarez","doi":"10.1109/LSP.2025.3528859","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528859","url":null,"abstract":"This work establishes the theoretical foundations to optimally design Dual Linear Frequency Modulated (DLFM) signals intended to measure range and speed. Expressions are derived to obtain the optimal DLFM signal bandwidth and sampling frequency to perform accurate estimations with a minimum computational load, for a given maximum signal frequency and expected speed. In addition, a software simulator has been developed to validate the theoretical predictions by recreating the conditions of emission and reception of DLFM signals with different bandwidths, subject to the Doppler effect at different speeds. Simulations are performed for speeds between 1 and 5 m/s and signal bandwidths ranging from 100 Hz to 2000 Hz. These results show that the relative errors obtained for range and speed are below 2% for the optimal bandwidth and sampling frequency values predicted in this work.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"661-665"},"PeriodicalIF":3.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10839567","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Modal Hybrid Encoding Approach Based on Information Bottleneck for Brain Tumor Grading","authors":"Luyue Yu;Chengyuan Liu;Aixi Qu;Qiang Wu;Ju Liu","doi":"10.1109/LSP.2025.3528861","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528861","url":null,"abstract":"Grade classification of gliomas is critical in clinical diagnosis and treatment decisions. Although histological images are commonly used for grading and as an important factor in prognostic prediction, their results are prone to inter-observer variability. Recent advancements in molecular genetics have significantly improved tumor classification, but challenges persist in effective feature selection and multi-modal data fusion. This letter proposes a multi-modal hybrid encoding method based on information bottleneck (MHEIB), combining histological images and genetic data to enhance glioma grading. MHEIB effectively fuses multi-modal features through the information bottleneck module and the self-attention mechanism, which compresses and filters the key features and dynamically adjusts the weights of multi-modal features to improve the classification accuracy. Experimental results on The Cancer Genome Atlas (TCGA) glioma dataset demonstrate that MHEIB outperforms several fusion methods in terms of F1-score, AUC, and AP. In particular, MHEIB significantly improved the classification AUC to 89.3% and 83.7% for similar categories of Grades II and III respectively. Overall, the MHEIB method provides an efficient multi-modal data fusion solution for glioma grading.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"651-655"},"PeriodicalIF":3.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explicit Bandwidth Learning for FOREX Trading Using Deep Reinforcement Learning","authors":"Angelos Nalmpantis;Nikolaos Passalis;Anastasios Tefas","doi":"10.1109/LSP.2025.3528365","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528365","url":null,"abstract":"Financial time series are sequences of price observations related to financial assets collected over time. Deep Learning (DL) is currently standing as the predominant approach for addressing various time series tasks, including problems in finance, such as the development of trading agents using Deep Reinforcement Learning (DRL). However, the noisy and temporal nature of such data as well as their non-stationarity pose substantial challenges to current methodologies. DL models suffer from overfitting noise, frequently arising from the absence of strong priors. In this paper, we address the instability of trading DRL agents due to noise by proposing an end-to-end hybrid trainable filtering and feature extraction approach. The proposed method employs Gaussian filters as priors and can be attached at the beginning of any DL architecture forming a hybrid model-based and data-driven model that can directly process the raw input data. The bandwidth of the filters is determined through the learning process, ultimately allowing the agent to autonomously determine the optimal bandwidth for the task and data at hand, without requiring any additional supervision. Moreover, the proposed method leverages high-order derivatives to address the non-stationarity of financial data and provides multiple views of the input signal efficiently utilized by the subsequent model. We conduct experiments with a plethora of financial assets from the Foreign Exchange Market (FOREX) and demonstrate the method's efficiency when compared to alternative processing pipelines.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"686-690"},"PeriodicalIF":3.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind Embedding Rate Steganalysis Using Refocusing Learning","authors":"Shuyi Li;Xuanbo Zhang;Xinpeng Zhang;Guorui Feng","doi":"10.1109/LSP.2025.3528360","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528360","url":null,"abstract":"Existing steganalysis methods perform well under ideal conditions but encounter challenges in real-world scenarios with uncertain embedding rates. This paper proposes a novel steganalysis network based on refocusing learning to enhance detection accuracy for blind embedding rate contexts. The proposed network incorporates a detail gradient guided module (DGGM) to capture subtle spatial changes, which are integrated into multiple layers to ensure the model consistently focuses on these critical details. Additionally, a two-stage training strategy is employed. It is initially trained to obtain a pre-trained model, while the second stage optimizes the pre-trained convolutional kernels by refocusing learning. This approach enhances the feature extraction ability by indirectly strengthening connections between different channels. Experimental results demonstrate that the proposed method achieves strong detection performance across various spatial and JPEG domain steganographic algorithms with blind embedding rates, outperforming SRNet, EfficientNet-B4, and DATNet in detection accuracy.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"666-670"},"PeriodicalIF":3.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Patterns for Convolutional Neural Networks in Regression Tasks to Make Specific Predictions via Genetic Algorithms","authors":"Yibiao Rong","doi":"10.1109/LSP.2025.3528363","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528363","url":null,"abstract":"Convolutional neural networks (CNNs) are effective tools for regression tasks. However, their black-box nature limits their applicability in high-impact and high-risk tasks. In this paper, a novel method is proposed to identify particular patterns in an image that can make the output of a CNN model equal to a specified value, thereby helping users understand the behaviours of CNNs. Specifically, in the proposed method, a set of binary filters is first randomly initialized. A genetic algorithm is then employed to evolve the binary filters such that the output of the CNN is equal to a specified value when taking a filtered image, which is obtained by convolving an original image and an evolved filter, as its input. Many experiments are conducted to evaluate the effectiveness of the proposed method. The results show that the proposed method is highly effective at identifying the patterns that can make a CNN output a specified value.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"626-630"},"PeriodicalIF":3.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}