{"title":"Extending Guided Filters Through Effective Utilization of Multi-Channel Guide Images Based on Singular Value Decomposition","authors":"Kazu Mishiba","doi":"10.1109/OJSP.2025.3545304","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3545304","url":null,"abstract":"This paper proposes the SVD-based Guided Filter, designed to address key limitations of the original guided filter and its improved methods by making better use of multi-channel guide images. First, we analyze the guided filter framework, reinterpreting it from a patch-based perspective using singular value decomposition (SVD). This analysis reveals that the original guided filter suppresses oscillatory components based on their eigenvalues. Building on this insight, we propose a new filtering method that selectively suppresses or enhances these components through functions that respond to their eigenvalues. The proposed SVD-based Guided Filter offers improved control over edge preservation and noise reduction compared to the original guided filter and its improved methods, which often struggle to balance these tasks. We validate the proposed method across various image processing applications, including denoising, edge-preserving smoothing, detail enhancement, and edge-enhancing smoothing. The results demonstrate that the SVD-based Guided Filter consistently outperforms the original guided filter and its improved methods by making more effective use of color guide images. While its computational cost is slightly higher than that of the original guided filter, the method remains efficient and highly effective. 
Overall, the proposed SVD-based Guided Filter delivers notable improvements, offering a solid foundation for further advancements in guided filtering techniques.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"385-397"},"PeriodicalIF":2.9,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10902178","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143629646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
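The patch-based reinterpretation above can be sketched numerically: within each patch, the classical guided filter solves a ridge-regularized linear model, which in the eigenbasis of the guide covariance amounts to applying the gain 1/(λ+ε) to each component; the paper generalizes this to other eigenvalue-response functions. A minimal single-patch NumPy sketch (the function names and the specific response are ours, not the paper's):

```python
import numpy as np

def patch_coefficients(guide_patch, target_patch, response):
    """Linear-model coefficients (a, b) for one patch of a multi-channel guide.

    guide_patch: (n, c) pixels of the c-channel guide image
    target_patch: (n,) pixels of the image being filtered
    response: callable mapping each eigenvalue lam to a gain; the classical
              guided filter corresponds to response(lam) = 1 / (lam + eps).
    """
    mu_I = guide_patch.mean(axis=0)
    mu_p = target_patch.mean()
    Ic = guide_patch - mu_I
    pc = target_patch - mu_p
    Sigma = Ic.T @ Ic / len(target_patch)   # guide covariance (c x c)
    cross = Ic.T @ pc / len(target_patch)   # guide/target cross-covariance
    lam, U = np.linalg.eigh(Sigma)          # eigen-decomposition of the patch
    a = U @ (response(lam) * (U.T @ cross)) # per-component eigenvalue gain
    b = mu_p - a @ mu_I
    return a, b

# Classical guided-filter behaviour recovered with gain 1/(lam + eps):
eps = 1e-3
a, b = patch_coefficients(np.random.rand(64, 3), np.random.rand(64),
                          response=lambda lam: 1.0 / (lam + eps))
```

Choosing a `response` that amplifies (rather than shrinks) selected components gives the enhancement behaviour described in the abstract.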
{"title":"VITMST++: Efficient Hyperspectral Reconstruction Through Vision Transformer-Based Spatial Compression","authors":"Ana C. Caznok Silveira;Diedre S. do Carmo;Lucas H. Ueda;Denis G. Fantinato;Paula D. P. Costa;Leticia Rittner","doi":"10.1109/OJSP.2025.3544891","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3544891","url":null,"abstract":"Hyperspectral channel reconstruction transforms a subsampled multispectral image into a hyperspectral image, providing higher spectral resolution without dedicated acquisition hardware. Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction (MST++) is a state-of-the-art channel reconstruction technique, but it faces memory limitations for high spatial-resolution images. In this context, we introduced VITMST++, a novel architecture incorporating Vision Transformer embeddings for spatial compression, multi-resolution image context, and a custom channel-weighted loss. Developed for the ICASSP 2024 HyperSkin Challenge, VITMST++ outperforms the state-of-the-art MST++ in both performance and computational efficiency in channel reconstruction. In this work, we perform a deeper analysis of the main aspects of VITMST++: efficiency, quantitative performance, and generalization to other datasets. 
Results show that VITMST++ achieves SAM and SSIM hyperspectral reconstruction metrics similar to those of state-of-the-art methods, while consuming up to threefold less memory and requiring up to 10 times fewer multiply-add operations.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"398-404"},"PeriodicalIF":2.9,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10900394","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143706583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
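The abstract mentions a custom channel-weighted loss but does not give its form; a generic per-band-weighted MAE over a hyperspectral cube is one plausible illustration (the function name, cube layout, and weighting scheme below are our assumptions, not VITMST++'s actual loss):

```python
import numpy as np

def channel_weighted_loss(pred, target, weights):
    """Weighted mean absolute error over a hyperspectral cube of shape (H, W, C).

    `weights` (length C) emphasises some spectral bands over others; this is
    an illustrative stand-in for VITMST++'s custom channel-weighted loss,
    whose exact form is not specified in the abstract.
    """
    err = np.abs(pred - target).mean(axis=(0, 1))  # MAE per spectral channel
    w = np.asarray(weights, dtype=float)
    return float((w * err).sum() / w.sum())
```

With uniform weights this reduces to the plain MAE; skewed weights let training focus on bands where reconstruction accuracy matters most.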
{"title":"Task Nuisance Filtration for Unsupervised Domain Adaptation","authors":"David Uliel;Raja Giryes","doi":"10.1109/OJSP.2025.3536850","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3536850","url":null,"abstract":"In unsupervised domain adaptation (UDA), labeled data is available for one domain (Source Domain), generated according to some distribution, and unlabeled data is available for a second domain (Target Domain), generated from a possibly different distribution but sharing the same task. The goal is to learn a model that performs well on the target domain although labels are available only for the source data. Many recent works attempt to align the source and the target domains by matching their marginal distributions in a learned feature space. In this paper, we treat the domain difference as a nuisance and enable better adaptation between the domains by encouraging minimality of the target-domain representation, disentanglement of the features, and a smoother feature space that clusters the target data better. To this end, we use information bottleneck theory and a classical technique from the blind source separation framework, namely independent component analysis (ICA). 
We show that these concepts can improve performance of leading domain adaptation methods on various domain adaptation benchmarks.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"303-311"},"PeriodicalIF":2.9,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10858365","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143489277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
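ICA pipelines conventionally begin by whitening the features, i.e. removing second-order correlations so that the subsequent rotation can seek independence; that decorrelation step is the easiest part of the "disentanglement" idea to sketch. A minimal ZCA whitening (our illustration of the standard pre-processing, not the paper's full method):

```python
import numpy as np

def whiten(X, eps=1e-8):
    """ZCA-whiten features: zero mean, (approximately) identity covariance.

    This is the decorrelation step that classically precedes ICA; `eps`
    regularises near-zero eigenvalues of the feature covariance.
    """
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    lam, U = np.linalg.eigh(cov)
    W = U @ np.diag(1.0 / np.sqrt(lam + eps)) @ U.T  # ZCA whitening matrix
    return Xc @ W
```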
{"title":"Robustifying Routers Against Input Perturbations for Sparse Mixture-of-Experts Vision Transformers","authors":"Masahiro Kada;Ryota Yoshihashi;Satoshi Ikehata;Rei Kawakami;Ikuro Sato","doi":"10.1109/OJSP.2025.3536853","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3536853","url":null,"abstract":"Mixture of experts with a sparse expert selection rule has been gaining much attention recently because of its scalability without compromising inference time. However, unlike standard neural networks, sparse mixture-of-experts models inherently exhibit discontinuities in the output space, which may impede the acquisition of appropriate invariance to input perturbations, leading to a deterioration of model performance for tasks such as classification. To address this issue, we propose Pairwise Router Consistency (PRC), a loss that effectively penalizes the discontinuities occurring under natural deformations of input images. Combined with the supervised loss, the PRC loss empirically improves classification accuracy on the ImageNet-1K, CIFAR-10, and CIFAR-100 datasets compared to a baseline method. Notably, our method with 1-expert selection slightly outperforms the baseline method using 2-expert selection. 
We also confirmed that models trained with our method experience discontinuous changes less frequently under input perturbations.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"276-283"},"PeriodicalIF":2.9,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10858379","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
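The idea of penalizing router disagreement between two views of the same input can be sketched directly: compute the router's expert distribution for each view and penalize their difference. The squared-difference form below is our illustrative choice; the abstract does not specify PRC's exact formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax over the expert dimension."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prc_loss(router_logits_a, router_logits_b):
    """Penalize disagreement between the router's expert distributions for two
    views (e.g. natural deformations) of the same inputs. Illustrative form;
    the paper's exact distance may differ."""
    pa, pb = softmax(router_logits_a), softmax(router_logits_b)
    return float(np.mean((pa - pb) ** 2))
```

A smooth router distribution across views discourages the hard expert-selection boundary from flipping under small input perturbations, which is the discontinuity the paper targets.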
{"title":"SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers","authors":"Junghyun Koo;Gordon Wichern;François G. Germain;Sameer Khurana;Jonathan Le Roux","doi":"10.1109/OJSP.2025.3534686","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3534686","url":null,"abstract":"We introduce Self-Monitored Inference-Time INtervention (SMITIN), an approach for controlling an autoregressive generative music transformer using classifier probes. These simple logistic regression probes are trained on the output of each attention head in the transformer using a small dataset of audio examples both exhibiting and missing a specific musical trait (e.g., the presence/absence of drums, or real/synthetic music). We then steer the attention heads in the probe direction, ensuring the generative model output captures the desired musical trait. Additionally, we monitor the probe output to avoid adding an excessive amount of intervention into the autoregressive generation, which could lead to temporally incoherent music. We validate our results objectively and subjectively for both audio continuation and text-to-music applications, demonstrating the ability to add controls to large generative models for which retraining or even fine-tuning is impractical for most musicians. 
Audio samples of the proposed intervention approach are available on our demo page.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"266-275"},"PeriodicalIF":2.9,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10856829","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
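The probe-and-steer loop described above can be sketched end to end on toy data: train a logistic-regression probe on activations with and without a trait, then nudge new activations along the probe's weight direction while monitoring the probe's confidence to avoid over-intervening. Everything below (data, function names, thresholds) is a synthetic illustration of the mechanism, not the paper's transformer setup.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for attention-head activations with / without a musical trait
# (in the paper, one probe is trained per attention head on real activations).
pos = rng.normal(+1.0, 1.0, size=(200, 16))
neg = rng.normal(-1.0, 1.0, size=(200, 16))
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(200), np.zeros(200)])

def train_probe(X, y, lr=0.5, steps=300):
    """Plain gradient-descent logistic-regression probe."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

w, b = train_probe(X, y)
direction = w / np.linalg.norm(w)  # steering direction in activation space

def steer(h, alpha=1.0, target=0.9):
    """Nudge activation h toward the trait, self-monitored: stop intervening
    once the probe's confidence reaches `target` to avoid over-steering."""
    conf = 1.0 / (1.0 + np.exp(-(h @ w + b)))
    return h if conf >= target else h + alpha * direction
```

The self-monitoring check is what keeps the intervention bounded: once the probe already reports the trait, no further steering is added, mirroring the paper's concern about temporally incoherent outputs from excessive intervention.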
{"title":"Auditory EEG Decoding Challenge for ICASSP 2024","authors":"Lies Bollens;Corentin Puffay;Bernd Accou;Jonas Vanthornhout;Hugo Van Hamme;Tom Francart","doi":"10.1109/OJSP.2025.3534122","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3534122","url":null,"abstract":"This paper describes the auditory EEG challenge, organized as one of the Signal Processing Grand Challenges at ICASSP 2024. The challenge provides electroencephalogram (EEG) recordings of 105 subjects who listened to continuous speech, as audiobooks or podcasts, while their brain activity was recorded. The challenge consists of two tasks that relate EEG signals to the presented speech stimulus. The first task, called match-mismatch, is to determine which of five speech segments induced a given EEG segment. The second task, called regression, is to reconstruct the Mel spectrogram from the EEG. EEG recordings of 85 subjects were provided as a training set so that challenge participants could train their models on a relatively large dataset. The remaining 20 subjects were used as held-out subjects for the evaluation step of the challenge.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"478-488"},"PeriodicalIF":2.9,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854651","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143949210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
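The match-mismatch task above (pick which of five speech segments induced a given EEG segment) admits a very simple baseline once some envelope has been decoded from the EEG: correlate it with each candidate and take the best match. This toy scorer is our illustration of the task definition, not the challenge's reference model.

```python
import numpy as np

def match_mismatch(decoded, candidates):
    """Pick which candidate speech envelope matches the EEG-decoded envelope,
    scoring by Pearson correlation. Returns (winning index, all scores)."""
    def corr(a, b):
        a, b = a - a.mean(), b - b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    scores = [corr(decoded, c) for c in candidates]
    return int(np.argmax(scores)), scores
```

Challenge accuracy is then the fraction of segments for which the winning index is the truly matching one.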
{"title":"Online Learning of Expanding Graphs","authors":"Samuel Rey;Bishwadeep Das;Elvin Isufi","doi":"10.1109/OJSP.2025.3534692","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3534692","url":null,"abstract":"This paper addresses the problem of online network topology inference for expanding graphs from a stream of spatiotemporal signals. Online algorithms for dynamic graph learning are crucial in delay-sensitive applications or when changes in topology occur rapidly. While existing works focus on inferring the connectivity within a fixed set of nodes, in practice, the graph can grow as new nodes join the network. This poses additional challenges, such as modeling temporal dynamics involving signals and graphs of different sizes. This growth also increases the computational complexity of the learning process, which may become prohibitive. To the best of our knowledge, this is the first work to tackle this setting. We propose a general online algorithm based on projected proximal gradient descent that accounts for the increasing graph size at each iteration. Recursively updating the sample covariance matrix is a key aspect of our approach. We introduce a strategy that enables different types of updates for nodes that just joined the network and for previously existing nodes. To provide further insights into the proposed method, we specialize it to Gaussian Markov random field settings, where we analyze the computational complexity and characterize the dynamic cumulative regret. 
Finally, we demonstrate the effectiveness of the proposed approach using both controlled experiments and real-world datasets from epidemic and financial networks.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"247-255"},"PeriodicalIF":2.9,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854617","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143471129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
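The two covariance operations central to the approach, a recursive update from each new signal and an expansion when a node joins, can be sketched as follows. The exponentially-weighted form, the bordering scheme, and all names are our illustration of the idea, not the paper's exact update rules.

```python
import numpy as np

def update_covariance(C, x, gamma=0.05):
    """Recursive (exponentially weighted) sample-covariance update with a new
    graph signal x, avoiding recomputation from the full signal history."""
    return (1 - gamma) * C + gamma * np.outer(x, x)

def expand_covariance(C, c_new, var_new):
    """Grow C when a node joins the network: border the old matrix with the
    new node's cross-covariances c_new and its variance var_new."""
    n = C.shape[0]
    C_big = np.empty((n + 1, n + 1))
    C_big[:n, :n] = C
    C_big[:n, n] = C_big[n, :n] = c_new
    C_big[n, n] = var_new
    return C_big
```

Keeping the old block of the matrix intact while only bordering it is what lets existing nodes and newly joined nodes receive different update treatments, as the abstract describes.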
{"title":"Non-Gaussian Process Dynamical Models","authors":"Yaman Kındap;Simon Godsill","doi":"10.1109/OJSP.2025.3534690","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3534690","url":null,"abstract":"Probabilistic dynamical models used in tracking and prediction applications are typically assumed to be driven by Gaussian noise, since well-known inference algorithms can be applied to such models. However, in many real-world examples deviations from Gaussianity are expected to appear, e.g., rapid changes in speed or direction, which cannot be reflected using processes with a smooth mean response. In this work, we introduce the non-Gaussian process (NGP) dynamical model, which allows for straightforward modelling of heavy-tailed, non-Gaussian behaviours while retaining a tractable conditional Gaussian process (GP) structure through an infinite mixture of non-homogeneous GPs representation. We present two novel inference methodologies for these new models based on the conditionally Gaussian formulation of NGPs, which are suitable for both MCMC and marginalised particle filtering algorithms. The results are demonstrated on synthetically generated data sets.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"213-221"},"PeriodicalIF":2.9,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854574","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
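The "conditionally Gaussian, marginally heavy-tailed" construction can be illustrated with a classical Gaussian scale mixture: draw a random variance per sample, then a Gaussian given that variance, which yields Student-t marginals. This is a toy analogue of the conditionally Gaussian NGP structure, not the paper's model.

```python
import numpy as np

def scale_mixture_noise(n, df=3.0, seed=0):
    """Conditionally Gaussian, marginally heavy-tailed samples.

    Each sample's variance is drawn from an inverse-gamma distribution, then
    the sample is Gaussian given that variance; the resulting marginals are
    Student-t with `df` degrees of freedom (a classical scale mixture).
    """
    rng = np.random.default_rng(seed)
    var = 1.0 / rng.gamma(df / 2.0, 2.0 / df, size=n)  # inverse-gamma variances
    return rng.normal(0.0, np.sqrt(var)), var
```

Conditioning on the drawn variances recovers a Gaussian model, which is exactly the property that makes MCMC and marginalised particle filtering tractable in such constructions.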
{"title":"Efficient Moving Object Segmentation in LiDAR Point Clouds Using Minimal Number of Sweeps","authors":"Zoltan Rozsa;Akos Madaras;Tamas Sziranyi","doi":"10.1109/OJSP.2025.3532199","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3532199","url":null,"abstract":"LiDAR point clouds are a rich source of information for autonomous vehicles and ADAS systems. However, they can be challenging to segment for moving objects because, among other things, finding correspondences between sparse point clouds of consecutive frames is difficult. Traditional methods rely on a (global or local) map of the environment, which can be demanding to acquire and maintain in real-world conditions and in the presence of the moving objects themselves. This paper proposes a novel approach that uses as few sweeps as possible to decrease the computational burden and achieve mapless moving object segmentation (MOS) in LiDAR point clouds. Our approach is based on a multimodal learning model with single-modal inference. The model is trained on a dataset of LiDAR point clouds and related camera images. The model learns to associate features from the two modalities, allowing it to predict dynamic objects even in the absence of a map and the camera modality. We use semantic information for multi-frame instance segmentation in order to enhance performance measures. We evaluate our approach on the SemanticKITTI and Apollo real-world autonomous driving datasets. 
Our results show that our approach achieves state-of-the-art performance on moving object segmentation while using only a few (even one) LiDAR frames.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"118-128"},"PeriodicalIF":2.9,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10848132","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LIMMITS'24: Multi-Speaker, Multi-Lingual INDIC TTS With Voice Cloning","authors":"Sathvik Udupa;Jesuraja Bandekar;Abhayjeet Singh;Deekshitha G;Saurabh Kumar;Sandhya Badiger;Amala Nagireddi;Roopa R;Prasanta Kumar Ghosh;Hema A. Murthy;Pranaw Kumar;Keiichi Tokuda;Mark Hasegawa-Johnson;Philipp Olbrich","doi":"10.1109/OJSP.2025.3531782","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3531782","url":null,"abstract":"The Multi-speaker, Multi-lingual Indic Text to Speech (TTS) with voice cloning (LIMMITS'24) challenge is organized as part of the ICASSP 2024 signal processing grand challenge. LIMMITS'24 aims at the development of voice cloning for the multi-speaker, multi-lingual Text-to-Speech (TTS) model. Towards this, 80 hours of TTS data have been released in each of Bengali, Chhattisgarhi, English (Indian), and Kannada languages. This is in addition to Telugu, Hindi, and Marathi data released during the LIMMITS'23 challenge. The challenge encourages the advancement of TTS in Indian languages as well as the development of multi-speaker voice cloning techniques for TTS. The three tracks of LIMMITS'24 have provided an opportunity for researchers and practitioners around the world to explore the state of the art in voice cloning for TTS.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"293-302"},"PeriodicalIF":2.9,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10845816","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143489278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}