{"title":"Cross-Dataset Head-Related Transfer Function Harmonization Based on Perceptually Relevant Loss Function","authors":"Jiale Zhao;Dingding Yao;Junfeng Li","doi":"10.1109/OJSP.2025.3590248","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3590248","url":null,"abstract":"Head-Related Transfer Functions (HRTFs) play a vital role in binaural spatial audio rendering. With the release of numerous HRTF datasets in recent years, abundant data has become available to support HRTF-related research based on deep learning. However, measurement discrepancies across different datasets introduce significant variations in the data and directly merging these datasets may lead to systematic biases. The recent Listener Acoustic Personalization Challenge 2024 (European Signal Processing Conference) dealt with this issue, with the task of harmonizing different datasets to achieve lower classification accuracy while meeting thresholds over various localization metrics. To mitigate cross-dataset differences, this paper proposes a neural network-based HRTF harmonization approach aimed at eliminating dataset-specific properties embedded in the original measurements. The proposed method utilizes a perceptually relevant loss function, which jointly constrains multiple objectives, including interaural level differences, auditory-filter excitation patterns, and classification accuracy. Experimental results based on eight datasets demonstrate that the proposed approach can effectively minimize distributional disparities between datasets while mostly preserving localization performance. The classification accuracy for harmonized HRTFs between different datasets is reduced to as low as 31%, indicating a significant reduction in cross-dataset discrepancies. The proposed method ranked first in this challenge, which validates its effectiveness.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"865-875"},"PeriodicalIF":2.7,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11082560","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144739816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tell Me What You See: Text-Guided Real-World Image Denoising","authors":"Erez Yosef;Raja Giryes","doi":"10.1109/OJSP.2025.3588715","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3588715","url":null,"abstract":"Image reconstruction from noisy sensor measurements is challenging and many methods have been proposed for it. Yet, most approaches focus on learning robust natural image priors while modeling the scene’s noise statistics. In extremely low-light conditions, these methods often remain insufficient. Additional information is needed, such as multiple captures or, as suggested here, scene description. As an alternative, we propose using a text-based description of the scene as an additional prior, something the photographer can easily provide. Inspired by the remarkable success of text-guided diffusion models in image generation, we show that adding image caption information significantly improves image denoising and reconstruction for both synthetic and real-world images. All code and data will be made publicly available upon publication.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"890-899"},"PeriodicalIF":2.7,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11078899","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144750882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Graph Structures With Autoregressive Graph Signal Models","authors":"Kyle Donoghue;Ashkan Ashrafi","doi":"10.1109/OJSP.2025.3588447","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3588447","url":null,"abstract":"This paper presents a novel approach to graph learning, GL-AR, which leverages estimated autoregressive coefficients to recover undirected graph structures from time-series graph signals with propagation delay. GL-AR can discern graph structures where propagation between vertices is delayed, mirroring the dynamics of many real-world systems. This is achieved by utilizing the autoregressive coefficients of time-series graph signals in GL-AR’s learning algorithm. Existing graph learning techniques typically minimize the smoothness of a graph signal on a recovered graph structure to learn instantaneous relationships. GL-AR extends this approach by showing that minimizing smoothness with autoregressive coefficients can additionally recover relationships with propagation delay. The efficacy of GL-AR is demonstrated through applications to both synthetic and real-world datasets. Specifically, this work introduces the Graph-Tensor Method, a novel technique for generating synthetic time-series graph signals that represent edges as transfer functions. This method, along with real-world data from the National Climatic Data Center, is used to evaluate GL-AR’s performance in recovering undirected graph structures. Results indicate that GL-AR’s use of autoregressive coefficients enables it to outperform state-of-the-art graph learning techniques in scenarios with nonzero propagation delays. Furthermore, GL-AR’s performance is optimized by a new automated parameter selection algorithm, which eliminates the need for computationally intensive trial-and-error methods.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"838-855"},"PeriodicalIF":2.7,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11078159","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144725118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Factor Graph Approach to Variational Sparse Gaussian Processes","authors":"Hoang Minh Huu Nguyen;İsmaıl Şenöz;Bert De Vries","doi":"10.1109/OJSP.2025.3585440","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3585440","url":null,"abstract":"A Variational Sparse Gaussian Process (VSGP) is a sophisticated nonparametric probabilistic model that has gained significant popularity since its inception. The VSGP model is often employed as a component of larger models or in a modified form across numerous applications. However, re-deriving the update equations for inference in these variations is technically challenging, which hinders broader adoption. In a separate line of research, message passing-based inference in factor graphs has emerged as an efficient framework for automated Bayesian inference. Despite its advantages, message passing techniques have not yet been applied to VSGP-based models due to the lack of a suitable representation for VSGP models in factor graphs. To address this limitation, we introduce a Sparse Gaussian Process (SGP) node within a Forney-style factor graph (FFG). We derive variational message passing update rules for the SGP node, enabling automated and efficient inference for VSGP-based models. We validate the update rules and illustrate the benefits of the SGP node through experiments in various Gaussian Process applications.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"815-837"},"PeriodicalIF":2.9,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11063321","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144680885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model Predictive Control Algorithm for Video Coding and Uplink Delivery in Delay-Critical Applications","authors":"Mourad Aklouf;Frédéric Dufaux;Michel Kieffer;Marc Lény","doi":"10.1109/OJSP.2025.3584672","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3584672","url":null,"abstract":"Emerging applications such as remote car driving, drone control, or distant mobile robot operation impose a very tight constraint on the delay between the acquisition of a video frame by a camera embedded in the operated device and its display at the remote controller. This paper introduces a new frame-level video encoder rate control technique for ultra-low-latency video coding and delivery. A Model Predictive Control approach, exploiting the buffer level at the transmitter and an estimate of the transmission rate, is used to determine the target encoding rate of each video frame to adapt with minimum delay to sudden variations of the transmission channel characteristics. Then, an <inline-formula><tex-math>$R-(QP,D)$</tex-math></inline-formula> model of the rate <inline-formula><tex-math>$R$</tex-math></inline-formula> of the current frame to be encoded as a function of its quantization parameter (QP) and of the distortion <inline-formula><tex-math>$D$</tex-math></inline-formula> of the reference frame is used to get the QP matching the target rate. This QP is then fed to the video coder. The proposed approach is compared to reference algorithms, namely PANDA, FESTIVE, BBA, and BOLA, some of which have been adapted to the considered server-driven low-latency coding and transmission scenario. Simulation results based on 4G bandwidth traces show that the proposed algorithm outperforms the others at different glass-to-glass delay constraints, considering several video quality metrics.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"876-889"},"PeriodicalIF":2.7,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11059858","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144750801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Cold Diffusion for the Decomposition of Identically Distributed Superimposed Images","authors":"Helena Montenegro;Jaime S. Cardoso","doi":"10.1109/OJSP.2025.3583963","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3583963","url":null,"abstract":"With the growing adoption of Deep Learning for imaging tasks in biometrics and healthcare, it becomes increasingly important to ensure privacy when using and sharing images of people. Several works enable privacy-preserving image sharing by anonymizing the images so that the corresponding individuals are no longer recognizable. Most works average images or their embeddings as an anonymization technique, relying on the assumption that the average operation is irreversible. Recently, cold diffusion models, based on the popular denoising diffusion probabilistic models, have succeeded in reversing deterministic transformations on images. In this work, we leverage cold diffusion to decompose superimposed images, empirically demonstrating that it is possible to obtain two or more identically-distributed images given their average. We propose novel sampling strategies for this task and show their efficacy on three datasets. Our findings highlight the risks of averaging images as an anonymization technique and argue for the use of alternative anonymization strategies.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"784-794"},"PeriodicalIF":2.9,"publicationDate":"2025-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11054277","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144606192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tiny-VPS: Tiny Video Panoptic Segmentation Standing on the Shoulder of Giant-VPS","authors":"Qingfeng Liu;Mostafa El-Khamy;Kee-Bong Song","doi":"10.1109/OJSP.2025.3581840","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3581840","url":null,"abstract":"Video Panoptic Segmentation (VPS) is the most challenging video segmentation task, as it requires accurate labeling of every pixel in each frame, as well as identifying the multiple instances and tracking them across frames. In this paper, we explore state-of-the-art solutions for VPS at both the giant model regime for offline or server processing and the tiny model regime for online or edge computing. We designed Giant-VPS which achieved the first place solution in the 2024 Pixel Level Video Understanding in the Wild (PVUW) challenge. Our Giant-VPS builds on top of MinVIS and deploys the DINOv2-giant vision foundation model with a carefully designed ViT (Vision Transformer) adapter. For mobile and edge devices, we designed the Tiny-VPS model and show that our novel ViT-adapter distillation from the Giant-VPS model can further improve the accuracy of Tiny-VPS. Our Tiny-VPS is the first, in the sub-20 GFLOPS regime, to achieve competitive accuracy on VPS and VSS (Video Semantic Segmentation) benchmarks.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"803-814"},"PeriodicalIF":2.9,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11045393","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144623936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biorthogonal Lattice Tunable Wavelet Units and Their Implementation in Convolutional Neural Networks for Computer Vision Problems","authors":"An D. Le;Shiwei Jin;Sungbal Seo;You-Suk Bae;Truong Q. Nguyen","doi":"10.1109/OJSP.2025.3580967","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3580967","url":null,"abstract":"This work introduces a universal wavelet unit constructed with a biorthogonal lattice structure which is a novel tunable wavelet unit to enhance image classification and anomaly detection in convolutional neural networks by reducing information loss during pooling. The unit employs a biorthogonal lattice structure to modify convolution, pooling, and down-sampling operations. Implemented in residual neural networks with 18 layers, it improved detection accuracy on CIFAR10 (by 2.67% ), ImageNet1K (by 1.85% ), and the Describable Textures dataset (by 11.81% ), showcasing its advantages in detecting detailed features. Similar gains are achieved in the implementations for residual neural networks with 34 layers and 50 layers. For anomaly detection on the MVTec Anomaly Detection and TUKPCB datasets, the proposed method achieved a competitive performance and better anomaly localization.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"768-783"},"PeriodicalIF":2.9,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11039659","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144634816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In-Scene Calibration of Poisson Noise Parameters for Phase Image Recovery","authors":"Achour Idoughi;Sreelakshmi Sreeharan;Chen Zhang;Joseph Raffoul;Hui Wang;Keigo Hirakawa","doi":"10.1109/OJSP.2025.3579650","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3579650","url":null,"abstract":"In sensor metrology, noise parameters governing the stochastic nature of photon detectors play critical role in characterizing the aleatoric uncertainty of computational imaging systems such as indirect time-of-flight cameras, structured light imaging, and division-of-time polarimetric imaging. Standard calibration procedures exists for extracting the noise parameters using calibration targets, but they are inconvenient or impractical for frequent updates. To keep up with noise parameters that are dynamically affected by sensor settings (e.g. exposure and gain) as well as environmental factors (e.g. temperature), we propose an In-Scene Calibration of Poisson Noise Parameters (ISC-PNP) method that does not require calibration targets. The main challenge lies in the heteroskedastic nature of the noise and the confounding influence of scene content. To address this, our method leverages global joint statistics of Poisson sensor data, which can be interpreted as a binomial random variable. We experimentally confirm that the noise parameters extracted by the proposed ISC-PNP and the standard calibration procedure are well-matched.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"682-690"},"PeriodicalIF":2.9,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11034763","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144511201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Continuous Relaxation of Discontinuous Shrinkage Operator: Proximal Inclusion and Conversion","authors":"Masahiro Yukawa","doi":"10.1109/OJSP.2025.3579646","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3579646","url":null,"abstract":"We present a principled way of deriving a continuous relaxation of a given discontinuous shrinkage operator, which is based on two fundamental results, proximal inclusion and conversion. Using our results, the discontinuous operator is converted, via double inversion, to a continuous operator; more precisely, the associated “set-valued” operator is converted to a “single-valued” Lipschitz continuous operator. The first illustrative example is the firm shrinkage operator which can be derived as a continuous relaxation of the hard shrinkage operator. We also derive a new operator as a continuous relaxation of the discontinuous shrinkage operator associated with the so-called reverse ordered weighted <inline-formula><tex-math>$ell _{1}$</tex-math></inline-formula> (ROWL) penalty. Numerical examples demonstrate potential advantages of the continuous relaxation.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"753-767"},"PeriodicalIF":2.9,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11034740","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144581587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}