{"title":"Wasserstein Non-Negative Matrix Factorization for Multi-Layered Graphs and its Application to Mobility Data","authors":"Hirotaka Kaji;Kazushi Ikeda","doi":"10.1109/OJSP.2025.3528869","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3528869","url":null,"abstract":"Multi-layered graphs are popular in mobility studies because transportation data include multiple modalities, such as railways, buses, and taxis. Another example of a multi-layered graph is the time series of mobility when periodicity is considered. Such graphs are commonly analyzed with standard signal processing methods, such as singular value decomposition and tensor analysis, which can estimate missing values. However, their feature extraction abilities are insufficient for optimizing mobility networks. This study proposes a method that combines Wasserstein non-negative matrix factorization (W-NMF) with line graphs to obtain low-dimensional representations of multi-layered graphs. The line graph of a graph has one vertex for each edge of the original graph, and two of its vertices are adjacent when the corresponding edges share an endpoint. Thus, the shortest path length between two vertices in the line graph corresponds to the distance between the edges in the original graph. Through experiments using synthetic and benchmark datasets, we show that the performance and robustness of our method are superior to those of conventional methods. 
Additionally, we apply our method to real-world taxi origin–destination data as a mobility dataset and discuss the findings.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"194-202"},"PeriodicalIF":2.9,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10840315","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear Multivariate Decision Trees for Fast QTMT Partitioning in VVC","authors":"Jose N. Filipe;Luis M. N. Tavora;Sergio M. M. Faria;Antonio Navarro;Pedro A. A. Assuncao","doi":"10.1109/OJSP.2025.3528897","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3528897","url":null,"abstract":"The demand for ultra-high definition (UHD) content has led to the development of advanced compression tools to enhance the efficiency of standard codecs. One such tool is the Quaternary Tree and Multi-Type Tree (QTMT) used in Versatile Video Coding (VVC), which significantly improves coding efficiency over previous standards but introduces substantially higher computational complexity. To reduce this computational complexity with minimal impact on coding efficiency, this paper presents a novel approach for intra-coding 360° video in Equirectangular Projection (ERP) format. By exploiting the distinct complexity and spatial characteristics of the North, Equator, and South regions in ERP images, the proposed method follows a region-based approach, using novel linear multivariate decision trees to determine whether a given partition type can be skipped. Optimisation of model parameters and an adaptive thresholding method are also presented. 
The experimental results show a Complexity Gain of approximately 16% with a negligible BD-Rate loss of only 0.06%, surpassing current state-of-the-art methods in terms of complexity gain per percentage point of BD-Rate loss.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"175-183"},"PeriodicalIF":2.9,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10840301","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Target Tracking Using a Time-Varying Autoregressive Dynamic Model","authors":"Ralph J. Mcdougall;Simon J. Godsill","doi":"10.1109/OJSP.2025.3528896","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3528896","url":null,"abstract":"Target tracking algorithms commonly use structured dynamic models that require prior training of fixed model parameters. These trackers have reduced accuracy if the target behaviour does not match the dynamic model. This work develops an algorithm that can infer target dynamic behaviour online, allowing the target dynamic to be time-varying as well. A time-varying target dynamic allows the target to change its level of maneuverability continuously along the trajectory, so the trajectory may have highly variable levels of agility. The developed tracker assumes that the target dynamic can be described by an autoregressive model with time-varying parameters and a constant but unknown innovation variance. The autoregressive coefficients and innovation variance are then inferred online while simultaneously tracking the target. A data-association model is included to allow for clutter in the target measurements. 
This tracker is then compared against common structured trackers and is shown to approximate these models, while also achieving better state filtering and prediction accuracy for an agile target.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"147-155"},"PeriodicalIF":2.9,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10840270","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143403968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Generative Class Incremental Learning Performance With a Model Forgetting Approach","authors":"Taro Togo;Ren Togo;Keisuke Maeda;Takahiro Ogawa;Miki Haseyama","doi":"10.1109/OJSP.2025.3528900","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3528900","url":null,"abstract":"This study presents a novel approach to Generative Class Incremental Learning (GCIL) by introducing a forgetting mechanism, aimed at dynamically managing class information for better adaptation to streaming data. GCIL is an active topic in computer vision and is considered an important task as a continual learning approach for generative models. In humans, the ability to forget is a crucial brain function that facilitates continual learning by selectively discarding less relevant information. However, intentional forgetting has not been extensively investigated in machine learning. In this study, we aim to bridge this gap by incorporating forgetting mechanisms into GCIL, thereby examining their impact on the models' ability to learn continually. 
Through our experiments, we have found that integrating the forgetting mechanisms significantly enhances the models' performance in acquiring new knowledge, underscoring the positive role that strategic forgetting plays in the process of continual learning.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"203-212"},"PeriodicalIF":2.9,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10840249","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143430568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Accelerated Successive Convex Approximation Scheme With Exact Step Sizes for L1-Regression","authors":"Lukas Schynol;Moritz Hemsing;Marius Pesavento","doi":"10.1109/OJSP.2025.3528875","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3528875","url":null,"abstract":"We consider the minimization of ℓ1-regularized least-squares problems. A recent optimization approach uses successive convex approximations with an exact line search, which is highly competitive, especially in sparse problem instances. This work proposes an acceleration scheme for the successive convex approximation technique at negligible additional computational cost. We demonstrate this scheme by devising three related accelerated algorithms with provable convergence. The first adds to the variable update a descent step along the past optimization trajectory, which is inspired by Nesterov's accelerated gradient method and uses a closed-form step size. The second performs a simultaneous descent step along both the best response and the past trajectory, thereby finding a two-dimensional step size, also in closed form. The third algorithm combines the previous two approaches. All algorithms are hyperparameter-free. 
Empirical results confirm that the acceleration approaches improve the convergence rate compared to benchmark algorithms, and that they retain the benefits of successive convex approximation in non-sparse instances as well.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"184-193"},"PeriodicalIF":2.9,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10840211","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SpoofCeleb: Speech Deepfake Detection and SASV in the Wild","authors":"Jee-weon Jung;Yihan Wu;Xin Wang;Ji-Hoon Kim;Soumi Maiti;Yuta Matsunaga;Hye-jin Shim;Jinchuan Tian;Nicholas Evans;Joon Son Chung;Wangyou Zhang;Seyun Um;Shinnosuke Takamichi;Shinji Watanabe","doi":"10.1109/OJSP.2025.3529377","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3529377","url":null,"abstract":"This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV), utilizing source data from real-world conditions and spoofing attacks generated by Text-To-Speech (TTS) systems also trained on the same real-world data. Training robust recognition systems requires speech data recorded in varied acoustic environments with different levels of noise. However, current datasets typically include clean, high-quality recordings (bona fide data) due to the requirements for TTS training; studio-quality or well-recorded read speech is typically necessary to train TTS models. Current SDD datasets also have limited usefulness for training SASV models due to insufficient speaker diversity. SpoofCeleb leverages a fully automated pipeline that we developed to process the VoxCeleb1 dataset into a form suitable for TTS training. We subsequently train 23 contemporary TTS systems. SpoofCeleb comprises over 2.5 million utterances from 1,251 unique speakers, collected under natural, real-world conditions. The dataset includes carefully partitioned training, validation, and evaluation sets with well-controlled experimental protocols. We present baseline results for both SDD and SASV tasks. 
All data, protocols, and baselines are publicly available at https://jungjee.github.io/spoofceleb.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"68-77"},"PeriodicalIF":2.9,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10839331","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143404006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explicit Modeling of Audio Circuits With Multiple Nonlinearities for Virtual Analog Applications","authors":"Riccardo Giampiccolo;Sebastian Cristian Gafencu;Alberto Bernardini","doi":"10.1109/OJSP.2025.3528334","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3528334","url":null,"abstract":"Virtual Analog (VA) modeling aims to recreate the sound of analog audio gear in the digital domain. For white-box techniques, the class of circuits that can be emulated in real time is still limited, since circuits with multiple nonlinear elements have a very high computational cost. Wave Digital (WD) Filters have proven instrumental for white-box VA modeling. Over the past few years, new techniques have been developed to widen this class, mostly requiring iterative methods to solve the implicit relations among WD blocks. In this article, we present a novel method that emulates circuits with multiple (even multi-port) nonlinearities explicitly and efficiently by combining vector waves and neural networks. The method is general and can be applied to explicitly solve arbitrary nonlinear circuits, provided that a sufficient condition on the port resistances is met. 
To show the potential of our approach, we emulate the VOX V847 Wah-Wah and the Arbiter Electronics Fuzz Face pedals, obtaining a notable Real-Time Ratio and paving the way toward the real-time emulation of circuits with a high number of nonlinear elements.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"156-164"},"PeriodicalIF":2.9,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10839128","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143404007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Available Degrees of Spatial Multiplexing of a Uniform Linear Array With Multiple Polarizations: A Holographic Perspective","authors":"Xavier Mestre;Adrian Agustin;David Sardà","doi":"10.1109/OJSP.2025.3529326","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3529326","url":null,"abstract":"The capabilities of multi-antenna technology have recently been significantly enhanced by the proliferation of extra-large array architectures. The high dimensionality of these systems implies that communications take place in the near-field regime, which raises questions about their effective performance even in simple line-of-sight configurations. To study these limitations, a uniform linear array (ULA) is considered here, each element of which consists of three infinitesimal dipoles transmitting different signals in the three spatial dimensions. The receiver consists of a single element with three orthogonal infinitesimal dipoles, and full channel state information is assumed to be available at both ends. A capacity analysis is presented for the case in which the number of elements of the ULA increases without bound while the interelement distance converges to zero, so that the total aperture length is kept asymptotically fixed. In particular, the total number of available spatial degrees of freedom is shown to depend crucially on the receiver position in space, and closed-form expressions are provided for the different achievability regions. 
From the analysis it can be concluded that the use of three orthogonal polarizations at the transmitter guarantees the universal availability of at least two spatial streams everywhere.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"108-117"},"PeriodicalIF":2.9,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10839310","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging","authors":"Charilaos Papaioannou;Emmanouil Benetos;Alexandros Potamianos","doi":"10.1109/OJSP.2025.3529315","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3529315","url":null,"abstract":"We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification, where a model must generalize to new classes based on only a few available examples. Extending Prototypical Networks, LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items, rather than one prototype per label. Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music, and is evaluated against existing approaches in the literature. The results demonstrate a significant performance improvement in almost all domains and training setups when using LC-Protonets for multi-label classification. In addition to training a few-shot learning model from scratch, we explore the use of a pre-trained model, obtained via supervised learning, to embed items in the feature space. Fine-tuning improves the generalization ability of all methods, yet LC-Protonets achieve high-level performance even without fine-tuning, in contrast to the compared approaches. Finally, we analyze the scalability of the proposed method, providing detailed quantitative metrics from our experiments. 
The implementation and experimental setup are made publicly available, offering a benchmark for future research.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"138-146"},"PeriodicalIF":2.9,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10839319","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation","authors":"Jaime Garcia-Martinez;David Diaz-Guerra;Archontis Politis;Tuomas Virtanen;Julio J. Carabias-Orti;Pedro Vera-Candeas","doi":"10.1109/OJSP.2025.3528361","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3528361","url":null,"abstract":"Music source separation has progressed significantly in recent years, particularly in isolating vocals, drums, and bass from mixed tracks. These developments owe much to the creation and use of large-scale, multitrack datasets dedicated to these specific components. However, the challenge of extracting similar-sounding sources from orchestra recordings has not been extensively explored, largely due to a scarcity of comprehensive and clean (i.e., bleed-free) multitrack datasets. In this paper, we introduce a novel multitrack dataset called SynthSOD, developed using a set of simulation techniques to create a realistic, musically motivated, and heterogeneous training set comprising different dynamics, natural tempo changes, styles, and conditions by employing high-quality digital libraries that define virtual instrument sounds for MIDI playback (a.k.a. soundfonts). 
Moreover, we demonstrate the application of a widely used baseline music separation model trained on our synthesized dataset, compared with the well-known EnsembleSet, and evaluate its performance under both synthetic and real-world conditions.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"129-137"},"PeriodicalIF":2.9,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10839019","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143422790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}