{"title":"Available Degrees of Spatial Multiplexing of a Uniform Linear Array With Multiple Polarizations: A Holographic Perspective","authors":"Xavier Mestre;Adrian Agustin;David Sardà","doi":"10.1109/OJSP.2025.3529326","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3529326","url":null,"abstract":"The capabilities of multi-antenna technology have recently been significantly enhanced by the proliferation of extra large array architectures. The high dimensionality of these systems implies that communications take place in the near-field regime, which poses some questions as to their effective performance even under simple line of sight configurations. In order to study these limitations, a uniform linear array (ULA) is considered here, the elements of which are three infinitesimal dipoles transmitting different signals in the three spatial dimensions. The receiver consists of a single element with three orthogonal infinitesimal dipoles and full channel state information is assumed to be available at both ends. A capacity analysis is presented when the number of elements of the ULA increases without bound while the interelement distance converges to zero, so that the total aperture length is kept asymptotically fixed. In particular, the total number of available spatial degrees of freedom is shown to depend crucially on the receiver position in space, and closed form expressions are provided for the different achievability regions. 
From the analysis it can be concluded that the use of three orthogonal polarizations at the transmitter guarantees the availability of at least two spatial streams everywhere.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"108-117"},"PeriodicalIF":2.9,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10839310","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging","authors":"Charilaos Papaioannou;Emmanouil Benetos;Alexandros Potamianos","doi":"10.1109/OJSP.2025.3529315","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3529315","url":null,"abstract":"We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification, where a model must generalize to new classes based on only a few available examples. Extending Prototypical Networks, LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items, rather than one prototype per label. Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music, and is evaluated against existing approaches in the literature. The results demonstrate a significant performance improvement in almost all domains and training setups when using LC-Protonets for multi-label classification. In addition to training a few-shot learning model from scratch, we explore the use of a pre-trained model, obtained via supervised learning, to embed items in the feature space. Fine-tuning improves the generalization ability of all methods, yet LC-Protonets achieve high-level performance even without fine-tuning, in contrast to the comparative approaches. We finally analyze the scalability of the proposed method, providing detailed quantitative metrics from our experiments. 
The implementation and experimental setup are made publicly available, offering a benchmark for future research.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"138-146"},"PeriodicalIF":2.9,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10839319","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation","authors":"Jaime Garcia-Martinez;David Diaz-Guerra;Archontis Politis;Tuomas Virtanen;Julio J. Carabias-Orti;Pedro Vera-Candeas","doi":"10.1109/OJSP.2025.3528361","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3528361","url":null,"abstract":"Music source separation has progressed significantly in recent years, particularly in isolating vocals, drums, and bass elements from mixed tracks. These developments owe much to the creation and use of large-scale, multitrack datasets dedicated to these specific components. However, the challenge of extracting similarly sounding sources from orchestra recordings has not been extensively explored, largely due to a scarcity of comprehensive and clean (i.e., bleed-free) multitrack datasets. In this paper, we introduce a novel multitrack dataset called SynthSOD, developed using a set of simulation techniques to create a realistic, musically motivated, and heterogeneous training set comprising different dynamics, natural tempo changes, styles, and conditions by employing high-quality digital libraries that define virtual instrument sounds for MIDI playback (a.k.a. soundfonts). 
Moreover, we demonstrate the application of a widely used baseline music separation model trained on our synthesized dataset with respect to the well-known EnsembleSet, and evaluate its performance under both synthetic and real-world conditions.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"129-137"},"PeriodicalIF":2.9,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10839019","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143422790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harnessing the Multi-Phasal Nature of Speech-EEG for Enhancing Imagined Speech Recognition","authors":"Rini Sharon;Mriganka Sur;Hema Murthy","doi":"10.1109/OJSP.2025.3528368","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3528368","url":null,"abstract":"Analyzing speech-electroencephalogram (EEG) is pivotal for developing non-invasive and naturalistic brain-computer interfaces. Recognizing that human communication involves multiple phases like audition, imagination, articulation, and production, this study uncovers the shared cognitive imprints that represent speech cognition across these phases. Regression analysis using correlation metrics reveals pronounced inter-phasal congruence. This insight promotes a shift from single-phase-centric recognition models to harnessing integrated phase data, thereby enhancing recognition of cognitive speech. Having established the presence of inter-phase associations, a common representation learning feature extractor is introduced, adept at capturing the correlations and replicability across phases. The features so extracted are observed to provide superior discrimination of cognitive speech units. Notably, the proposed approach proves resilient even in the absence of comprehensive multi-phasal data. Our observations are substantiated through rigorous control checks and illustrative topographical visualizations. 
The findings indicate that the proposed multi-phase approach significantly enhances EEG-based speech recognition, achieving an accuracy gain of 18.2% for 25 cognitive units in continuous speech EEG over models reliant solely on single-phase data.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"78-88"},"PeriodicalIF":2.9,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10839023","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrated Gaussian Processes for Tracking","authors":"Fred Lydeard;Bashar I. Ahmad;Simon Godsill","doi":"10.1109/OJSP.2025.3529308","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3529308","url":null,"abstract":"In applications such as tracking and localisation, a dynamical model is typically specified for the modelling of an object's motion. An appealing alternative to the traditional parametric Markovian dynamical models is the Gaussian Process (GP). GPs can offer additional flexibility and represent non-Markovian, long-term, dependencies in the target's kinematics. However, a standard GP with constant or zero mean is prone to oscillating around its mean and not sufficiently exploring the state space. In this paper, we consider extensions of the common GP framework such that a GP acts as the driving <italic>disturbance</i> term that is integrated over time to produce a new Integrated GP (iGP) dynamical model. It potentially provides a more realistic modelling of agile objects' behaviour. We prove here that the introduced iGP model is, itself, a GP with a non-stationary kernel, which we derive fully in the case of the squared exponential GP kernel. Thus, the iGP is straightforward to implement, with the usual growth over time of the computational burden. We further show how to implement the model with fixed time complexity in a standard sequential Bayesian updating framework using Kalman filter-based computations, employing a sliding window Markovian approximation. 
Example results from real radar measurements and synthetic data are presented to demonstrate the ability of the proposed iGP modelling to facilitate more accurate tracking compared to conventional GP models.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"99-107"},"PeriodicalIF":2.9,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10839315","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey on Machine Learning Techniques for Head-Related Transfer Function Individualization","authors":"Davide Fantini;Michele Geronazzo;Federico Avanzini;Stavros Ntalampiras","doi":"10.1109/OJSP.2025.3528330","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3528330","url":null,"abstract":"Machine learning (ML) has become pervasive in various research fields, including binaural synthesis personalization, which is crucial for sound in immersive virtual environments. Researchers have mainly addressed this topic by estimating the individual head-related transfer function (HRTF). HRTFs are utilized to render audio signals at specific spatial positions, thereby simulating real-world sound wave interactions with the human body. As such, an HRTF that is compliant with individual characteristics enhances the realism of the binaural simulation. This survey systematically examines the HRTF individualization works based on ML proposed in the literature. The analyzed works are organized according to the processing steps involved in the ML workflow, including the employed dataset, input and output types, data preprocessing operations, ML models, and model evaluation. 
In addition to categorizing the works of the existing literature, this survey discusses their achievements, identifies their limitations, and outlines aspects that require further investigation at the crossroads of research communities in acoustics, audio signal processing, and machine learning.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"30-56"},"PeriodicalIF":2.9,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10836943","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143107176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The ICASSP 2024 Audio Deep Packet Loss Concealment Grand Challenge","authors":"Lorenz Diener;Solomiya Branets;Ando Saabas;Ross Cutler","doi":"10.1109/OJSP.2025.3526552","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3526552","url":null,"abstract":"Audio packet loss concealment hides gaps in VoIP audio streams caused by network packet loss. It operates in real-time with low computational requirements and latency, as demanded by modern communication systems. With the ICASSP 2024 Audio Deep Packet Loss Concealment Grand Challenge, we build on the success of the previous Audio PLC Challenge held at INTERSPEECH 2022. For the 2024 challenge at ICASSP, we update the challenge by introducing an overall harder blind evaluation set and extending the task from wideband to fullband audio, in keeping with current trends in internet telephony. In addition to the Word Accuracy metric, we also use a questionnaire based on an extension of ITU-T P.804 to more closely evaluate the performance of systems specifically on the PLC task. We evaluate a total of 9 systems submitted by different academic and industry teams, 8 of which satisfy the strict real-time performance requirements of the challenge, using both P.804 and Word Accuracy evaluations. Two systems share first place, with one of the systems having the advantage in terms of naturalness, while the other wins in terms of intelligibility. 
These systems are the current state of the art for Deep PLC.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"231-237"},"PeriodicalIF":2.9,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10830479","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143471127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ICASSP 2024 Speech Signal Improvement Challenge","authors":"Nicolae-Cătălin Ristea;Babak Naderi;Ando Saabas;Ross Cutler;Sebastian Braun;Solomiya Branets","doi":"10.1109/OJSP.2025.3526550","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3526550","url":null,"abstract":"The ICASSP 2024 Speech Signal Improvement Challenge aims to advance research in enhancing speech signal quality within communication systems. Speech signal quality can be assessed using the SIG metric from ITU-T P.835 and remains a top issue in audio communication and conferencing systems. For example, in the ICASSP 2023 Deep Noise Suppression Challenge, the improvement in background and overall quality was impressive, while the speech signal enhancement was not statistically significant. To improve the speech signal, the following speech impairment areas must be addressed: coloration, discontinuity, loudness, reverberation, and noise. To this end, we organized the ICASSP 2024 Speech Signal Improvement Challenge, the second signal-focused challenge, building upon the success of the ICASSP 2023 Speech Signal Improvement Challenge. A training and test set was provided for the challenge, and the winners were determined using an extended crowdsourced implementation of ITU-T P.804’s listening phase and the word accuracy (WAcc) rate. 
The results show that significant improvement was made across all measured dimensions of speech quality.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"238-246"},"PeriodicalIF":2.9,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10830509","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143471128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adversarial Robust Modulation Recognition Guided by Attention Mechanisms","authors":"Quanhai Zhan;Xiongwei Zhang;Meng Sun;Lei Song;Zhenji Zhou","doi":"10.1109/OJSP.2025.3526577","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3526577","url":null,"abstract":"Deep neural networks have demonstrated considerable effectiveness in recognizing complex communication signals in automatic modulation recognition tasks. However, the resilience of these networks is undermined by carefully designed adversarial examples that compromise the reliability of their decision processes. To address this issue, an Attention-Guided Automatic Modulation Recognition (AG-AMR) method is proposed in this paper. The method introduces an optimized attention mechanism within the Transformer framework, where signal features are extracted and filtered based on the weights of the attention module during training, making the model focus on the features key to the task. Furthermore, by removing features of low importance where adversarial perturbations may appear, the proposed method mitigates the negative impact of adversarial perturbations on modulation classification, thereby improving both accuracy and robustness. Experimental results on benchmark datasets show that AG-AMR obtains a high level of accuracy on modulation recognition and exhibits significant robustness. 
Furthermore, when combined with adversarial training, AG-AMR is shown to resist several existing adversarial attacks, further validating its effectiveness in defending against adversarial example attacks.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"17-29"},"PeriodicalIF":2.9,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10829960","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143107175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modular Hypernetworks for Scalable and Adaptive Deep MIMO Receivers","authors":"Tomer Raviv;Nir Shlezinger","doi":"10.1109/OJSP.2025.3526548","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3526548","url":null,"abstract":"Deep neural networks (DNNs) were shown to facilitate the operation of uplink multiple-input multiple-output (MIMO) receivers, with emerging architectures augmenting modules of classic receiver processing. Current designs employ static DNNs, whose architecture is fixed and weights are pre-trained. This poses a notable challenge, as the resulting MIMO receiver is suitable for a given configuration, i.e., channel distribution and number of users, while in practice these parameters change frequently with network variations and users leaving and joining the network. In this work, we tackle this core challenge of DNN-aided MIMO receivers. We build upon the concept of <italic>hypernetworks</i>, augmenting the receiver with a pre-trained deep model whose purpose is to update the weights of the DNN-aided receiver in response to instantaneous channel variations. We design our hypernetwork to augment <italic>modular</i> deep receivers, leveraging their modularity to have the hypernetwork adapt not only the weights, but also the architecture. Our modular hypernetwork leads to a DNN-aided receiver whose architecture and resulting complexity adapt to the number of users, as well as to channel variations, without re-training. 
Our numerical studies demonstrate superior error-rate performance of modular hypernetworks in time-varying channels compared to static pre-trained receivers, while providing rapid adaptivity and scalability to network variations.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"256-265"},"PeriodicalIF":2.9,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10830517","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}