{"title":"Contextual Multi-Armed Bandit With Costly Feature Observation in Non-Stationary Environments","authors":"Saeed Ghoorchian;Evgenii Kortukov;Setareh Maghsudi","doi":"10.1109/OJSP.2024.3389809","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3389809","url":null,"abstract":"Maximizing long-term rewards is the primary goal in sequential decision-making problems. The majority of existing methods assume that side information is freely available, enabling the learning agent to observe all features' states before making a decision. In real-world problems, however, collecting beneficial information is often costly. That implies that, besides the individual arms' rewards, learning which features' states to observe is essential to improving the decision-making strategy. The problem is aggravated in a non-stationary environment where reward and cost distributions undergo abrupt changes over time. To address the aforementioned dual learning problem, we extend the contextual bandit setting and allow the agent to observe subsets of features' states. The objective is to maximize the long-term average gain, which is the difference between the accumulated rewards and the paid costs on average. Therefore, the agent faces a trade-off between minimizing the cost of information acquisition and possibly improving the decision-making process using the obtained information. To this end, we develop an algorithm that guarantees a sublinear regret in time. 
Numerical results demonstrate the superiority of our proposed policy in a real-world scenario.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"820-830"},"PeriodicalIF":2.9,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10502231","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS","authors":"Myeongjin Ko;Euiyeon Kim;Yong-Hoon Choi","doi":"10.1109/OJSP.2024.3386495","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3386495","url":null,"abstract":"The diffusion model is capable of generating high-quality data through a probabilistic approach. However, it suffers from the drawback of slow generation speed due to its requirement for many time steps. To address this limitation, recent models such as denoising diffusion implicit models (DDIM) focus on sample generation without explicitly modeling the entire probability distribution, while models like denoising diffusion generative adversarial networks (GAN) combine diffusion processes with GANs. In the field of speech synthesis, a recent diffusion speech synthesis model called DiffGAN-TTS, which utilizes the structure of GANs, has been introduced and demonstrates superior performance in both speech quality and generation speed. In this paper, to further enhance the performance of DiffGAN-TTS, we propose a speech synthesis model with two discriminators: a diffusion discriminator to learn the distribution of the reverse process, and a spectrogram discriminator to learn the distribution of the generated data. Objective metrics such as the structural similarity index measure (SSIM), mel-cepstral distortion (MCD), F0 root mean squared error (F0-RMSE), phoneme error rate (PER), word error rate (WER), as well as subjective metrics like mean opinion score (MOS), are used to evaluate the performance of the proposed model. The evaluation results demonstrate that our model matches or exceeds recent state-of-the-art models like FastSpeech 2 and DiffGAN-TTS across various metrics. 
Our code and audio samples are available on GitHub.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"577-587"},"PeriodicalIF":0.0,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10494889","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140647820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-Stationary Linear Bandits With Dimensionality Reduction for Large-Scale Recommender Systems","authors":"Saeed Ghoorchian;Evgenii Kortukov;Setareh Maghsudi","doi":"10.1109/OJSP.2024.3386490","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3386490","url":null,"abstract":"Taking advantage of contextual information can potentially boost the performance of recommender systems. In the era of Big Data, such side information often has several dimensions. Thus, developing decision-making algorithms to cope with such a high-dimensional context in real time is essential. That is specifically challenging when the decision-maker has a variety of items to recommend. In addition, changes in items' popularity or users' preferences can hinder the performance of the deployed recommender system due to a lack of robustness to distribution shifts in the environment. In this paper, we build upon the linear contextual multi-armed bandit framework to address this problem. We develop a decision-making policy for a linear bandit problem with high-dimensional feature vectors, a large set of arms, and non-stationary reward-generating processes. Our Thompson sampling-based policy reduces the dimension of feature vectors using random projection and uses exponentially increasing weights to decrease the influence of past observations with time. Our proposed recommender system employs this policy to learn the users' item preferences online while minimizing runtime. We prove a regret bound that scales as a factor of the reduced dimension instead of the original one. To evaluate our proposed recommender system numerically, we apply it to three real-world datasets. 
The theoretical and numerical results demonstrate the effectiveness of our proposed algorithm in making a trade-off between computational complexity and regret performance compared to the state-of-the-art.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"548-558"},"PeriodicalIF":0.0,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10494875","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140633561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Body Motion Segmentation via Multilayer Graph Processing for Wearable Sensor Signals","authors":"Qinwen Deng;Songyang Zhang;Zhi Ding","doi":"10.1109/OJSP.2024.3407662","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3407662","url":null,"abstract":"Human body motion segmentation plays a major role in many applications, ranging from computer vision to robotics. Among a variety of algorithms, graph-based approaches have demonstrated exciting potential in motion analysis owing to their power to capture the underlying correlations among joints. However, most existing works focus on simpler single-layer geometric structures, whereas multi-layer spatial-temporal graph structure can provide more informative results. To provide an interpretable analysis on multilayer spatial-temporal structures, we revisit the emerging field of multilayer graph signal processing (M-GSP), and propose novel approaches based on M-GSP to human motion segmentation. Specifically, we model the spatial-temporal relationships via multilayer graphs (MLG) and introduce M-GSP spectrum analysis for feature extraction. We present two different M-GSP based algorithms for unsupervised segmentation in the MLG spectrum and vertex domains, respectively. Our experimental results demonstrate the robustness and effectiveness of our proposed methods.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"934-947"},"PeriodicalIF":2.9,"publicationDate":"2024-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10542374","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141993917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptable L4S Congestion Control for Cloud-Based Real-Time Streaming Over 5G","authors":"Jangwoo Son;Yago Sanchez;Cornelius Hellge;Thomas Schierl","doi":"10.1109/OJSP.2024.3405719","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3405719","url":null,"abstract":"Achieving reliable low-latency streaming on real-time immersive services that require seamless interaction has been of increasing importance recently. To cope with such an immersive service requirement, IETF and 3GPP defined Low Latency, Low Loss, and Scalable Throughput (L4S) architecture and terminologies to enable delay-critical applications to achieve low congestion and scalable bitrate control over 5G. With low-latency applications in mind, this paper presents a cloud-based streaming system using WebRTC for real-time communication with an adaptable L4S congestion control (aL4S-CC). aL4S-CC is designed to prevent the target service from surpassing a required end-to-end latency. It is evaluated against existing congestion controls GCC and ScreamV2 across two configurations: 1) standard L4S (sL4S) which has no knowledge of Explicit Congestion Notification (ECN) marking scheme information; 2) conscious L4S (cL4S) which recognizes the ECN marking scheme information. The results show that aL4S-CC achieves high link utilization with low latency while maintaining good performance in terms of fairness, and cL4S improves sL4S's performance by having an efficient trade-off between link utilization and latency. 
In the entire simulation, the gain of link utilization on cL4S is 1.4%, 4%, and 17.9% on average compared to sL4S, GCC, and ScreamV2, respectively, and the ratio of duration exceeding the target queuing delay achieves the lowest values of 1% and 0.9% for cL4S and sL4S, respectively.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"841-849"},"PeriodicalIF":2.9,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10539241","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141543935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contactless Skin Blood Perfusion Imaging via Multispectral Information, Spectral Unmixing and Multivariable Regression","authors":"Liliana Granados-Castro;Omar Gutierrez-Navarro;Aldo Rodrigo Mejia-Rodriguez;Daniel Ulises Campos-Delgado","doi":"10.1109/OJSP.2024.3381892","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3381892","url":null,"abstract":"Noninvasive methods for assessing in-vivo skin blood perfusion parameters, such as hemoglobin oxygenation, are crucial for diagnosing and monitoring microvascular diseases. This approach is particularly beneficial for patients with compromised skin, where standard contact-based clinical devices are inappropriate. For this goal, we propose the analysis of multimodal data from an occlusion protocol applied to 18 healthy participants, which includes multispectral imaging of the whole hand and reference photoplethysmography information from the thumb. Multispectral data analysis was conducted using two different blind linear unmixing methods: principal component analysis (PCA), and extended blind endmember and abundance extraction (EBEAE). Perfusion maps for oxygenated and deoxygenated hemoglobin changes in the hand were generated using linear multivariable regression models based on the unmixing methods. Our results showed high accuracy, with R²-adjusted values of up to 0.90 ± 0.08. Further analysis revealed that using more than four characteristic components during spectral unmixing did not improve the fit of the model. Bhattacharyya distance results showed that the fitted models with EBEAE were up to four times more sensitive than PCA to hemoglobin changes during occlusion stages. Our study concludes that multispectral imaging with EBEAE is effective in quantifying changes in oxygenated hemoglobin levels, especially when using 3 to 4 characteristic components. 
Our proposed method holds promise for the noninvasive diagnosis and monitoring of superficial microvascular alterations across extensive anatomical regions.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"101-111"},"PeriodicalIF":0.0,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10480236","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140639430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech","authors":"Abhayjeet Singh;Amala Nagireddi;Anjali Jayakumar;Deekshitha G;Jesuraja Bandekar;Roopa R;Sandhya Badiger;Sathvik Udupa;Saurabh Kumar;Prasanta Kumar Ghosh;Hema A Murthy;Heiga Zen;Pranaw Kumar;Kamal Kant;Amol Bole;Bira Chandra Singh;Keiichi Tokuda;Mark Hasegawa-Johnson;Philipp Olbrich","doi":"10.1109/OJSP.2024.3379092","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3379092","url":null,"abstract":"The Lightweight, Multi-speaker, Multi-lingual Indic Text-to-Speech (LIMMITS'23) challenge is organized as part of the ICASSP 2023 Signal Processing Grand Challenge. LIMMITS'23 aims at the development of a lightweight, multi-speaker, multi-lingual Text to Speech (TTS) model using datasets in Marathi, Hindi, and Telugu, with at least 40 hours of data released for each of the male and female voice artists in each language. The challenge encourages the advancement of TTS in Indian Languages as well as the development of techniques involved in TTS data selection and model compression. The 3 tracks of LIMMITS'23 have provided an opportunity for various researchers and practitioners around the world to explore the state-of-the-art techniques in TTS research.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"790-798"},"PeriodicalIF":2.9,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10479171","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141448026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal Diffusion Models for Generalized Speech Enhancement","authors":"Julius Richter;Simon Welker;Jean-Marie Lemercier;Bunlong Lay;Tal Peer;Timo Gerkmann","doi":"10.1109/OJSP.2024.3379070","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3379070","url":null,"abstract":"In this work, we present a causal speech enhancement system that is designed to handle different types of corruptions. This paper is an extended version of our contribution to the “ICASSP 2023 Speech Signal Improvement Challenge”. The method is based on a generative diffusion model which has been shown to work well in scenarios beyond speech-in-noise, such as missing data and non-additive corruptions. We guarantee causal processing with an algorithmic latency of 20 ms by modifying the network architecture and removing non-causal normalization techniques. To train and test our model, we generate a new corrupted speech dataset which includes additive background noise, reverberation, clipping, packet loss, bandwidth reduction, and codec artifacts. We compare the causal and non-causal versions of our method to investigate the impact of causal processing and we assess the gap between specialized models trained on a particular corruption type and the generalized model trained on all corruptions. 
Although specialized models and non-causal models have a small advantage, we show that the generalized causal approach does not suffer from a significant performance penalty, while it can be flexibly employed for real-world applications where different types of distortions may occur.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"780-789"},"PeriodicalIF":2.9,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10475490","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Drone-vs-Bird Detection Grand Challenge at ICASSP 2023: A Review of Methods and Results","authors":"Angelo Coluccia;Alessio Fascista;Lars Sommer;Arne Schumann;Anastasios Dimou;Dimitrios Zarpalas","doi":"10.1109/OJSP.2024.3379073","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3379073","url":null,"abstract":"This paper presents the 6th edition of the “Drone-vs-Bird” detection challenge, jointly organized with the WOSDETC workshop within the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023. The main objective of the challenge is to advance the current state-of-the-art in detecting the presence of one or more Unmanned Aerial Vehicles (UAVs) in real video scenes, while facing challenging conditions such as moving cameras, disturbing environmental factors, and the presence of birds flying in the foreground. For this purpose, a video dataset was provided for training the proposed solutions, and a separate test dataset was released a few days before the challenge deadline to assess their performance. The dataset has continually expanded over consecutive installments of the Drone-vs-Bird challenge and remains openly available to the research community, for non-commercial purposes. The challenge attracted novel signal processing solutions, mainly based on deep learning algorithms. The paper illustrates the results achieved by the teams that successfully participated in the 2023 challenge, offering a concise overview of the state-of-the-art in the field of drone detection using video signal processing. 
Additionally, the paper provides valuable insights into potential directions for future research, building upon the main pros and limitations of the solutions presented by the participating teams.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"766-779"},"PeriodicalIF":2.9,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10475518","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141448001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoding Envelope and Frequency-Following EEG Responses to Continuous Speech Using Deep Neural Networks","authors":"Mike D. Thornton;Danilo P. Mandic;Tobias J. Reichenbach","doi":"10.1109/OJSP.2024.3378593","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3378593","url":null,"abstract":"The electroencephalogram (EEG) offers a non-invasive means by which a listener's auditory system may be monitored during continuous speech perception. Reliable auditory-EEG decoders could facilitate the objective diagnosis of hearing disorders, or find applications in cognitively-steered hearing aids. Previously, we developed decoders for the ICASSP Auditory EEG Signal Processing Grand Challenge (SPGC). These decoders placed first in the match-mismatch task: given a short temporal segment of EEG recordings, and two candidate speech segments, the task is to identify which of the two speech segments is temporally aligned, or matched, with the EEG segment. The decoders made use of cortical responses to the speech envelope, as well as speech-related frequency-following responses, to relate the EEG recordings to the speech stimuli. Here we comprehensively document the methods by which the decoders were developed. We extend our previous analysis by exploring the association between speaker characteristics (pitch and sex) and classification accuracy, and provide a full statistical analysis of the final performance of the decoders as evaluated on a heldout portion of the dataset. Finally, the generalisation capabilities of the decoders are characterised, by evaluating them using an entirely different dataset which contains EEG recorded under a variety of speech-listening conditions. 
The results show that the match-mismatch decoders achieve accurate and robust classification accuracies, and they can even serve as auditory attention decoders without additional training.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"700-716"},"PeriodicalIF":2.9,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474145","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}