{"title":"Body Motion Segmentation via Multilayer Graph Processing for Wearable Sensor Signals","authors":"Qinwen Deng;Songyang Zhang;Zhi Ding","doi":"10.1109/OJSP.2024.3407662","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3407662","url":null,"abstract":"Human body motion segmentation plays a major role in many applications, ranging from computer vision to robotics. Among a variety of algorithms, graph-based approaches have demonstrated exciting potential in motion analysis owing to their power to capture the underlying correlations among joints. However, most existing works focus on simpler single-layer geometric structures, whereas multi-layer spatial-temporal graph structure can provide more informative results. To provide an interpretable analysis on multilayer spatial-temporal structures, we revisit the emerging field of multilayer graph signal processing (M-GSP), and propose novel approaches based on M-GSP to human motion segmentation. Specifically, we model the spatial-temporal relationships via multilayer graphs (MLG) and introduce M-GSP spectrum analysis for feature extraction. We present two different M-GSP based algorithms for unsupervised segmentation in the MLG spectrum and vertex domains, respectively. Our experimental results demonstrate the robustness and effectiveness of our proposed methods.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"934-947"},"PeriodicalIF":2.9,"publicationDate":"2024-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10542374","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141993917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptable L4S Congestion Control for Cloud-Based Real-Time Streaming Over 5G","authors":"Jangwoo Son;Yago Sanchez;Cornelius Hellge;Thomas Schierl","doi":"10.1109/OJSP.2024.3405719","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3405719","url":null,"abstract":"Achieving reliable low-latency streaming on real-time immersive services that require seamless interaction has been of increasing importance recently. To cope with such an immersive service requirement, IETF and 3GPP defined Low Latency, Low Loss, and Scalable Throughput (L4S) architecture and terminologies to enable delay-critical applications to achieve low congestion and scalable bitrate control over 5G. With low-latency applications in mind, this paper presents a cloud-based streaming system using WebRTC for real-time communication with an adaptable L4S congestion control (aL4S-CC). aL4S-CC is designed to prevent the target service from surpassing a required end-to-end latency. It is evaluated against existing congestion controls GCC and ScreamV2 across two configurations: 1) standard L4S (sL4S) which has no knowledge of Explicit Congestion Notification (ECN) marking scheme information; 2) conscious L4S (cL4S) which recognizes the ECN marking scheme information. The results show that aL4S-CC achieves high link utilization with low latency while maintaining good performance in terms of fairness, and cL4S improves sL4S's performance by having an efficient trade-off between link utilization and latency. In the entire simulation, the gain of link utilization on cL4S is 1.4%, 4%, and 17.9% on average compared to sL4S, GCC, and ScreamV2, respectively, and the ratio of duration exceeding the target queuing delay achieves the lowest values of 1% and 0.9% for cL4S and sL4S, respectively.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"841-849"},"PeriodicalIF":2.9,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10539241","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141543935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contactless Skin Blood Perfusion Imaging via Multispectral Information, Spectral Unmixing and Multivariable Regression","authors":"Liliana Granados-Castro;Omar Gutierrez-Navarro;Aldo Rodrigo Mejia-Rodriguez;Daniel Ulises Campos-Delgado","doi":"10.1109/OJSP.2024.3381892","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3381892","url":null,"abstract":"Noninvasive methods for assessing in-vivo skin blood perfusion parameters, such as hemoglobin oxygenation, are crucial for diagnosing and monitoring microvascular diseases. This approach is particularly beneficial for patients with compromised skin, where standard contact-based clinical devices are inappropriate. For this goal, we propose the analysis of multimodal data from an occlusion protocol applied to 18 healthy participants, which includes multispectral imaging of the whole hand and reference photoplethysmography information from the thumb. Multispectral data analysis was conducted using two different blind linear unmixing methods: principal component analysis (PCA), and extended blind endmember and abundance extraction (EBEAE). Perfusion maps for oxygenated and deoxygenated hemoglobin changes in the hand were generated using linear multivariable regression models based on the unmixing methods. Our results showed high accuracy, with \u0000<inline-formula><tex-math>$text {R}^{2}$</tex-math></inline-formula>\u0000-adjusted values, up to 0.90 \u0000<inline-formula><tex-math>$pm$</tex-math></inline-formula>\u0000 0.08. Further analysis revealed that using more than four characteristic components during spectral unmixing did not improve the fit of the model. Bhattacharyya distance results showed that the fitted models with EBEAE were more sensitive to hemoglobin changes during occlusion stages, up to four times higher than PCA. Our study concludes that multispectral imaging with EBEAE is effective in quantifying changes in oxygenated hemoglobin levels, especially when using 3 to 4 characteristic components. Our proposed method holds promise for the noninvasive diagnosis and monitoring of superficial microvascular alterations across extensive anatomical regions.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"101-111"},"PeriodicalIF":0.0,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10480236","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140639430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech","authors":"Abhayjeet Singh;Amala Nagireddi;Anjali Jayakumar;Deekshitha G;Jesuraja Bandekar;Roopa R;Sandhya Badiger;Sathvik Udupa;Saurabh Kumar;Prasanta Kumar Ghosh;Hema A Murthy;Heiga Zen;Pranaw Kumar;Kamal Kant;Amol Bole;Bira Chandra Singh;Keiichi Tokuda;Mark Hasegawa-Johnson;Philipp Olbrich","doi":"10.1109/OJSP.2024.3379092","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3379092","url":null,"abstract":"The Lightweight, Multi-speaker, Multi-lingual Indic Text-to-Speech (LIMMITS'23) challenge is organized as part of the ICASSP 2023 Signal Processing Grand Challenge. LIMMITS'23 aims at the development of a lightweight, multi-speaker, multi-lingual Text to Speech (TTS) model using datasets in Marathi, Hindi, and Telugu, with at least 40 hours of data released for each of the male and female voice artists in each language. The challenge encourages the advancement of TTS in Indian Languages as well as the development of techniques involved in TTS data selection and model compression. The 3 tracks of LIMMITS'23 have provided an opportunity for various researchers and practitioners around the world to explore the state-of-the-art techniques in TTS research.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"790-798"},"PeriodicalIF":2.9,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10479171","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141448026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal Diffusion Models for Generalized Speech Enhancement","authors":"Julius Richter;Simon Welker;Jean-Marie Lemercier;Bunlong Lay;Tal Peer;Timo Gerkmann","doi":"10.1109/OJSP.2024.3379070","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3379070","url":null,"abstract":"In this work, we present a causal speech enhancement system that is designed to handle different types of corruptions. This paper is an extended version of our contribution to the “ICASSP 2023 Speech Signal Improvement Challenge”. The method is based on a generative diffusion model which has been shown to work well in scenarios beyond speech-in-noise, such as missing data and non-additive corruptions. We guarantee causal processing with an algorithmic latency of 20 ms by modifying the network architecture and removing non-causal normalization techniques. To train and test our model, we generate a new corrupted speech dataset which includes additive background noise, reverberation, clipping, packet loss, bandwidth reduction, and codec artifacts. We compare the causal and non-causal versions of our method to investigate the impact of causal processing and we assess the gap between specialized models trained on a particular corruption type and the generalized model trained on all corruptions. Although specialized models and non-causal models have a small advantage, we show that the generalized causal approach does not suffer from a significant performance penalty, while it can be flexibly employed for real-world applications where different types of distortions may occur.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"780-789"},"PeriodicalIF":2.9,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10475490","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Drone-vs-Bird Detection Grand Challenge at ICASSP 2023: A Review of Methods and Results","authors":"Angelo Coluccia;Alessio Fascista;Lars Sommer;Arne Schumann;Anastasios Dimou;Dimitrios Zarpalas","doi":"10.1109/OJSP.2024.3379073","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3379073","url":null,"abstract":"This paper presents the 6th edition of the “Drone-vs-Bird” detection challenge, jointly organized with the WOSDETC workshop within the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023. The main objective of the challenge is to advance the current state-of-the-art in detecting the presence of one or more Unmanned Aerial Vehicles (UAVs) in real video scenes, while facing challenging conditions such as moving cameras, disturbing environmental factors, and the presence of birds flying in the foreground. For this purpose, a video dataset was provided for training the proposed solutions, and a separate test dataset was released a few days before the challenge deadline to assess their performance. The dataset has continually expanded over consecutive installments of the Drone-vs-Bird challenge and remains openly available to the research community, for non-commercial purposes. The challenge attracted novel signal processing solutions, mainly based on deep learning algorithms. The paper illustrates the results achieved by the teams that successfully participated in the 2023 challenge, offering a concise overview of the state-of-the-art in the field of drone detection using video signal processing. Additionally, the paper provides valuable insights into potential directions for future research, building upon the main pros and limitations of the solutions presented by the participating teams.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"766-779"},"PeriodicalIF":2.9,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10475518","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141448001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoding Envelope and Frequency-Following EEG Responses to Continuous Speech Using Deep Neural Networks","authors":"Mike D. Thornton;Danilo P. Mandic;Tobias J. Reichenbach","doi":"10.1109/OJSP.2024.3378593","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3378593","url":null,"abstract":"The electroencephalogram (EEG) offers a non-invasive means by which a listener's auditory system may be monitored during continuous speech perception. Reliable auditory-EEG decoders could facilitate the objective diagnosis of hearing disorders, or find applications in cognitively-steered hearing aids. Previously, we developed decoders for the ICASSP Auditory EEG Signal Processing Grand Challenge (SPGC). These decoders placed first in the match-mismatch task: given a short temporal segment of EEG recordings, and two candidate speech segments, the task is to identify which of the two speech segments is temporally aligned, or matched, with the EEG segment. The decoders made use of cortical responses to the speech envelope, as well as speech-related frequency-following responses, to relate the EEG recordings to the speech stimuli. Here we comprehensively document the methods by which the decoders were developed. We extend our previous analysis by exploring the association between speaker characteristics (pitch and sex) and classification accuracy, and provide a full statistical analysis of the final performance of the decoders as evaluated on a heldout portion of the dataset. Finally, the generalisation capabilities of the decoders are characterised, by evaluating them using an entirely different dataset which contains EEG recorded under a variety of speech-listening conditions. The results show that the match-mismatch decoders achieve accurate and robust classification accuracies, and they can even serve as auditory attention decoders without additional training.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"700-716"},"PeriodicalIF":2.9,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474145","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sea-Wave: Speech Envelope Reconstruction From Auditory EEG With an Adapted WaveNet","authors":"Liuyin Yang;Bob Van Dyck;Marc M. Van Hulle","doi":"10.1109/OJSP.2024.3378594","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3378594","url":null,"abstract":"Speech envelope reconstruction from EEG is shown to bear clinical potential to assess speech intelligibility. Linear models are commonly used to this end, but they have recently been outperformed in reconstruction scores by non-linear deep neural networks, particularly by dilated convolutional networks. This study presents Sea-Wave, a WaveNet-based architecture for speech envelope reconstruction that outperforms the state-of-the-art model. Our model is an extension of our submission for the Auditory EEG Challenge of the ICASSP Signal Processing Grand Challenge 2023. We improve upon our prior work by evaluating model components and hyperparameters through an ablation study and hyperparameter search, respectively. Our best subject-independent model achieves a Pearson correlation of 22.58% on seen and 11.58% on unseen subjects. After subject-specific fine-tuning, we find an average relative improvement of 30% for the seen subjects and a Pearson correlation of 56.57% for the best seen subject.Finally, we explore several model visualizations to obtain a better understanding of the model, the differences across subjects and the EEG features that relate to auditory perception.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"686-699"},"PeriodicalIF":2.9,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474194","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141448000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Overview of the ADReSS-M Signal Processing Grand Challenge on Multilingual Alzheimer's Dementia Recognition Through Spontaneous Speech","authors":"Saturnino Luz;Fasih Haider;Davida Fromm;Ioulietta Lazarou;Ioannis Kompatsiaris;Brian MacWhinney","doi":"10.1109/OJSP.2024.3378595","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3378595","url":null,"abstract":"The ADReSS-M Signal Processing Grand Challenge was held at the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023. The challenge targeted difficult automatic prediction problems of great societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD) and the estimation of cognitive test scoress. Participants were invited to create models for the assessment of cognitive function based on spontaneous speech data. Most of these models employed signal processing and machine learning methods. The ADReSS-M challenge was designed to assess the extent to which predictive models built based on speech in one language generalise to another language. The language data compiled and made available for ADReSS-M comprised English, for model training, and Greek, for model testing and validation. To the best of our knowledge no previous shared research task investigated acoustic features of the speech signal or linguistic characteristics in the context of multilingual AD detection. This paper describes the context of the ADReSS-M challenge, its data sets, its predictive tasks, the evaluation methodology we employed, our baseline models and results, and the top five submissions. The paper concludes with a summary discussion of the ADReSS-M results, and our critical assessment of the future outlook in this field.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"738-749"},"PeriodicalIF":2.9,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474114","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ICASSP 2023 Deep Noise Suppression Challenge","authors":"Harishchandra Dubey;Ashkan Aazami;Vishak Gopal;Babak Naderi;Sebastian Braun;Ross Cutler;Alex Ju;Mehdi Zohourian;Min Tang;Mehrsa Golestaneh;Robert Aichner","doi":"10.1109/OJSP.2024.3378602","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3378602","url":null,"abstract":"The ICASSP 2023 Deep Noise Suppression (DNS) Challenge marks the fifth edition of the DNS challenge series. DNS challenges were organized from 2019 to 2023 to foster research in the field of DNS. Previous DNS challenges were held at INTERSPEECH 2020, ICASSP 2021, INTERSPEECH 2021, and ICASSP 2022. This challenge aims to advance models capable of jointly addressing denoising, dereverberation, and interfering talker suppression, with separate tracks focusing on headset and speakerphone scenarios. The challenge facilitates personalized deep noise suppression by providing accompanying enrollment clips for each test clip, each containing the primary talker only, which can be used to compute a speaker identity feature and disentangle primary and interfering speech. While the majority of models submitted to the challenge were personalized, the same teams emerged as the winners in both tracks. The best models demonstrated improvements of 0.145 and 0.141 in the challenge's score, respectively, when compared to the noisy blind test set. We present additional analysis and draw comparisons to previous challenges.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"725-737"},"PeriodicalIF":2.9,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474162","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}