IEEE Open Journal of Signal Processing: Latest Articles

Body Motion Segmentation via Multilayer Graph Processing for Wearable Sensor Signals
IF 2.9
IEEE Open Journal of Signal Processing | Pub Date: 2024-03-30 | DOI: 10.1109/OJSP.2024.3407662
Authors: Qinwen Deng; Songyang Zhang; Zhi Ding
Abstract: Human body motion segmentation plays a major role in many applications, ranging from computer vision to robotics. Among a variety of algorithms, graph-based approaches have demonstrated exciting potential in motion analysis owing to their power to capture the underlying correlations among joints. However, most existing works focus on simpler single-layer geometric structures, whereas multilayer spatial-temporal graph structures can provide more informative results. To provide an interpretable analysis of multilayer spatial-temporal structures, we revisit the emerging field of multilayer graph signal processing (M-GSP) and propose novel M-GSP-based approaches to human motion segmentation. Specifically, we model the spatial-temporal relationships via multilayer graphs (MLG) and introduce M-GSP spectrum analysis for feature extraction. We present two different M-GSP-based algorithms for unsupervised segmentation, operating in the MLG spectrum and vertex domains, respectively. Our experimental results demonstrate the robustness and effectiveness of the proposed methods.
Volume 5, pp. 934-947 | Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10542374
Citations: 0
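To make the spectral pipeline concrete, here is a minimal Python sketch of spectrum-based unsupervised segmentation on a joint spatial-temporal graph. It is not the authors' M-GSP implementation: the Cartesian-product Laplacian, the per-frame feature construction, and the cluster count are all illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

def segment_motion(X, A_spatial, n_segments=3, n_basis=16):
    """X: (T, J) one sensor channel per joint over T frames.
    A_spatial: (J, J) adjacency of the skeleton graph."""
    T, J = X.shape
    # Temporal layer: a path graph linking consecutive frames.
    A_time = np.diag(np.ones(T - 1), 1)
    A_time = A_time + A_time.T
    # Cartesian-product Laplacian as a stand-in for the paper's
    # multilayer spatial-temporal graph (an assumption).
    L = np.kron(np.eye(T), laplacian(A_spatial)) \
        + np.kron(laplacian(A_time), np.eye(J))
    _, U = eigh(L)  # columns = graph Fourier basis, low frequencies first
    # Crude per-frame spectral feature: inner product of each frame with
    # the spatial slice of each low-frequency eigenvector.
    basis = U[:, :n_basis].reshape(T, J, n_basis)
    feats = np.einsum('tj,tjk->tk', X, basis)
    labels = KMeans(n_clusters=n_segments, n_init=10).fit_predict(feats)
    return labels  # (T,) segment label per frame
```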
Adaptable L4S Congestion Control for Cloud-Based Real-Time Streaming Over 5G
IF 2.9
IEEE Open Journal of Signal Processing | Pub Date: 2024-03-27 | DOI: 10.1109/OJSP.2024.3405719
Authors: Jangwoo Son; Yago Sanchez; Cornelius Hellge; Thomas Schierl
Abstract: Achieving reliable low-latency streaming on real-time immersive services that require seamless interaction has been of increasing importance recently. To cope with such an immersive service requirement, IETF and 3GPP defined Low Latency, Low Loss, and Scalable Throughput (L4S) architecture and terminologies to enable delay-critical applications to achieve low congestion and scalable bitrate control over 5G. With low-latency applications in mind, this paper presents a cloud-based streaming system using WebRTC for real-time communication with an adaptable L4S congestion control (aL4S-CC). aL4S-CC is designed to prevent the target service from surpassing a required end-to-end latency. It is evaluated against the existing congestion controls GCC and ScreamV2 across two configurations: 1) standard L4S (sL4S), which has no knowledge of the Explicit Congestion Notification (ECN) marking scheme; 2) conscious L4S (cL4S), which recognizes the ECN marking scheme. The results show that aL4S-CC achieves high link utilization with low latency while maintaining good performance in terms of fairness, and cL4S improves on sL4S's performance by making an efficient trade-off between link utilization and latency. Across the entire simulation, the gain in link utilization of cL4S is 1.4%, 4%, and 17.9% on average compared to sL4S, GCC, and ScreamV2, respectively, and the ratio of duration exceeding the target queuing delay achieves the lowest values of 1% and 0.9% for cL4S and sL4S, respectively.
Volume 5, pp. 841-849 | Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10539241
Citations: 0
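The scalable-rate behaviour that L4S enables is easiest to see in a DCTCP-style update rule: smooth the fraction of ECN-marked packets and back off in proportion to it. The schematic sketch below follows that rule; the gains, probe step, and rate bounds are assumptions, not the paper's aL4S-CC.

```python
# Schematic L4S-style sender rate control driven by ECN feedback.
class L4SRateController:
    def __init__(self, rate_kbps=2000, g=1 / 16, min_kbps=150, max_kbps=50000):
        self.rate = rate_kbps
        self.alpha = 0.0        # smoothed congestion level in [0, 1]
        self.g = g              # EWMA gain
        self.min, self.max = min_kbps, max_kbps

    def on_feedback(self, packets_acked, packets_ce_marked):
        frac = packets_ce_marked / max(packets_acked, 1)
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if frac > 0:
            # Scalable decrease: proportional to the smoothed marking level.
            self.rate *= (1 - self.alpha / 2)
        else:
            # Gentle additive probe while the queue is unmarked.
            self.rate += 50
        self.rate = min(max(self.rate, self.min), self.max)
        return self.rate

ctrl = L4SRateController()
for acked, marked in [(100, 0), (100, 12), (100, 30)]:
    print(round(ctrl.on_feedback(acked, marked)))
```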
Contactless Skin Blood Perfusion Imaging via Multispectral Information, Spectral Unmixing and Multivariable Regression
IEEE Open Journal of Signal Processing | Pub Date: 2024-03-26 | DOI: 10.1109/OJSP.2024.3381892
Authors: Liliana Granados-Castro; Omar Gutierrez-Navarro; Aldo Rodrigo Mejia-Rodriguez; Daniel Ulises Campos-Delgado
Abstract: Noninvasive methods for assessing in-vivo skin blood perfusion parameters, such as hemoglobin oxygenation, are crucial for diagnosing and monitoring microvascular diseases. This approach is particularly beneficial for patients with compromised skin, where standard contact-based clinical devices are inappropriate. For this goal, we propose the analysis of multimodal data from an occlusion protocol applied to 18 healthy participants, which includes multispectral imaging of the whole hand and reference photoplethysmography information from the thumb. Multispectral data analysis was conducted using two different blind linear unmixing methods: principal component analysis (PCA), and extended blind endmember and abundance extraction (EBEAE). Perfusion maps for oxygenated and deoxygenated hemoglobin changes in the hand were generated using linear multivariable regression models based on the unmixing methods. Our results showed high accuracy, with adjusted R² values up to 0.90 ± 0.08. Further analysis revealed that using more than four characteristic components during spectral unmixing did not improve the fit of the model. Bhattacharyya distance results showed that the fitted models with EBEAE were more sensitive to hemoglobin changes during occlusion stages, up to four times higher than PCA. Our study concludes that multispectral imaging with EBEAE is effective in quantifying changes in oxygenated hemoglobin levels, especially when using 3 to 4 characteristic components. Our proposed method holds promise for the noninvasive diagnosis and monitoring of superficial microvascular alterations across extensive anatomical regions.
Volume 5, pp. 101-111 | Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10480236
Citations: 0
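The two-stage pipeline the abstract describes (blind linear unmixing, then multivariable regression against the PPG reference) can be sketched as follows, with PCA standing in for EBEAE. Array shapes, the component count, and the perfusion-map readout are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def perfusion_map(frames, ppg, n_components=4):
    """frames: (T, H, W, B) multispectral video, B >= n_components bands.
    ppg: (T,) reference photoplethysmography signal from the thumb."""
    T, H, W, B = frames.shape
    pixels = frames.reshape(T * H * W, B)
    # Blind linear unmixing; PCA is a stand-in for EBEAE here.
    abund = PCA(n_components=n_components).fit_transform(pixels)
    abund = abund.reshape(T, H * W, n_components)
    # Multivariable regression: spatially averaged abundances -> PPG.
    reg = LinearRegression().fit(abund.mean(axis=1), ppg)
    # Apply the fitted model pixel-wise and map each pixel's temporal swing.
    est = abund @ reg.coef_ + reg.intercept_      # (T, H*W)
    return (est.max(axis=0) - est.min(axis=0)).reshape(H, W)
```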
Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech
IF 2.9
IEEE Open Journal of Signal Processing | Pub Date: 2024-03-25 | DOI: 10.1109/OJSP.2024.3379092
Authors: Abhayjeet Singh; Amala Nagireddi; Anjali Jayakumar; Deekshitha G; Jesuraja Bandekar; Roopa R; Sandhya Badiger; Sathvik Udupa; Saurabh Kumar; Prasanta Kumar Ghosh; Hema A Murthy; Heiga Zen; Pranaw Kumar; Kamal Kant; Amol Bole; Bira Chandra Singh; Keiichi Tokuda; Mark Hasegawa-Johnson; Philipp Olbrich
Abstract: The Lightweight, Multi-speaker, Multi-lingual Indic Text-to-Speech (LIMMITS'23) challenge is organized as part of the ICASSP 2023 Signal Processing Grand Challenge. LIMMITS'23 aims at the development of a lightweight, multi-speaker, multi-lingual Text to Speech (TTS) model using datasets in Marathi, Hindi, and Telugu, with at least 40 hours of data released for each of the male and female voice artists in each language. The challenge encourages the advancement of TTS in Indian Languages as well as the development of techniques involved in TTS data selection and model compression. The 3 tracks of LIMMITS'23 have provided an opportunity for various researchers and practitioners around the world to explore the state-of-the-art techniques in TTS research.
Volume 5, pp. 790-798 | Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10479171
Citations: 0
Causal Diffusion Models for Generalized Speech Enhancement
IF 2.9
IEEE Open Journal of Signal Processing | Pub Date: 2024-03-19 | DOI: 10.1109/OJSP.2024.3379070
Authors: Julius Richter; Simon Welker; Jean-Marie Lemercier; Bunlong Lay; Tal Peer; Timo Gerkmann
Abstract: In this work, we present a causal speech enhancement system that is designed to handle different types of corruptions. This paper is an extended version of our contribution to the "ICASSP 2023 Speech Signal Improvement Challenge". The method is based on a generative diffusion model which has been shown to work well in scenarios beyond speech-in-noise, such as missing data and non-additive corruptions. We guarantee causal processing with an algorithmic latency of 20 ms by modifying the network architecture and removing non-causal normalization techniques. To train and test our model, we generate a new corrupted speech dataset which includes additive background noise, reverberation, clipping, packet loss, bandwidth reduction, and codec artifacts. We compare the causal and non-causal versions of our method to investigate the impact of causal processing, and we assess the gap between specialized models trained on a particular corruption type and the generalized model trained on all corruptions. Although specialized models and non-causal models have a small advantage, we show that the generalized causal approach does not suffer from a significant performance penalty, while it can be flexibly employed for real-world applications where different types of distortions may occur.
Volume 5, pp. 780-789 | Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10475490
Citations: 0
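One concrete ingredient of causal processing is replacing symmetric convolutions with left-padded ones, so that no output sample depends on future input. A minimal sketch follows; the kernel size and dilation are illustrative, and this is not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                  # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))   # zeros on the left only
        return self.conv(x)

x = torch.randn(1, 4, 100)
y = CausalConv1d(4, 8)(x)
assert y.shape[-1] == x.shape[-1]          # same length, no lookahead
```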
The Drone-vs-Bird Detection Grand Challenge at ICASSP 2023: A Review of Methods and Results
IF 2.9
IEEE Open Journal of Signal Processing | Pub Date: 2024-03-19 | DOI: 10.1109/OJSP.2024.3379073
Authors: Angelo Coluccia; Alessio Fascista; Lars Sommer; Arne Schumann; Anastasios Dimou; Dimitrios Zarpalas
Abstract: This paper presents the 6th edition of the "Drone-vs-Bird" detection challenge, jointly organized with the WOSDETC workshop within the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023. The main objective of the challenge is to advance the current state-of-the-art in detecting the presence of one or more Unmanned Aerial Vehicles (UAVs) in real video scenes, while facing challenging conditions such as moving cameras, disturbing environmental factors, and the presence of birds flying in the foreground. For this purpose, a video dataset was provided for training the proposed solutions, and a separate test dataset was released a few days before the challenge deadline to assess their performance. The dataset has continually expanded over consecutive installments of the Drone-vs-Bird challenge and remains openly available to the research community, for non-commercial purposes. The challenge attracted novel signal processing solutions, mainly based on deep learning algorithms. The paper illustrates the results achieved by the teams that successfully participated in the 2023 challenge, offering a concise overview of the state-of-the-art in the field of drone detection using video signal processing. Additionally, the paper provides valuable insights into potential directions for future research, building upon the main pros and limitations of the solutions presented by the participating teams.
Volume 5, pp. 766-779 | Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10475518
Citations: 0
Decoding Envelope and Frequency-Following EEG Responses to Continuous Speech Using Deep Neural Networks
IF 2.9
IEEE Open Journal of Signal Processing | Pub Date: 2024-03-18 | DOI: 10.1109/OJSP.2024.3378593
Authors: Mike D. Thornton; Danilo P. Mandic; Tobias J. Reichenbach
Abstract: The electroencephalogram (EEG) offers a non-invasive means by which a listener's auditory system may be monitored during continuous speech perception. Reliable auditory-EEG decoders could facilitate the objective diagnosis of hearing disorders, or find applications in cognitively-steered hearing aids. Previously, we developed decoders for the ICASSP Auditory EEG Signal Processing Grand Challenge (SPGC). These decoders placed first in the match-mismatch task: given a short temporal segment of EEG recordings and two candidate speech segments, the task is to identify which of the two speech segments is temporally aligned, or matched, with the EEG segment. The decoders made use of cortical responses to the speech envelope, as well as speech-related frequency-following responses, to relate the EEG recordings to the speech stimuli. Here we comprehensively document the methods by which the decoders were developed. We extend our previous analysis by exploring the association between speaker characteristics (pitch and sex) and classification accuracy, and provide a full statistical analysis of the final performance of the decoders as evaluated on a held-out portion of the dataset. Finally, the generalisation capabilities of the decoders are characterised by evaluating them using an entirely different dataset which contains EEG recorded under a variety of speech-listening conditions. The results show that the match-mismatch decoders achieve accurate and robust classification accuracies, and they can even serve as auditory attention decoders without additional training.
Volume 5, pp. 700-716 | Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474145
Citations: 0
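A classical linear baseline makes the match-mismatch task concrete: reconstruct the envelope from time-lagged EEG with a ridge backward model, then pick the candidate segment that correlates better with the reconstruction. The paper's deep decoders replace the linear map; the lag count and array shapes below are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lagged(eeg, n_lags=32):
    """Stack time-lagged copies of (T, C) EEG into (T, C * n_lags)."""
    T, C = eeg.shape
    out = np.zeros((T, C * n_lags))
    for k in range(n_lags):
        out[k:, k * C:(k + 1) * C] = eeg[:T - k]
    return out

def fit_decoder(train_eeg, train_envelope, alpha=1e3):
    # Backward model: EEG (with lags) -> speech envelope.
    return Ridge(alpha=alpha).fit(lagged(train_eeg), train_envelope)

def match_mismatch(decoder, eeg, env_a, env_b):
    """Return 0 if env_a is the matched speech segment, else 1."""
    rec = decoder.predict(lagged(eeg))
    r = [np.corrcoef(rec, env)[0, 1] for env in (env_a, env_b)]
    return int(np.argmax(r))
```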
Sea-Wave: Speech Envelope Reconstruction From Auditory EEG With an Adapted WaveNet
IF 2.9
IEEE Open Journal of Signal Processing | Pub Date: 2024-03-18 | DOI: 10.1109/OJSP.2024.3378594
Authors: Liuyin Yang; Bob Van Dyck; Marc M. Van Hulle
Abstract: Speech envelope reconstruction from EEG is shown to bear clinical potential to assess speech intelligibility. Linear models are commonly used to this end, but they have recently been outperformed in reconstruction scores by non-linear deep neural networks, particularly by dilated convolutional networks. This study presents Sea-Wave, a WaveNet-based architecture for speech envelope reconstruction that outperforms the state-of-the-art model. Our model is an extension of our submission to the Auditory EEG Challenge of the ICASSP Signal Processing Grand Challenge 2023. We improve upon our prior work by evaluating model components and hyperparameters through an ablation study and a hyperparameter search, respectively. Our best subject-independent model achieves a Pearson correlation of 22.58% on seen and 11.58% on unseen subjects. After subject-specific fine-tuning, we find an average relative improvement of 30% for the seen subjects and a Pearson correlation of 56.57% for the best seen subject. Finally, we explore several model visualizations to obtain a better understanding of the model, the differences across subjects, and the EEG features that relate to auditory perception.
Volume 5, pp. 686-699 | Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474194
Citations: 0
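The building block of WaveNet-style decoders is a gated, dilated convolution with residual and skip paths. The sketch below is a generic such block, not the Sea-Wave architecture; the channel count and padding policy are assumptions (offline decoding permits non-causal, length-preserving padding).

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    def __init__(self, channels=64, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2  # length-preserving
        self.filt = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, dilation=dilation)
        self.gate = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, dilation=dilation)
        self.res = nn.Conv1d(channels, channels, 1)
        self.skip = nn.Conv1d(channels, channels, 1)

    def forward(self, x):  # x: (batch, channels, time)
        z = torch.tanh(self.filt(x)) * torch.sigmoid(self.gate(x))
        return x + self.res(z), self.skip(z)

# Stack blocks with dilations 1, 2, 4, ..., sum the skip outputs, and map
# them to a single envelope channel with a 1x1 convolution.
x = torch.randn(1, 64, 200)
res, skip = GatedResidualBlock(dilation=4)(x)
assert res.shape == x.shape
```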
An Overview of the ADReSS-M Signal Processing Grand Challenge on Multilingual Alzheimer's Dementia Recognition Through Spontaneous Speech
IF 2.9
IEEE Open Journal of Signal Processing | Pub Date: 2024-03-18 | DOI: 10.1109/OJSP.2024.3378595
Authors: Saturnino Luz; Fasih Haider; Davida Fromm; Ioulietta Lazarou; Ioannis Kompatsiaris; Brian MacWhinney
Abstract: The ADReSS-M Signal Processing Grand Challenge was held at the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023. The challenge targeted difficult automatic prediction problems of great societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD) and the estimation of cognitive test scores. Participants were invited to create models for the assessment of cognitive function based on spontaneous speech data. Most of these models employed signal processing and machine learning methods. The ADReSS-M challenge was designed to assess the extent to which predictive models built based on speech in one language generalise to another language. The language data compiled and made available for ADReSS-M comprised English, for model training, and Greek, for model testing and validation. To the best of our knowledge, no previous shared research task investigated acoustic features of the speech signal or linguistic characteristics in the context of multilingual AD detection. This paper describes the context of the ADReSS-M challenge, its data sets, its predictive tasks, the evaluation methodology we employed, our baseline models and results, and the top five submissions. The paper concludes with a summary discussion of the ADReSS-M results, and our critical assessment of the future outlook in this field.
Volume 5, pp. 738-749 | Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474114
Citations: 0
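The cross-lingual protocol (train on English, validate and test on Greek) can be sketched end to end. The frame-energy and zero-crossing features below are a deliberately crude stand-in for the acoustic feature sets the challenge baselines used; frame and hop sizes are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def acoustic_features(wave, frame=400, hop=160):
    """Summary statistics of log frame energy and zero-crossing rate."""
    frames = np.lib.stride_tricks.sliding_window_view(wave, frame)[::hop]
    energy = np.log(np.maximum((frames ** 2).mean(axis=1), 1e-10))
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)
    return np.array([energy.mean(), energy.std(), zcr.mean(), zcr.std()])

def evaluate_cross_lingual(en_waves, en_labels, gr_waves, gr_labels):
    X_train = np.stack([acoustic_features(w) for w in en_waves])
    X_test = np.stack([acoustic_features(w) for w in gr_waves])
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_train, en_labels)            # train on English speech only
    return clf.score(X_test, gr_labels)    # AD-detection accuracy on Greek
```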
ICASSP 2023 Deep Noise Suppression Challenge
IF 2.9
IEEE Open Journal of Signal Processing | Pub Date: 2024-03-18 | DOI: 10.1109/OJSP.2024.3378602
Authors: Harishchandra Dubey; Ashkan Aazami; Vishak Gopal; Babak Naderi; Sebastian Braun; Ross Cutler; Alex Ju; Mehdi Zohourian; Min Tang; Mehrsa Golestaneh; Robert Aichner
Abstract: The ICASSP 2023 Deep Noise Suppression (DNS) Challenge marks the fifth edition of the DNS challenge series. DNS challenges were organized from 2019 to 2023 to foster research in the field of DNS. Previous DNS challenges were held at INTERSPEECH 2020, ICASSP 2021, INTERSPEECH 2021, and ICASSP 2022. This challenge aims to advance models capable of jointly addressing denoising, dereverberation, and interfering talker suppression, with separate tracks focusing on headset and speakerphone scenarios. The challenge facilitates personalized deep noise suppression by providing accompanying enrollment clips for each test clip, each containing the primary talker only, which can be used to compute a speaker identity feature and disentangle primary and interfering speech. While the majority of models submitted to the challenge were personalized, the same teams emerged as the winners in both tracks. The best models demonstrated improvements of 0.145 and 0.141 in the challenge's score, respectively, when compared to the noisy blind test set. We present additional analysis and draw comparisons to previous challenges.
Volume 5, pp. 725-737 | Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10474162
Citations: 0
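The personalization mechanism (an enrollment clip turned into a speaker-identity feature that conditions the suppressor) can be sketched at the interface level. The mean log-spectrum embedding below is a deliberately crude stand-in for a trained speaker encoder, and `model` is a placeholder callable, not any team's network; frame sizes are assumptions.

```python
import numpy as np

def speaker_embedding(enrollment, n_fft=512, hop=160):
    """Crude speaker-identity feature: mean log magnitude spectrum."""
    frames = np.lib.stride_tricks.sliding_window_view(enrollment, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    return np.log(spec + 1e-8).mean(axis=0)        # (n_fft // 2 + 1,)

def personalized_suppress(noisy, embedding, model, n_fft=512, hop=160):
    """model: any callable taking (magnitudes, embedding) and returning a
    time-frequency gain in [0, 1]; only the interface is sketched here."""
    frames = np.lib.stride_tricks.sliding_window_view(noisy, n_fft)[::hop]
    spec = np.fft.rfft(frames * np.hanning(n_fft), axis=1)
    gain = model(np.abs(spec), embedding)          # (T, F) mask
    return np.fft.irfft(spec * gain, axis=1)       # frames; overlap-add omitted

# Example with a trivial pass-through stand-in for the suppressor network:
# out = personalized_suppress(noisy, speaker_embedding(enroll),
#                             lambda mag, emb: np.ones_like(mag))
```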