{"title":"Joint Modeling for ASR Correction and Dialog State Tracking","authors":"Deyuan Wang, Tiantian Zhang, Caixia Yuan, Xiaojie Wang","doi":"10.1109/ICASSP49357.2023.10095945","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095945","url":null,"abstract":"In spoken dialog systems, transcription errors in Automated Speech Recognition (ASR) impact downstream tasks, especially dialog state tracking (DST). Approaches to alleviate such errors involve using richer information such as word-lattices and word confusion networks. However, in some cases, this information may not be easily obtained. In addition, large pre-trained language models are trained on plain text, leading to a gap between spoken DST and the original pre-trained model. In this paper, we propose a multi-task method which performs DST jointly with ASR correction to improve the performance of both tasks. To do so, we build a MultiWOZ-ASR dataset containing ASR noise in DST and mitigate the gap by utilizing a multi-task pre-training framework. Moreover, curriculum learning is adopted to alleviate the phenomenon that the correction task is difficult to converge at the initial stage of pre-training. Experimental results show that our model achieves significant improvements on the DSTC2 and MultiWOZ-ASR datasets.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116109499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Interactivity and Heterogeneity for Sleep Stage Classification Via Heterogeneous Graph Neural Network","authors":"Ziyu Jia, Youfang Lin, Yuhan Zhou, Xiyang Cai, Peng Zheng, Qiang Li, Jing Wang","doi":"10.1109/ICASSP49357.2023.10095397","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095397","url":null,"abstract":"Sleep stage classification based on physiological time-series is essential for sleep quality evaluation and the diagnosis of sleep disorders in clinical practice. Existing machine learning studies have achieved adequate results in sleep stage classification. However, those methods neglect the significance of simultaneously capturing the interactivity and heterogeneity of physiological signals. In this paper, we propose a novel Sleep Heterogeneous Graph Neural Network (SleepHGNN) to employ these essential features. The SleepHGNN is a deep graph network consisting of Heterogeneous Graph Transformer layers, which are composed of a Heterogeneous Message Passing module for capturing the heterogeneity and a Target-Specific Aggregation module for capturing the interactivity of physiological signals. The experiments show that the SleepHGNN outperforms the state-of-the-art models on the sleep stage classification task. The source code of SleepHGNN is available at: https://github.com/zhouyh310/SleepHGNN.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116127033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Does a Quieter City Mean Fewer Complaints? The Sounds of New York City During Covid-19 Lockdown","authors":"M. Cartwright, Magdalena Fuentes, C. Mydlarz, Fabio Miranda, J. Bello","doi":"10.1109/ICASSP49357.2023.10094968","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10094968","url":null,"abstract":"The COVID-19 pandemic had an unprecedented effect on human activity and city landscapes. A very notable transformation during this period was the change in noise levels and patterns across cities. Small scale studies have shown this change in noise levels across different locations around the globe. In this work, we extend these studies by using historical audio data from the SONYC sensor network deployed in New York City. We exploit machine listening models to understand not only noise levels but also patterns, by performing a sound source presence analysis. Finally, we contrast our findings from the acoustic data with noise complaints to better understand the relationship between noise and our perception of it.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116132299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prompt Makes mask Language Models Better Adversarial Attackers","authors":"He Zhu, Ce Li, Haitian Yang, Yan Wang, Wei-Jung Huang","doi":"10.1109/ICASSP49357.2023.10095125","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095125","url":null,"abstract":"Generating high-quality synonymous perturbations is a core challenge for textual adversarial tasks. However, candidates generated from the masked language model often contain many words that are antonyms or irrelevant to the original words, which limits the perturbation space and affects the attack’s effectiveness. We present ProAttacker, which uses Prompt to make the mask language models better adversarial Attackers. ProAttacker inverts the prompt paradigm by leveraging the prompt with the class label to guide the language model to generate more semantically-consistent perturbations. We present a systematic evaluation to analyze the attack performance on 6 NLP datasets, covering text classification and inference. Our experiments demonstrate that ProAttacker outperforms state-of-the-art attack strategies in both success rate and perturb rate.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122327141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neurally Augmented State Space Model for Simultaneous Communication and Tracking with Low Complexity Receivers","authors":"F. Pedraza, G. Caire","doi":"10.1109/ICASSP49357.2023.10095824","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095824","url":null,"abstract":"In this paper, we propose an integrated sensing and communications (ISAC) system where a base station (BS) equipped with an antenna array and a co-located radar receiver transmits data packets while simultaneously tracking the position of users. We restrict our attention to the simplest hardware architecture, where the beamforming array can generate beams from a discrete codebook and the receiver is equipped with a single analog to digital converter, thereby allowing for scalar-only measurements where angular information is lost. Under such restrictive constraints, the observation likelihoods are hard to model, which motivates us to learn them via neural networks. These learned likelihoods are then incorporated into a state space model where Bayesian filtering can be performed. We test our method in complicated road geometries and show that our tracker is capable of following high mobility users most of the time. Furthermore, when the track of a user is lost, it often takes only a few measurements until it is recovered, disposing of the need for time consuming beam alignment procedures.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122948786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audio-Visual Speaker Diarization in the Framework of Multi-User Human-Robot Interaction","authors":"Timothée Dhaussy, B. Jabaian, F. Lefèvre, R. Horaud","doi":"10.1109/ICASSP49357.2023.10096295","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096295","url":null,"abstract":"The speaker diarization task answers the question \"who is speaking at a given time?\". It represents valuable information for scene analysis in a domain such as robotics. In this paper, we introduce a temporal audio-visual fusion model for multi-user speaker diarization, with low computing requirements, good robustness, and no training phase. The proposed method identifies the dominant speakers and tracks them over time by measuring the spatial coincidence between sound locations and visual presence. The model is generative, its parameters are estimated online, and it does not require training. Its effectiveness was assessed using two datasets, a public one and one collected in-house with the Pepper humanoid robot.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122951269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extended Expectation Maximization for Under-Fitted Models","authors":"Aref Miri Rekavandi, A. Seghouane, F. Boussaid, Bennamoun","doi":"10.1109/ICASSP49357.2023.10095526","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095526","url":null,"abstract":"In this paper, we generalize the well-known Expectation Maximization (EM) algorithm using the α−divergence for Gaussian Mixture Model (GMM). This approach is used in robust subspace detection when the number of parameters is kept small to avoid overfitting and large estimation variances. The level of robustness can be tuned by the parameter α. When α → 1, our method is equivalent to the standard EM approach and for α < 1 the method is robust against potential outliers. Simulation results show that the method outperforms the standard EM when it comes to mismatches between noise models and their realizations. In addition, we use the proposed method to detect active brain areas using collected functional Magnetic Resonance Imaging (fMRI) data during task-related experiments.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122621621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Range-ISL Minimization and Spectral Shaping in MIMO Radar Systems via Waveform Design","authors":"E. Raei, M. Alaee-Kerahroodi, B. Shankar, B. Ottersten","doi":"10.1109/ICASSP49357.2023.10096518","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096518","url":null,"abstract":"In this paper, we look at a waveform design problem for colocated Multiple-Input Multiple-Output (MIMO) radar systems. Under a continuous phase constraint, we aim to minimize the range-Integrated Sidelobe Level (ISL) with a compatible spectral response. In this regard, we define the range-ISL function in the time domain first, and then express it in the frequency domain using the Parseval relation. Following that, we incorporate weights on the range-ISL in the frequency domain to apply spectral compatibility. As a result, we have a multi-variable, non-convex, NP-hard optimization problem. We propose an iterative algorithm based on the Coordinate Descent (CD) method to obtain a local optimum solution. We show the performance of the proposed method and compare it to the counterparts in the numerical results.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122653214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonnegative Block-Term Decomposition with the β-Divergence: Joint Data Fusion and Blind Spectral Unmixing","authors":"Clémence Prévost, Valentin Leplat","doi":"10.1109/ICASSP49357.2023.10096100","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096100","url":null,"abstract":"We present a new method for simultaneously solving hyperspectral super-resolution and spectral unmixing of the unknown super-resolution image. Our method relies on three key elements: (1) the nonnegative decomposition in rank-(Lr, Lr, 1) block-terms, (2) joint tensor factorization with multiplicative updates, and (3) the formulation of a family of optimization problems with β-divergence objective functions. We come up with a family of simple, robust and efficient algorithms, adaptable to various noise statistics. Experiments show that our approach competes favorably with state-of-the-art methods for solving both problems at hand for various noise statistics.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122983018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Activity-Informed Industrial Audio Anomaly Detection Via Source Separation","authors":"Jaechang Kim, Yunjoo Lee, Hyun Mi Cho, Dong Woo Kim, Chi Hoon Song, Jungseul Ok","doi":"10.1109/ICASSP49357.2023.10095113","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095113","url":null,"abstract":"We discuss a practical scenario of anomaly detection for industrial sound data where the sound of a target machine is corrupted by not only noise from plant environments but also interference from neighboring machines. This is particularly challenging since the interfering sounds are virtually indistinguishable from the target machine without additional information. To overcome these challenges, we fully exploit the information of machine activity or control that is easy to obtain in the industrial environment, and propose a framework of source separation (SS) followed by anomaly detection (AD), so-called SSAD. We note that the proposed SSAD utilizes the activity information for not only AD but also SS. In our experiment based on an industrial dataset, we demonstrate that the proposed method using only the mixture signal and activity information achieves accuracy comparable to an oracle baseline using clean source signals.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"14 21","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114053200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}