{"title":"Adversarial Generative Distance-Based Classifier for Robust Out-of-Domain Detection","authors":"Zhiyuan Zeng, Hong Xu, Keqing He, Yuanmeng Yan, Sihong Liu, Zijun Liu, Weiran Xu","doi":"10.1109/ICASSP39728.2021.9413908","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413908","url":null,"abstract":"Detecting out-of-domain (OOD) intents is critical in a task-oriented dialog system. Existing methods rely heavily on extensive manually labeled OOD samples and lack robustness. In this paper, we propose an efficient adversarial attack mechanism to augment hard OOD samples and design a novel generative distance-based classifier to detect OOD samples instead of a traditional threshold-based discriminator classifier. Experiments on two public benchmark datasets show that our method can consistently outperform the baselines with a statistically significant margin.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115409791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capturing Temporal Dependencies Through Future Prediction for CNN-Based Audio Classifiers","authors":"Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du","doi":"10.1109/ICASSP39728.2021.9414018","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414018","url":null,"abstract":"This paper focuses on the problem of temporal dependency modeling in the CNN-based models for audio classification tasks. To capture audio temporal dependencies using CNNs, we take a different approach from the purely architecture-induced method and explicitly encode temporal dependencies into the CNN-based audio classifiers. More specifically, in addition to the classification objective, we require the CNN model to solve an auxiliary task of predicting the future features, which is formulated by leveraging the Contrastive Predictive Coding (CPC) loss. Furthermore, a novel hierarchical CPC (HCPC) model is proposed for capturing multi-level temporal dependencies at the same time. The proposed model is evaluated on a wide range of non-speech audio signals, including musical and in-the-wild environmental audio signals. We show that the proposed approach improves the backbone CNNs consistently on all tested benchmark datasets and outperforms a DenseNet model trained from scratch.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115426716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Technique for OFDM Symbol Slicing","authors":"A. Pérez-Neira, M. Lagunas","doi":"10.1109/ICASSP39728.2021.9414504","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414504","url":null,"abstract":"This work presents an orthonormal transform that splits the Orthogonal Frequency Division Multiplex (OFDM) symbol into slices with ranked rate and decoding complexity. The advantage over the existing carrier or time segmentation is that the proposed technique does not depend on the frequency channel to produce slices of equal rate. Also, the encoding and the decoding complexity is kept simple.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115711064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Layered Embedding-Based Scheme to Cope with Intra-Frame Distortion Drift In IPM-Based HEVC Steganography","authors":"Xiaoqing Jia, Jie Wang, Yongliang Liu, Xiangui Kang, Yun-Qing Shi","doi":"10.1109/ICASSP39728.2021.9413728","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413728","url":null,"abstract":"The spatial correlation of the intra-frame prediction units brings great challenges when minimizing embedding distortions using syndrome-trellis coding (STC) in High Efficiency Video Coding (HEVC) steganography. To solve this problem, we propose a layered embedding scheme which embeds information into the intra-prediction modes (IPMs) of 4×4 intra-frame prediction units (PUs) in HEVC. Firstly we divide the PUs of the intra-frame into different layers using Hasse diagram and make modification decisions for PUs in each layer respectively to decorrelate the correlated PUs. Secondly we make a statistics on more than 100,000 sampling PU pairs to quantitatively analyze the impacts between the distortions of PUs and then design a distortion function which takes mutual impacts of PUs into account. Experimental results show that our method can significantly reduce the embedding distortion and improve the security compared with the existing STC-based steganography methods embedding in IPMs.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114655037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transitive Transfer Sparse Coding for Distant Domain","authors":"Lingtian Feng, Feng Qian, Xin He, Yuqi Fan, H. Cai, Guangmin Hu","doi":"10.1109/ICASSP39728.2021.9415021","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9415021","url":null,"abstract":"The transfer learning between the source and target domain has already achieved significant success in machine learning areas. However, the existing methods can not achieve satisfactory result when solving the two distant domains transfer learning problem. In the worst case, it could lead to the negative transfer. In this paper, we propose a novel framework called transitive transfer sparse coding (TTSC) to solve the two distant domains transfer learning problem. On the one hand, as an extension of the sparse coding, the TTSC framework constructs a robust and high-level dictionary across three different domains and simultaneously obtains three good feature sparse representations. On the other hand, TTSC utilizes the intermediate domain as a strong bridge to transfer valuable knowledge between the source domain and target domain. Empirical studies validated that the TTSC framework significantly could outperform state-of-the-art methods.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114645949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Study on Task-Oriented Dialogue Translation","authors":"Siyou Liu","doi":"10.1109/ICASSP39728.2021.9413521","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413521","url":null,"abstract":"Translating conversational text, in particular task-oriented dialogues, is an important application task for machine translation technology. However, it has so far not been extensively explored due to its inherent characteristics including data limitation, discourse, informality and personality. In this paper, we systematically investigate advanced models on the task-oriented dialogue translation task, including sentence-level, document-level and non-autoregressive NMT models. Be-sides, we explore existing techniques such as data selection, back/forward translation, larger batch learning, finetuning and domain adaptation. To alleviate low-resource problem, we transfer general knowledge from four different pre-training models to the downstream task. Encouragingly, we find that the best model with mBART pre-training pushes the SOTA performance on WMT20 English-German and IWSLT DIALOG Chinese-English datasets up to 62.67 and 23.21 BLEU points, respectively.1","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121792186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpolation of Irregularly Sampled Frequency Response Functions Using Convolutional Neural Networks","authors":"M. Acerbi, R. Malvermi, Mirco Pezzoli, F. Antonacci, A. Sarti, R. Corradi","doi":"10.1109/ICASSP39728.2021.9413458","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413458","url":null,"abstract":"In the field of structural mechanics, classical methods for the vibrational characterization of objects exploit the inherent redundancy of a relevant amount of measurements acquired over regular sampling grids. However, there are cases in which parts of the objects under analysis are not accessible with sensors, leading to irregular sampling grids characterized by holes. Recent works have proved the benefits of adding prior knowledge in these scenarios, either through the definition of a suitable decomposition or using Finite Element modelling. In this paper we propose to use Convolutional Autoencoders (CA) for Frequency Response Function (FRF) interpolation from grids with different subsampling schemes. CA learn a compressed representation from a dataset of FRFs synthetized through Finite Element Analysis. Experiments with numerical and experimental data show the effectiveness of the model with a different amount of missing data and its ability to predict real FRFs characterized by different damping and sampling frequency.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116633706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporate Maximum Mean Discrepancy in Recurrent Latent Space for Sequential Generative Model","authors":"Yuchi Zhang, Yongliang Wang, Yang Dong","doi":"10.1109/ICASSP39728.2021.9414580","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414580","url":null,"abstract":"Stochastic recurrent neural networks have shown promising performance for modeling complex sequences. Nonetheless, existing methods adopt KL divergence as distribution regularizations in their latent spaces, which limits the choices of models for latent distribution construction. In this paper, we incorporate maximum mean discrepancy in the recurrent structure for distribution regularization. Maximum mean discrepancy is able to measure the difference between two distributions by just sampling from them, which enables us to construct more complicated latent distributions by neural networks. Therefore, our proposed algorithm is able to model more complex sequences. Experiments conducted on two different sequential modeling tasks show that our method outperforms the state-of-the-art sequential modeling algorithms.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116928570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mixture of Informed Experts for Multilingual Speech Recognition","authors":"Neeraj Gaur, B. Farris, Parisa Haghani, Isabel Leal, Pedro J. Moreno, Manasa Prasad, B. Ramabhadran, Yun Zhu","doi":"10.1109/ICASSP39728.2021.9414379","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414379","url":null,"abstract":"When trained on related or low-resource languages, multilingual speech recognition models often outperform their monolingual counterparts. However, these models can suffer from loss in performance for high resource or unrelated languages. We investigate the use of a mixture-of-experts approach to assign per-language parameters in the model to increase network capacity in a structured fashion. We introduce a novel variant of this approach, ‘informed experts’, which attempts to tackle inter-task conflicts by eliminating gradients from other tasks in these task-specific parameters. We conduct experiments on a real-world task with English, French and four dialects of Arabic to show the effectiveness of our approach. Our model matches or outperforms the monolingual models for almost all languages, with gains of as much as 31% relative. Our model also outperforms the baseline multilingual model for all languages by up to 9% relative.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120936889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extended Object Tracking With Automotive Radar Using B-Spline Chained Ellipses Model","authors":"G. Yao, P. Wang, K. Berntorp, Hassan Mansour, P. Boufounos","doi":"10.1109/ICASSP39728.2021.9415080","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9415080","url":null,"abstract":"This paper introduces a B-spline chained ellipses model representation for extended object tracking (EOT) using high-resolution automotive radar measurements. With offline automotive radar training datasets, the proposed model parameters are learned using the expectation-maximization (EM) algorithm. Then the probabilistic multi-hypothesis tracking (PMHT) along with the unscented transform (UT) is proposed to deal with the nonlinear forward-warping coordinate transformation, the measurement-to-ellipsis association, and the state update step. Numerical validation is provided to verify the effectiveness of the proposed EOT framework with automotive radar measurements.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121107666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}