{"title":"Detecting Adversarial Attacks In Time-Series Data","authors":"Mubarak G. Abdu-Aguye, W. Gomaa, Yasushi Makihara, Y. Yagi","doi":"10.1109/ICASSP40776.2020.9053311","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053311","url":null,"abstract":"In recent times, deep neural networks have seen increased adoption in highly critical tasks. They are also susceptible to adversarial attacks, which are specifically crafted changes made to input samples which lead to erroneous output from such models. Such attacks have been shown to affect different types of data such as images and more recently, time-series data. Such susceptibility could have catastrophic consequences, depending on the domain.We propose a method for detecting Fast Gradient Sign Method (FGSM) and Basic Iterative Method (BIM) adversarial attacks as adapted for time-series data. We frame the problem as an instance of outlier detection and construct a normalcy model based on information and chaos-theoretic measures, which can then be used to determine whether unseen samples are normal or adversarial. Our approach shows promising performance on several datasets from the 2015 UCR Time Series Archive, reaching up to 97% detection accuracy in the best case.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"16 1","pages":"3092-3096"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87766345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Channel Charting: an Euclidean Distance Matrix Completion Perspective","authors":"Patrick Agostini, Z. Utkovski, S. Stańczak","doi":"10.1109/ICASSP40776.2020.9053639","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053639","url":null,"abstract":"Channel charting (CC) is an emerging machine learning framework that aims at learning lower-dimensional representations of the radio geometry from collected channel state information (CSI) in an area of interest, such that spatial relations of the representations in the different domains are preserved. Extracting features capable of correctly representing spatial properties between positions is crucial for learning reliable channel charts. Most approaches to CC in the literature rely on range distance estimates, which have the drawback that they only provide accurate distance information for colinear positions. Distances between positions with large azimuth separation are constantly underestimated using these approaches, and thus incorrectly mapped to close neighborhoods. In this paper, we introduce a correlation matrix distance (CMD) based dissimilarity measure for CC that allows us to group CSI measurements according to their co-linearity. This provides us with the capability to discard points for which large distance errors are made, and to build a neighborhood graph between approximately collinear positions. The neighborhood graph allows us to state the problem of CC as an instance of an Euclidean distance matrix completion (EDMC) problem where side-information can be naturally introduced via convex box-constraints.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"12 1","pages":"5010-5014"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88311828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fast and Accurate Frequent Directions Algorithm for Low Rank Approximation via Block Krylov Iteration","authors":"Qianxin Yi, Chenhao Wang, Xiuwu Liao, Yao Wang","doi":"10.1109/ICASSP40776.2020.9054022","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054022","url":null,"abstract":"It is known that frequent directions (FD) is a popular deterministic matrix sketching technique for low rank approximation. However, FD and its randomized variants usually meet high computational cost or computational instability in dealing with large-scale datasets, which limits their use in practice. To remedy such issues, this paper aims at improving the efficiency and effectiveness of FD. Specifically, by utilizing the power of Block Krylov Iteration and count sketch techniques, we propose a fast and accurate FD algorithm dubbed as BKICS-FD. We derive the error bound of the proposed BKICS-FD and then carry out extensive numerical experiments to illustrate its superiority over several popular FD algorithms, both in terms of computational speed and accuracy.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"3167-3171"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86368369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural Lattice Search for Speech Recognition","authors":"Rao Ma, Hao Li, Qi Liu, Lu Chen, Kai Yu","doi":"10.1109/ICASSP40776.2020.9054109","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054109","url":null,"abstract":"To improve the accuracy of automatic speech recognition, a two-pass decoding strategy is widely adopted. The first-pass model generates compact word lattices, which are utilized by the second-pass model to perform rescoring. Currently, the most popular rescoring methods are N-best rescoring and lattice rescoring with long short-term memory language models (LSTMLMs). However, these methods encounter the problem of limited search space or inconsistency between training and evaluation. In this paper, we address these problems with an end-to-end model for accurately extracting the best hypothesis from the word lattice. Our model is composed of a bidirectional LatticeLSTM encoder followed by an attentional LSTM decoder. The model takes word lattice as input and generates the single best hypothesis from the given lattice space. When combined with an LSTMLM, the proposed model yields 9.7% and 7.5% relative WER reduction compared to N-best rescoring methods and lattice rescoring methods within the same amount of decoding time.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"75 1","pages":"7794-7798"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86380228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DNN-Based Speech Recognition for Globalphone Languages","authors":"Martha Yifiru Tachbelie, Ayimunishagu Abulimiti, S. Abate, Tanja Schultz","doi":"10.1109/ICASSP40776.2020.9053144","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053144","url":null,"abstract":"This paper describes new reference benchmark results based on hybrid Hidden Markov Model and Deep Neural Networks (HMM-DNN) for the GlobalPhone (GP) multilingual text and speech database. GP is a multilingual database of high-quality read speech with corresponding transcriptions and pronunciation dictionaries in more than 20 languages. Moreover, we provide new results for five additional languages, namely, Amharic, Oromo, Tigrigna, Wolaytta, and Uyghur. Across the 22 languages considered, the hybrid HMM-DNN models outperform the HMM-GMM based models regardless of the size of the training speech used. Overall, we achieved relative improvements that range from 7.14% to 59.43%.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"21 1","pages":"8269-8273"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86490635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Scale Residual Network for Image Classification","authors":"X. Zhong, Oubo Gong, Wenxin Huang, Jingling Yuan, Bo Ma, R. W. Liu","doi":"10.1109/ICASSP40776.2020.9053478","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053478","url":null,"abstract":"Multi-scale approach representing image objects at various levels-of-details has been applied to various computer vision tasks. Existing image classification approaches place more emphasis on multi-scale convolution kernels, and overlook multi-scale feature maps. As such, some shallower information of the network will not be fully utilized. In this paper, we propose the Multi-Scale Residual (MSR) module that integrates multi-scale feature maps of the underlying information to the last layer of Convolutional Neural Network. Our proposed method significantly enhances the characteristics of the information in the final classification. Extensive experiments conducted on CIFAR100, Tiny-ImageNet and large-scale CalTech-256 datasets demonstrate the effectiveness of our method compared with Res-Family.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"27 1","pages":"2023-2027"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86528753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Time-Frequency Network with Channel Attention and Non-Local Modules for Artificial Bandwidth Extension","authors":"Yuanjie Dong, Yaxing Li, Xiaoqi Li, Shanjie Xu, Dan Wang, Zhihui Zhang, Shengwu Xiong","doi":"10.1109/ICASSP40776.2020.9053769","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053769","url":null,"abstract":"Convolution neural networks (CNNs) have been achieving increasing attention for the artificial bandwidth extension (ABE) task recently. However, these methods use the flipped low-frequency phase to reconstruct speech signals, which may lead to the well-known invalid short-time Fourier Transform (STFT) problem. The convolutional operations only enable networks to construct informative features by fusing both channel-wise and spatial information within local receptive fields at each layer. In this paper, we introduce a Time-Frequency Network (TFNet) with channel attention (CA) and non-local (NL) modules for ABE. The TFNet exploits the information from both time and frequency domain branches concurrently to avoid the invalid STFT problem. To capture the channels and space dependencies, we incorporate the CA and NL modules to construct a proposed fully convolutional neural network for the time and frequency branches of TFNet. Experimental results demonstrate that the proposed method outperforms the competing method.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"12 1","pages":"6954-6958"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83671092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Motion Feedback Design for Video Frame Interpolation","authors":"Mengshun Hu, Liang Liao, Jing Xiao, Lin Gu, S. Satoh","doi":"10.1109/ICASSP40776.2020.9053223","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053223","url":null,"abstract":"This paper introduces a feedback-based approach to interpolate video frames involving small and fast-moving objects. Unlike the existing feedforward-based methods that estimate optical flow and synthesize in-between frames sequentially, we introduce a motion-oriented component that adds a feedback block to the existing multi-scale autoencoder pipeline, which feedbacks information of small objects shared between architectures of two different scales. We show that feeding this additional information enables more robust detection of optical flow caused by small objects in fast motion. Using experiments on various datasets, we show that the feedback mechanism allows our method to achieve state-of-the-art results, both qualitatively and quantitatively.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"2016 1","pages":"4347-4351"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82628509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Channel Pruning For Correlation Filter Based Object Tracking","authors":"Goutam Yelluru Gopal, Maria A. Amer","doi":"10.1109/ICASSP40776.2020.9053333","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053333","url":null,"abstract":"Fusion of multi-channel representations has played a crucial role in the success of correlation filter (CF) based trackers. But, all channels do not contain useful information for target localization at every frame. During challenging scenarios, ambiguous responses of non-discriminative or unreliable channels lead to erroneous results and cause tracker drift. To mitigate this problem, we propose a method for dynamic channel pruning through online (i.e., at every frame) learning of channel weights. Our method uses estimated reliability scores to compute channel weights, to nullify the impact of highly unreliable channels. The proposed method for learning of channel weights is modeled as a non-smooth convex optimization problem. We then propose an algorithm to solve the resulting problem efficiently compared to off-the-shelf solvers. Results on VOT2018 and TC128 datasets show that proposed method improves the performance of baseline CF trackers.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"172 1","pages":"5700-5704"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82937761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deriving Compact Feature Representations Via Annealed Contraction","authors":"Muhammad A Shah, B. Raj","doi":"10.1109/ICASSP40776.2020.9054527","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054527","url":null,"abstract":"It is common practice to use pretrained image recognition models to compute feature representations for visual data. The size of the feature representations can have a noticeable impact on the complexity of the models that use these representations, and by extension on their deployablity and scalability. Therefore it would be beneficial to have compact visual representations that carry as much information as their high-dimensional counterparts. To this end we propose a technique that shrinks a layer by an iterative process in which neurons are removed from the and network is fine tuned. Using this technique we are able to remove 99% of the neurons from the penultimate layer of AlexNet and VGG16, while suffering less than 5% drop in accuracy on CIFAR10, Caltech101 and Caltech256. We also show that our method can reduce the size of AlexNet by 95% while only suffering a 4% reduction in accuracy on Caltech101.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"70 1","pages":"2068-2072"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88956095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}