{"title":"Using Block Coordinate Descent to Learn Sparse Coding Dictionaries with a Matrix Norm Update","authors":"Bradley M. Whitaker, David V. Anderson","doi":"10.1109/ICASSP.2018.8461499","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8461499","url":null,"abstract":"Researchers have recently examined a modified approach to sparse coding that encourages dictionaries to learn anomalous features. This is done by incorporating the matrix 1-norm, or $\\ell_{1,\\infty}$ mixed matrix norm, into the dictionary update portion of a sparse coding algorithm. However, solving a matrix norm minimization problem in each iteration of the algorithm causes it to run more slowly. The purpose of this paper is to introduce block coordinate descent, a subgradient-like approach to minimizing the matrix norm, to the dictionary update. This approach removes the need to solve a convex optimization program in each iteration and dramatically reduces the time required to learn a dictionary. Importantly, the dictionary learned in this manner can still model anomalous features present in a dataset.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"89 1","pages":"2761-2765"},"PeriodicalIF":0.0,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85790144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mutual-Information-Private Online Gradient Descent Algorithm","authors":"Ruochi Zhang, P. Venkitasubramaniam","doi":"10.1109/ICASSP.2018.8461756","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8461756","url":null,"abstract":"A user-implemented privacy preservation mechanism is proposed for the online gradient descent (OGD) algorithm. Privacy is measured through the information leakage, quantified by the mutual information between the users' outputs and the learner's inputs. The proposed input perturbation mechanism can be implemented by individual users with a space and time complexity that is independent of the horizon T. For the proposed mechanism, the information leakage is shown to be bounded by the Gaussian channel capacity in the full information setting. The regret bound of the privacy-preserving learning mechanism matches that of non-private OGD, differing only in constant factors.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"17 1","pages":"2077-2081"},"PeriodicalIF":0.0,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84565451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Multi-Rater Gaussian Mixture Regression Incorporating Temporal Dependencies of Emotion Uncertainty Using Kalman Filters","authors":"T. Dang, V. Sethu, E. Ambikairajah","doi":"10.1109/ICASSP.2018.8461321","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8461321","url":null,"abstract":"Predicting continuous emotion in terms of affective attributes has mainly focused on hard labels, which ignore the ambiguity of recognizing certain emotions. This ambiguity may result in high inter-rater variability and, in turn, prediction uncertainty that varies with time. Based on the assumption that temporal dependencies occur in the evolution of emotion uncertainty, this paper proposes a dynamic multi-rater Gaussian Mixture Regression (GMR), which aims to predict the emotion uncertainty reflected by multiple raters by taking temporal dependencies into account. This framework is achieved by incorporating feedforward and backward Kalman filters into GMR to estimate the time-dependent label distribution that reflects the emotion uncertainty. It also relaxes the assumption of a Gaussian label distribution to that of a Gaussian Mixture Model (GMM). In addition, a new measure that estimates emotion uncertainty from the GMM as local variability is adopted. Experiments conducted on the RECOLA database reveal that incorporating temporal dependencies is critical for emotion uncertainty prediction, with a 17% relative improvement for arousal, and that the proposed framework for emotion uncertainty prediction shows potential in conventional emotion attribute prediction.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"4929-4933"},"PeriodicalIF":0.0,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88924565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalised Discriminative Transform via Curriculum Learning for Speaker Recognition","authors":"E. Marchi, Stephen Shum, Kvuveon Hwang, S. Kajarekar, Siddharth Sigtia, H. Richards, R. Haynes, Yoon Kim, J. Bridle","doi":"10.1109/ICASSP.2018.8461296","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8461296","url":null,"abstract":"In this paper we introduce a speaker verification system deployed on mobile devices that can be used to personalise a keyword spotter. We describe a baseline DNN system that maps an utterance to a speaker embedding, which is used to measure speaker differences via cosine similarity. We then introduce an architectural modification which uses an LSTM system where the parameters are optimised via a curriculum learning procedure to reduce the detection error and improve its generalisability across various conditions. Experiments on our internal datasets show that the proposed approach outperforms the DNN baseline system and yields a relative EER reduction of 30-70% on both text-dependent and text-independent tasks under a variety of acoustic conditions.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"7 1","pages":"5324-5328"},"PeriodicalIF":0.0,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81946559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low Rank Fourier Ptychography","authors":"Zhengyu Chen, Gauri Jagatap, Seyedehsara Nayer, C. Hegde, Namrata Vaswani","doi":"10.1109/ICASSP.2018.8462480","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8462480","url":null,"abstract":"In this paper, we introduce a principled algorithmic approach for Fourier ptychographic imaging of dynamic, time-varying targets. To the best of our knowledge, this setting has not been explicitly addressed in the ptychography literature. We argue that such a setting is very natural, and that our methods provide an important first step towards helping reduce the sample complexity (and hence acquisition time) of imaging dynamic scenes to manageable levels. With significantly reduced acquisition times per image, it is conceivable that dynamic ptychographic imaging of fast changing scenes indeed becomes practical in the near future.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"22 1","pages":"6538-6542"},"PeriodicalIF":0.0,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75874080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MMSE Adaptive Waveform Design for a MIMO Active Sensing System Tracking Multiple Moving Targets","authors":"Steven Herbert, J. Hopgood, B. Mulgrew","doi":"10.1109/ICASSP.2018.8462319","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8462319","url":null,"abstract":"This paper proposes a method for minimum mean squared error (MMSE) adaptive waveform design (AWD) in multiple-input-multiple-output (MIMO) active sensing systems which are used to track moving targets. The method proposed herein prompts two computational improvements compared to a related method for static targets. Consideration of moving targets also introduces the possibility of ‘model mismatch’ between the actual motion of the targets, and the model available to the MMSE AWD system. Results show that the proposed method leads to an improvement in mean squared error performance of up to 29% compared to the non-adaptive case.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"150 1","pages":"3271-3275"},"PeriodicalIF":0.0,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77382581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating Label Noise Sensitivity of Convolutional Neural Networks for Fine Grained Audio Signal Labelling","authors":"Rainer Kelz, G. Widmer","doi":"10.1109/ICASSP.2018.8461291","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8461291","url":null,"abstract":"We measure the effect of small amounts of systematic and random label noise caused by slightly misaligned ground truth labels in a fine grained audio signal labeling task. The task we choose to demonstrate these effects on is also known as framewise polyphonic transcription or note quantized multi-f0 estimation, and transforms a monaural audio signal into a sequence of note indicator labels. It will be shown that even slight misalignments have clearly apparent effects, demonstrating a great sensitivity of convolutional neural networks to label noise. The implications are clear: when using convolutional neural networks for fine grained audio signal labeling tasks, great care has to be taken to ensure that the annotations have precise timing, and are free from systematic or random error as much as possible - even small misalignments will have a noticeable impact.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"238 1","pages":"2996-3000"},"PeriodicalIF":0.0,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76562385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Geometry of Mixtures of Prescribed Distributions","authors":"F. Nielsen, R. Nock","doi":"10.1109/ICASSP.2018.8461869","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8461869","url":null,"abstract":"We consider the space of w-mixtures that are finite statistical mixtures sharing the same prescribed component distributions, like Gaussian mixture models sharing the same components. The information geometry induced by the Kullback-Leibler (KL) divergence yields a dually flat space where the KL divergence between two w-mixtures amounts to a Bregman divergence for the negative Shannon entropy generator, called the Shannon information. Furthermore, we prove that the skew Jensen-Shannon statistical divergence between w-mixtures amounts to skew Jensen divergences on their parameters and state several divergence inequalities between w-mixtures and their closures.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"84 1","pages":"2861-2865"},"PeriodicalIF":0.0,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86857127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Separation and Dereverberation of Reverberant Mixtures with Determined Multichannel Non-Negative Matrix Factorization","authors":"Hideaki Kagami, H. Kameoka, M. Yukawa","doi":"10.1109/ICASSP.2018.8462080","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8462080","url":null,"abstract":"This paper proposes an extension of multichannel non-negative matrix factorization (MNMF) that simultaneously solves source separation and dereverberation. While MNMF was originally formulated under an underdetermined problem setting where sources can outnumber microphones, a determined counterpart of MNMF, which we call the determined MNMF (DMNMF), has recently been proposed with notable success. This approach is particularly notable in that the optimization process can be more than 30 times faster than the underdetermined version owing to the fact that it involves no matrix inversion computations. One drawback as regards all methods based on instantaneous mixture models, including MNMF, is that they are weak against long reverberation. To overcome this drawback, this paper proposes an extension of DMNMF using a frequency-domain convolutive mixture model. The optimization process of the proposed method consists of iteratively updating (i) the spectral parameters of each source using the majorization-minimization algorithm, (ii) the separation matrix using iterative projection, and (iii) the dereverberation filters using multichannel linear prediction. Experimental results showed that the proposed method yielded higher separation performance and dereverberation performance than the baseline method under highly reverberant environments.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"8 1","pages":"31-35"},"PeriodicalIF":0.0,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73599280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Deep Learning to Classify Power Consumption Signals of Wireless Devices: An Application to Cybersecurity","authors":"Abdurhman Albasir, R. S. R. James, S. Naik, A. Nayak","doi":"10.1109/ICASSP.2018.8461304","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8461304","url":null,"abstract":"The problem of detecting malware in mobile devices is becoming increasingly important. Because most mobile devices run on very limited resources, installing anti-virus software on-board is not very practical, especially on IoT devices. Even if such tools exist, malware could hide or manipulate its fingerprint, making it hard to detect. Thus, having effective countermeasures after a malware intrusion is paramount. In this work, we utilize the ability of deep learning to learn multiple levels of representation from raw data to classify power consumption signals obtained from smartphones. The objective is to build a framework that can intelligently tell whether a smartphone has malware by monitoring only its power consumption. Validation tests of the proposed framework show that information contained in the measured power consumption of smartphones can in principle be used to identify the presence of malware and, further, to tell how active the malware is with very high accuracy.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"134 1","pages":"2032-2036"},"PeriodicalIF":0.0,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86334649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}