{"title":"Noisy objective functions based on the f-divergence","authors":"M. Nußbaum-Thom, R. Schlüter, V. Goel, H. Ney","doi":"10.1109/ICASSP.2017.7952572","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952572","url":null,"abstract":"Dropout, the random dropping out of activations according to a specified rate, is a very simple but effective method to avoid over-fitting of deep neural networks to the training data.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115008221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of a complementary hearing aid for spatial sound segregation","authors":"Luca Giuliani, L. Brayda, Sara Sansalone, S. Repetto, M. Ricchetti","doi":"10.1109/ICASSP.2017.7952150","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952150","url":null,"abstract":"Spatial segregation of sounds is a common and simple task for people with healthy hearing. Unfortunately, people who suffer from partial hearing loss have great trouble separating sound sources in crowded and noisy environments. Social isolation is the most common consequence, and hearing aids are not a solution, especially in severely noisy conditions, because of their limited directionality.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121486405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compressive information acquisition with hardware impairments and constraints: A case study","authors":"S. Gopalakrishnan, T. Moy, Upamanyu Madhow, N. Verma","doi":"10.1109/ICASSP.2017.7953323","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7953323","url":null,"abstract":"Compressive information acquisition is a natural approach for low-power hardware front ends, since most natural signals are sparse in some basis. Key design questions include the impact of hardware impairments (e.g., nonlinearities) and constraints (e.g., spatially localized computations) on the fidelity of information acquisition. Our goal in this paper is to obtain specific insights into such issues through modeling of a Large Area Electronics (LAE)-based image acquisition system. We show that compressive information acquisition is robust to stochastic nonlinearities, and that appropriately designed spatially localized computations are effective, by evaluating the performance of reconstruction and classification based on the information acquired.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"42 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114040262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated robust Anuran classification by extracting elliptical feature pairs from audio spectrograms","authors":"Marcello Tomasini, Katrina Smart, R. Menezes, M. Bush, Eraldo Ribeiro","doi":"10.1109/ICASSP.2017.7952610","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952610","url":null,"abstract":"Ecologists can assess the health of wetlands by monitoring populations of animals such as Anurans (i.e., frogs and toads), which are sensitive to habitat changes. But surveying anurans requires trained experts to identify species from the animals' mating calls. This identification task can be streamlined by automation. To this end, we propose an automatic frog-call classification algorithm and a smartphone application that drastically simplify the monitoring of anuran populations. We offer three main contributions. First, we introduce a classification method that has an average accuracy of 86% on a dataset of 736 calls from 48 anuran species from the United States. Our dataset is much larger and more diverse than those of previous works on anuran classification. Second, we extract a new type of spectrogram feature that avoids syllable segmentation and the manual cleaning of the recordings. Our method also works with recordings of variable length. Third, our method uses GPS location and a voting scheme to reliably deal with a large number of species and high levels of noise.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131012217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised feature extraction for hyperspectral images using combined low rank representation and locally linear embedding","authors":"Mengdi Wang, Jing Yu, Lijuan Niu, Weidong Sun","doi":"10.1109/ICASSP.2017.7952392","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952392","url":null,"abstract":"Hyperspectral images (HSIs) provide hundreds of narrow spectral bands for land covers, and can thus provide more powerful discriminative information for land-cover classification. However, HSIs suffer from the curse of high dimensionality, so dimension reduction and feature extraction are essential for their application. In this paper, we propose an unsupervised feature extraction method for HSIs using combined low rank representation and locally linear embedding (LRR LLE). The proposed method can simultaneously use both the spectral and spatial correlation within HSIs, with LRR modelling the intrinsic property of a union of low-rank subspaces and LLE considering the correlation within spatial neighbours. Experiments are conducted on real HSI datasets, and the classification results demonstrate that the features extracted by LRR LLE are more discriminative than those of state-of-the-art methods.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125141769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint optimisation of tandem systems using Gaussian mixture density neural network discriminative sequence training","authors":"Chao Zhang, P. Woodland","doi":"10.1109/ICASSP.2017.7953111","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7953111","url":null,"abstract":"The use of deep neural networks (DNNs) for feature extraction and Gaussian mixture models (GMMs) for acoustic modelling is often termed a tandem system configuration and can be viewed as a Gaussian mixture density neural network (MDNN). Compared to the direct use of DNN output probabilities in the acoustic model, the tandem approach suffers from a major weakness in that the feature extraction stage and the final acoustic models are optimised separately. This paper proposes a joint optimisation approach to all the stages of the tandem acoustic model by using MDNN discriminative sequence training. A set of techniques is used to improve the training performance and stability. Experiments using the multi-genre broadcast (MGB) English data show that the proposed method gave a 6% relative word error rate (WER) reduction over a traditional discriminatively trained tandem system. The resulting jointly optimised tandem systems are comparable in WER to hybrid DNN systems optimised using discriminative sequence training with the same number of parameters.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125168310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recovery of sparse signals via Branch and Bound Least-Squares","authors":"Abolfazl Hashemi, H. Vikalo","doi":"10.1109/ICASSP.2017.7953060","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7953060","url":null,"abstract":"We present an algorithm, referred to as Branch and Bound Least-Squares (BBLS), for the recovery of sparse signals from a few linear combinations of their entries. Sparse signal reconstruction is readily cast as the problem of finding a sparse solution to an underdetermined system of linear equations. To solve it, BBLS employs an efficient search strategy of traversing a tree whose nodes represent the columns of the coefficient matrix and selects a subset of those columns by relying on Orthogonal Least-Squares (OLS) procedure. We state sufficient conditions under which in noise-free settings BBLS with high probability constructs a tree path which corresponds to the true support of the unknown sparse signal. Moreover, we empirically demonstrate that BBLS provides performance superior to that of existing algorithms in terms of accuracy, running time, or both. In the scenarios where the columns of the coefficient matrix are characterized by high correlation, BBLS is particularly beneficial and significantly outperforms existing methods.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125346789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
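The Orthogonal Least-Squares step that BBLS relies on at each tree node is easy to sketch. The following is a minimal greedy OLS column-selection routine, not the paper's branch-and-bound search itself; the function name and sizes are illustrative assumptions:

```python
import numpy as np

def ols_select(A, y, k):
    """Greedy Orthogonal Least-Squares: at each step, add the column that,
    together with the columns already chosen, minimizes the least-squares
    residual to y. BBLS uses this criterion inside a tree search; only the
    plain greedy variant is sketched here."""
    support = []
    for _ in range(k):
        best_j, best_res = None, np.inf
        for j in range(A.shape[1]):
            if j in support:
                continue
            S = A[:, support + [j]]                      # candidate support
            x, *_ = np.linalg.lstsq(S, y, rcond=None)    # LS fit on support
            r = np.linalg.norm(y - S @ x)                # residual norm
            if r < best_res:
                best_j, best_res = j, r
        support.append(best_j)
    S = A[:, support]
    x, *_ = np.linalg.lstsq(S, y, rcond=None)
    return support, x
```

In a noise-free setting with enough measurements, the greedy selection typically recovers the true support exactly, which is the property the paper's sufficient conditions formalize for the full tree search.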
{"title":"Building recurrent networks by unfolding iterative thresholding for sequential sparse recovery","authors":"Scott Wisdom, Thomas Powers, J. Pitton, L. Atlas","doi":"10.1109/ICASSP.2017.7952977","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952977","url":null,"abstract":"Historically, sparse methods and neural networks, particularly modern deep learning methods, have been relatively disparate areas. Sparse methods are typically used for signal enhancement, compression, and recovery, usually in an unsupervised framework, while neural networks commonly rely on a supervised training set. In this paper, we use the specific problem of sequential sparse recovery, which models a sequence of observations over time using a sequence of sparse coefficients, to show how algorithms for sparse modeling can be combined with supervised deep learning to improve sparse recovery. Specifically, we show that the iterative soft-thresholding algorithm (ISTA) for sequential sparse recovery corresponds to a stacked recurrent neural network (RNN) under specific architecture and parameter constraints. Then we demonstrate the benefit of training this RNN with backpropagation using supervised data for the task of column-wise compressive sensing of images. This training corresponds to adaptation of the original iterative thresholding algorithm and its parameters. Thus, we show by example that sparse modeling can provide a rich source of principled and structured deep network architectures that can be trained to improve performance on specific tasks.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"173 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126575561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applying compensation techniques on i-vectors extracted from short-test utterances for speaker verification using deep neural network","authors":"Il-Ho Yang, Hee-Soo Heo, Sung-Hyun Yoon, Ha-jin Yu","doi":"10.1109/ICASSP.2017.7953206","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7953206","url":null,"abstract":"We propose a method to improve speaker verification performance when a test utterance is very short. In some situations with short test utterances, the performance of i-vector/probabilistic linear discriminant analysis systems degrades. The proposed method transforms short-utterance feature vectors into adequate vectors using a deep neural network, which compensates for the short utterances. To reduce the dimensionality of the search space, we extract several principal components from the residual vectors between every long-utterance i-vector in a development set and its truncated short-utterance i-vector. An input i-vector of the network is then transformed by a linear combination of these directions, with the network outputs corresponding to weights for the linear combination of principal components. We use public speech databases to evaluate the method. The experimental results on the short2-10sec condition (det6, male portion) of the NIST 2008 speaker recognition evaluation corpus show that the proposed method reduces the minimum detection cost relative to the baseline system, which uses linear discriminant analysis transformed i-vectors as features.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130029033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A constrained adaptive scan order approach to transform coefficient entropy coding","authors":"Ching-Han Chiang, Jingning Han, Yaowu Xu","doi":"10.1109/ICASSP.2017.7952366","DOIUrl":"https://doi.org/10.1109/ICASSP.2017.7952366","url":null,"abstract":"Transform coefficient coding is a key module in modern video compression systems. Typically, a block of quantized coefficients is processed in a pre-defined zig-zag order, starting from DC and sweeping through low frequency positions to high frequency ones. Correlation between magnitudes of adjacent coefficients is exploited via context based probability models to improve compression efficiency. Such a scheme is premised on the assumption that spatial transforms compact energy towards lower frequency coefficients, and that a scan pattern following a descending order of the likelihood of coefficients being non-zero provides more accurate probability modeling. However, a pre-defined zig-zag pattern that is agnostic to signal statistics may not be optimal. This work proposes an adaptive approach that generates the scan pattern dynamically. Unlike prior attempts that directly sort a 2-D array of coefficient positions according to the appearance frequency of non-zero levels only, the proposed scheme employs a topological sort that also fully accounts for the spatial constraints due to the context dependency in entropy coding. A streamlined framework is designed for processing both intra and inter prediction residuals. This generic approach is experimentally shown to provide consistent coding performance gains across a wide range of test settings.","PeriodicalId":118243,"journal":{"name":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124003890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
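The pre-defined zig-zag baseline that the adaptive scheme improves on is straightforward to reproduce. A minimal sketch (function name and square-block assumption are illustrative, not from the paper):

```python
def zigzag_order(n):
    # Classic zig-zag scan for an n x n coefficient block: visit
    # anti-diagonals (constant i + j) from DC outward, alternating
    # direction on each diagonal so low frequencies come first.
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))
```

An adaptive variant would instead order positions by their observed probability of carrying a non-zero level, subject to the context-dependency constraints the paper enforces with a topological sort.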