{"title":"Higher-order listening room compensation with additive compensation signals","authors":"Christian Hofmann, Michael Guenther, M. Buerger, Walter Kellermann","doi":"10.1109/ICASSP.2016.7471732","DOIUrl":"https://doi.org/10.1109/ICASSP.2016.7471732","url":null,"abstract":"The performance of sound reproduction systems for spatial audio is impaired by time-variant, reverberant listening environments. To tackle this issue, the Loudspeaker-Enclosure-Microphone System (LEMS) between the loudspeakers and reference microphones in the listening environment can be identified adaptively to allow an LEMS-specific pre-processing of the loudspeaker signals. This contribution introduces a broadband implementation of a narrowband Listening Room Compensation (LRC) method with additive compensation signals, recently proposed by Talagala et al. [1], it extends the concept to higher-order compensation, and compares LRC to Listening Room Equalization (LRE) analytically. Evaluations in an image-source environment confirm the efficacy of higher-order LRC and its suitability as a complexity-reduced alternative to LRE.","PeriodicalId":165321,"journal":{"name":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125286708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning full-range affinity for diffusion-based saliency detection","authors":"Keren Fu, I. Gu, Jie Yang","doi":"10.1109/ICASSP.2016.7472012","DOIUrl":"https://doi.org/10.1109/ICASSP.2016.7472012","url":null,"abstract":"In this paper we address the issue of enhancing salient object detection through diffusion-based techniques. For reliably diffusing the energy from labeled seeds, we propose a novel graph-based diffusion scheme called affinity learning-based diffusion (ALD), which is based on learning full-range affinity between two arbitrary graph nodes. The method differs from the previous existing work where implicit diffusion was formulated as a ranking problem on a graph. In the proposed method, the affinity learning is achieved in a unified graph-based semi-supervised manner, whose outcome is leveraged for global propagation. By properly selecting an affinity learning model, the proposed ALD outperforms the ranking-based diffusion in terms of accurately detecting salient objects and enhancing the correct salient objects under a range of background scenarios. By utilizing the ALD, we propose an enhanced saliency detector that outperforms 7 recent state-of-the-art saliency models on 3 benchmark datasets.","PeriodicalId":165321,"journal":{"name":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125311161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A speaker adaptation technique for Gaussian process regression based speech synthesis using feature space transform","authors":"Tomoki Koriyama, Syohei Oshio, Takao Kobayashi","doi":"10.1109/ICASSP.2016.7472751","DOIUrl":"https://doi.org/10.1109/ICASSP.2016.7472751","url":null,"abstract":"In this paper, we propose a speaker adaptation technique for statistical parametric speech synthesis based on Gaussian process regression (GPR). Although it is reported that the GPR-based speech synthesis improves the naturalness of synthetic speech compared with the HMM-based speech synthesis, any speaker adaptation techniques for the GPR-based one have not been established. This is because GPR is a nonparametric model and hence it is impossible to directly apply linear transforms to model parameters. In the proposed technique, we introduce feature-space transform to achieve model adaptation in the framework of GPR-based speech synthesis. Experimental results of objective and subjective tests show that the proposed technique outperforms the conventional HMM-based speaker adaptation framework.","PeriodicalId":165321,"journal":{"name":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126731699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint offloading decision and resource allocation for mobile cloud with computing access point","authors":"Meng-Hsi Chen, Min Dong, B. Liang","doi":"10.1109/ICASSP.2016.7472331","DOIUrl":"https://doi.org/10.1109/ICASSP.2016.7472331","url":null,"abstract":"We consider a mobile cloud computing system consisting of multiple users, one computing access point (CAP), and one remote cloud server. The CAP can either process the received tasks from mobile users or offload them to the cloud. We aim to jointly optimize the offloading decisions of all users and the CAP, together with communication and processing resource allocation, to minimize the overall cost of energy, computation, and the maximum delay among all users. It is shown that the problem can be formulated as a non-convex quadratically constrained quadratic program, which is NP-hard in general. We further propose an efficient solution to this problem by semidefinite relaxation and a novel randomization mapping method. Our simulation results show that the proposed algorithm gives nearly optimal performance with only a small number of randomization iterations.","PeriodicalId":165321,"journal":{"name":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126930979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information fusion based on kernel entropy component analysis in discriminative canonical correlation space with application to audio emotion recognition","authors":"Lei Gao, L. Qi, L. Guan","doi":"10.1109/ICASSP.2016.7472191","DOIUrl":"https://doi.org/10.1109/ICASSP.2016.7472191","url":null,"abstract":"As an information fusion tool, Kernel Entropy Component Analysis (KECA) is realized by using descriptor of information entropy and optimized by entropy estimation. However, as an unsupervised method, it merely puts the information or features from different channels together without considering their intrinsic structures and relations. In this paper, we introduce an enhanced version of KECA for information fusion, KECA in Discriminative Canonical Correlation Space (DCCS). Not only the intrinsic structures and discriminative representations are considered, but also the natural representations of input data are revealed by entropy estimation, leading to improved recognition accuracy. The effectiveness of the proposed solution is evaluated through experiments on two audio emotion databases. Experimental results show that the proposed solution outperforms the existing methods based on similar principles.","PeriodicalId":165321,"journal":{"name":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115065673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sum secrecy rate maximization for full-duplex two-way relay networks","authors":"Qiang Li, Dong-Wan Han","doi":"10.1109/ICASSP.2016.7472356","DOIUrl":"https://doi.org/10.1109/ICASSP.2016.7472356","url":null,"abstract":"Consider a full-duplex two-way relay network, where two legitimate nodes simultaneously transmit and receive confidential information through a full-duplex multiantenna relay, in the presence of an eavesdropper. To secure the communications, an artificial-noise (AN)-aided amplify-and-forward (AF) strategy is employed at the relay, with a goal of maximizing the sum secrecy rate of the two-way transmissions. This sum secrecy rate maximization (SSRM) problem is nonconvex by nature, but can be converted into the form of the difference-of-concave (DC) functions after the semidefinite relaxation (SDR). Thus, the classical DC programming naturally applies. We prove that the SDR is tight and give a specific way to recover a stationary solution of the SSRM problem from the relaxed DC problem. Moreover, to reduce the iteration complexity of DC, we proposed an inexact DC framework, which uses an approximate solution to iterate, rather than a globally optimal one. The convergence of the inexact DC to a stationary solution of the SSRM problem is also established.","PeriodicalId":165321,"journal":{"name":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"214 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115510225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information theoretic multivariate change detection for multisensory information processing in Internet of Things","authors":"Lev Faivishevsky","doi":"10.1109/ICASSP.2016.7472879","DOIUrl":"https://doi.org/10.1109/ICASSP.2016.7472879","url":null,"abstract":"Internet of Things (IoT) is one of the main technological trends in the recent years. It allows machine-to-machine communication over the internet. Almost each device may transmit information from its sensors over the web to enable centralized insights derivation in an appropriate cloud architecture. In this paper we review analytical aspects of the sensory information processing. We emphasize the importance of multisensory approach, in which the joint distribution of all sensors values of a device is used to derive insights out of the stream of sensory data. We introduce a novel information theoretic multivariate change detection method based on k-nearest neighbor (kNN) estimation. The algorithm is designed and implemented to satisfy the requirements of IoT for fast online parallel multisensory information processing. We provide a numerical evidence of the validity of the proposed method on simulated and real world data.","PeriodicalId":165321,"journal":{"name":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"189 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116010766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The steady-state of the (Normalized) LMS is schur convex","authors":"Khaled A. Al-Hujaili, T. Al-Naffouri, M. Moinuddin","doi":"10.1109/ICASSP.2016.7472609","DOIUrl":"https://doi.org/10.1109/ICASSP.2016.7472609","url":null,"abstract":"In this work, we demonstrate how the theory of majorization and schur-convexity can be used to assess the impact of input-spread on the Mean Squares Error (MSE) performance of adaptive filters. First, we show that the concept of majorization can be utilized to measure the spread in input-regressors and subsequently order the input-regressors according to their spread. Second, we prove that the MSE of the Least Mean Squares Error (LMS) and Normalized LMS (NLMS) algorithms are schur-convex, that is, the MSE of the LMS and the NLMS algorithms preserve the majorization order of the inputs which provide an analytical justification to why and how much the MSE performance of the LMS and the NLMS algorithms deteriorate as the spread in input increases.","PeriodicalId":165321,"journal":{"name":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"234 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116464059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A KL divergence and DNN approach to cross-lingual TTS","authors":"Fenglong Xie, F. Soong, Haifeng Li","doi":"10.1109/ICASSP.2016.7472732","DOIUrl":"https://doi.org/10.1109/ICASSP.2016.7472732","url":null,"abstract":"We propose a Kullback-Leibler divergence (KLD) and deep neural net (DNN) based approach to cross-lingual TTS (CL-TTS) training. A speaker independent DNN (SI-DNN) ASR is used to equalize the speaker difference between a source speaker in L1 and a reference speaker in L2. Two speaker dependent GMM-HMM parametric TTS systems are first trained in the respective languages. The senones sets of the two TTS are matched in the SI-DNN ASR in terms of their output posteriors distributions in KLD. The minimum KLD criterion is used to transform the senones in the source speaker's TTS (L1) to the corresponding \"closest\" senones in the target language (L2). The new CL-TTS thus trained has been shown to achieve high speaker similarity to the source speaker in L1 while high intelligibility and naturalness are preserved. For untranscribed source speaker's recordings, say, conversational speech, a frame mapping, instead of \"senone mapping\" is also proposed to achieve a high but slightly inferior CL-TTS.","PeriodicalId":165321,"journal":{"name":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122446424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Membrane shape and boundary conditions estimation using eigenmode decomposition","authors":"T. Nowakowski, N. Bertin, R. Gribonval, J. Rosny, L. Daudet","doi":"10.1109/ICASSP.2016.7472295","DOIUrl":"https://doi.org/10.1109/ICASSP.2016.7472295","url":null,"abstract":"This paper investigates the problem of estimating the shape or the boundary impedance of a vibrating membrane from acoustic measurements in a limited sub-domain of the membrane. In acoustics, polygonal room shapes are usually estimated through room impulse response measurements. Impedance values of materials are, in turn, often calculated from the measurement of the acoustic reflection coefficients at the boundaries. In this work, we develop an alternative frequency-domain method to estimate the shape of a convex membrane with generalized Robin boundary conditions, from the measurement of its eigenmodes on a small portion of its surface. Reciprocally, we show that the same model allows to estimate the membrane borders' impedances when its shape is known.","PeriodicalId":165321,"journal":{"name":"2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122694781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}