{"title":"Kernelized Rényi distance for speaker recognition","authors":"Balaji Vasan Srinivasan, R. Duraiswami, D. Zotkin","doi":"10.1109/ICASSP.2010.5495587","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495587","url":null,"abstract":"Speaker recognition systems classify a test signal as a speaker or an imposter by evaluating a matching score between input and reference signals. We propose a new information theoretic approach for computation of the matching score using the Rényi entropy. The proposed entropic distance, the Kernelized Rényi distance (KRD), is formulated in a non-parametric way and the resulting measure is efficiently evaluated in a parallelized fashion on a graphical processor. The distance is then adapted as a scoring function and its performance compared with other popular scoring approaches in a speaker identification and speaker verification framework.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127668241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effect of speaking style and speaking rate on formant contours","authors":"Akiko Amano-Kusumoto, John-Paul Hosom","doi":"10.1109/ICASSP.2010.5495698","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495698","url":null,"abstract":"This paper presents the results of formant analysis using a newly developed formant contour model. We model formant contours with a linear combination of formant target values and coarticulation functions for /wVl/ and /tVl/ words. While formant target values are estimated globally over different speaking styles, coarticulation coefficients are estimated for individual tokens. The results show that the estimated coarticulation coefficients are inherently different between clear (CLR) and conversational (CNV) speech and that the movement of articulators when producing CLR speech is faster than when producing CNV speech. On the other hand, speaking rate is not a key determinant in movement of articulators at vowel onsets. The direct measure of F2 slope is strongly correlated with estimated coarticulation coefficients, which may lead to less parameters to estimate.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127700016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image-quality prediction of synthetic aperture sonar imagery","authors":"David P. Williams","doi":"10.1109/ICASSP.2010.5495165","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495165","url":null,"abstract":"This work exploits several machine-learning techniques to address the problem of image-quality prediction of synthetic aperture sonar (SAS) imagery. The objective is to predict the correlation of sonar ping-returns as a function of range from the sonar by using measurements of sonar-platform motion and estimates of environmental characteristics. The environmental characteristics are estimated by effectively performing unsupervised seabed segmentation, which entails extracting wavelet-based features, performing spectral clustering, and learning a variational Bayesian Gaussian mixture model. The motion measurements and environmental features are then used to learn a Gaussian process regression model so that ping correlations can be predicted. To handle issues related to the large size of the data set considered, sparse methods and an out-of-sample extension for spectral clustering are also exploited. The approach is demonstrated on an enormous data set of real SAS images collected in the Baltic Sea.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126345663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approaches to automatic lexicon learning with limited training examples","authors":"N. Goel, Samuel Thomas, Mohit Agarwal, Pinar Akyazi, L. Burget, Kai Feng, Arnab Ghoshal, O. Glembek, M. Karafiát, Daniel Povey, A. Rastrow, R. Rose, Petr Schwarz","doi":"10.1109/ICASSP.2010.5495037","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495037","url":null,"abstract":"Preparation of a lexicon for speech recognition systems can be a significant effort in languages where the written form is not exactly phonetic. On the other hand, in languages where the written form is quite phonetic, some common words are often mispronounced. In this paper, we use a combination of lexicon learning techniques to explore whether a lexicon can be learned when only a small lexicon is available for boot-strapping. We discover that for a phonetic language such as Spanish, it is possible to do that better than what is possible from generic rules or hand-crafted pronunciations. For a more complex language such as English, we find that it is still possible but with some loss of accuracy.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126381120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compressive sensing and differential image-motion estimation","authors":"Nathan Jacobs, S. Schuh, Robert Pless","doi":"10.1109/ICASSP.2010.5495053","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495053","url":null,"abstract":"Compressive-sensing cameras are an important new class of sensors that have different design constraints than standard cameras. Surprisingly, little work has explored the relationship between compressive-sensing measurements and differential image motion. We show that, given modest constraints on the measurements and image motions, we can omit the computationally expensive compressive-sensing reconstruction step and obtain more accurate motion estimates with significantly less computation time. We also formulate a compressive-sensing reconstruction problem that incorporates known image motion and show that this method outperforms the state-of-the-art in compressive-sensing video reconstruction.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126520057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel approach to detecting non-native speakers and their native language","authors":"M. Omar, Jason W. Pelecanos","doi":"10.1109/ICASSP.2010.5495628","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495628","url":null,"abstract":"Speech contains valuable information regarding the traits of speakers. This paper investigates two aspects of this information. The first is automatic detection of non-native speakers and their native language on relatively large data sets. We present several experiments which show how our system outperforms the best published results on both the Fisher database and the foreign-accented English (FAE) database for detecting non-native speakers and their native language respectively. Such performance is achieved by using an SVM-based classifier with ASR-based features integrated with a novel universal background model (UBM) obtained by clustering the Gaussian components of an ASR acoustic model. The second aspect of this work is to utilize the detected speaker characteristics within a speaker recognition system to improve its performance.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128082559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subjective ratings of instantaneous and gradual transitions from narrowband to wideband active speech","authors":"S. Voran","doi":"10.1109/ICASSP.2010.5495187","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495187","url":null,"abstract":"In advanced heterogeneous telecommunication networks, network resources can dynamically dictate the type of speech coding that is used. An increase in resources allows for lower coding distortion or it might also be used to provide wideband speech instead of narrowband speech. Existing studies have demonstrated that wideband speech is preferred to narrowband speech, but they have also demonstrated that an abrupt transition from narrowband to wideband is perceived as an impairment, even though it is a transition to a higher quality signal. We describe our recent work that resulted in subjective scores for abrupt and gradual transitions from narrowband to wideband at the midpoint of a six-second segment of active speech. On average, signals that start narrowband and end wideband are rated slightly lower than constant narrowband signals and results are nearly the same for abrupt and gradual (2.5 second) transitions. Scores from 20 listeners show a wide range of individual opinions so we conclude that studies of bandwidth transitions may be quite sensitive to the listener population sample.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128118891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Steady-state analysis of the set-membership affine projection algorithm","authors":"Markus V. S. Lima, P. Diniz","doi":"10.1109/ICASSP.2010.5495836","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495836","url":null,"abstract":"Among the adaptive filtering algorithms the set-membership affine projection (SM-AP) algorithm has the attractive feature of not trading off misadjustment with convergence speed. This paper presents an analysis of the steady-state mean-square error (MSE) of the SM-AP algorithm. Our analysis relies on the energy conservation method and does not assume a specific probability distribution for the input vector. Moreover, since the SM-AP algorithm with a fixed-modulus error-based constraint vector generalizes some important algorithms, such as the SM normalized least-mean-square (SM-NLMS) algorithm, the results can be directly applied to these algorithms. Simulation results confirm the accuracy of our analysis.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128154619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Best-effort cooperative communication without dedicated relays","authors":"Nate Goergen, K. Liu, T. Clancy","doi":"10.1109/ICASSP.2010.5496050","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5496050","url":null,"abstract":"Traditional decode-and-forward cooperation systems consider dedicated relays, while instead we consider wireless transceivers that cooperatively relay signals in addition to primary communication missions. A system that transmits the additional signal using a best-effort transmission policy within the original transmission energy constraint is considered. To maintain the original energy budget we consider the feasibility of reallocating energy from pilot signals toward the relaying service when channel conditions are stationary. Under the best-effort delivery policy, the node is not obligated to devote energy for relaying signals, nor does it provide a guarantee of signal quality to retransmissions. Instead the relay sacrifices energy at its own discretion, prioritizing the primary communication mission. Using the best-effort delivery policy, we derive an optimal power allocation rule that maintains a fixed symbol error rate for the relay's primary transmission, and further demonstrate cooperative communication gains using the proposed delivery method.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125249886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kronecker product matrices for compressive sensing","authors":"Marco F. Duarte, Richard Baraniuk","doi":"10.1109/ICASSP.2010.5495900","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495900","url":null,"abstract":"Compressive sensing (CS) is an emerging approach for acquisition of signals having a sparse or compressible representation in some basis. While CS literature has mostly focused on problems involving 1-D and 2-D signals, many important applications involve signals that are multidimensional. We propose the use of Kronecker product matrices in CS for two purposes. First, we can use such matrices as sparsifying bases that jointly model the different types of structure present in the signal. Second, the measurement matrices used in distributed measurement settings can be easily expressed as Kronecker products. This new formulation enables the derivation of analytical bounds for sparse approximation and CS recovery of multidimensional signals.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125477268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}