{"title":"Environmental sniffing: noise knowledge estimation for robust speech systems","authors":"Murat Akbacak, J. Hansen","doi":"10.1109/ICASSP.2003.1202307","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202307","url":null,"abstract":"We propose a framework for extracting knowledge about environmental noise from an input audio sequence and organizing this knowledge for use by other speech systems. To date, most approaches dealing with environmental noise in speech systems are based on assumptions about the noise, or differences in the collection of and training on a specific noise condition, rather than exploring the nature of the noise. We are interested in constructing a new speech framework, entitled environmental sniffing, to detect, classify and track acoustic environmental conditions. The first goal of the framework is to seek out detailed information about the environmental characteristics instead of just detecting environmental changes. The second goal is to organize this knowledge in an effective manner to allow smart decisions to direct other speech systems. Our current framework uses a number of speech processing modules including the Teager energy operator (TEO) and a hybrid algorithm with T/sup 2/-BIC segmentation, noise language modeling and GMM classification in noise knowledge estimation. We define a new information criterion that incorporates the impact of noise on environmental sniffing performance. We use an in-vehicle speech and noise environment as a test platform for our evaluations and investigate the integration of environmental sniffing into an automatic speech recognition (ASR) engine in this environment. Noise classification experiments show that the hybrid algorithm achieves an error rate of 25.51%, outperforming a baseline system by an absolute 7.08%.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128863942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wideband array signal processing using MCMC methods","authors":"W. Ng, J. Reilly, T. Kirubarajan, Jean-René Larocque","doi":"10.1109/ICASSP.2003.1199900","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199900","url":null,"abstract":"This paper proposes a novel wideband structure for array signal processing. The method lends itself well to a Bayesian approach for jointly estimating the model order (number of sources) and the DOA through a reversible jump Markov chain Monte Carlo (MCMC) procedure. The source amplitudes are estimated through a maximum a posteriori (MAP) procedure. Advantages of the proposed method include joint detection of model order and estimation of the DOA parameters, and the fact that meaningful results can be obtained using fewer observations than previous methods. The DOA estimation performance of the proposed method is compared with the theoretical Cramer-Rao lower bound (CRLB) for this problem. Simulation results demonstrate the effectiveness and robustness of the method.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114602721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech enhancement based on the general transfer function GSC and postfiltering","authors":"S. Gannot, I. Cohen","doi":"10.1109/ICASSP.2003.1198929","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198929","url":null,"abstract":"In speech enhancement applications, microphone array postfiltering allows additional reduction of noise components at a beamformer output. Among microphone array structures, the recently proposed general transfer function generalized sidelobe canceller (TF-GSC) has shown impressive noise reduction abilities in a directional noise field, while still maintaining low speech distortion. However, in a diffused noise field, less significant noise reduction is obtainable. The performance is even further degraded when the noise is nonstationary. We present three postfiltering methods for improving the performance of microphone arrays. Two of them are based on single-channel speech enhancers and make use of recently proposed algorithms concatenated to the beamformer output. The third is a multichannel speech enhancer which exploits noise-only components constructed within the TF-GSC structure. An experimental study, which consists of both objective and subjective evaluation in various noise fields, demonstrates the advantage of the multi-channel postfiltering compared to single-channel techniques.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121590161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group delay approximation of allpass digital filters by transforming the desired response","authors":"T. Matsunaga, M. Ikehara","doi":"10.1109/ICASSP.2003.1201701","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1201701","url":null,"abstract":"In this paper, we present a new design method of allpass digital filters with equiripple group delay response. This method is based on solving a least squares solution iteratively. At each iteration, the desired group delay response is transformed so as to have equiripple error. By this method, an equiripple solution is obtained very quickly with less computational complexity.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"166 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115997848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mixtures of inverse covariances","authors":"Vincent Vanhoucke, Ananth Sankar","doi":"10.1109/ICASSP.2003.1198915","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198915","url":null,"abstract":"We introduce a model that approximates full and block-diagonal covariances in a Gaussian mixture, while reducing significantly both the number of parameters to estimate and the computations required to evaluate the Gaussian likelihoods. The inverse covariance of each Gaussian is expressed as a mixture of a small set of prototype matrices. Estimation of both the mixture weights and the prototypes is performed using maximum likelihood estimation. Experiments on a variety of speech recognition tasks show that this model significantly outperforms a diagonal covariance model, while using the same number of Gaussian-dependent parameters.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132153607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust variational speech separation using fewer microphones than speakers","authors":"Steven J. Rennie, P. Aarabi, T. Kristjansson, B. Frey, Kannan Achan","doi":"10.1109/ICASSP.2003.1198723","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198723","url":null,"abstract":"A variational inference algorithm for robust speech separation, capable of recovering the underlying speech sources even in the case of more sources than microphone observations, is presented. The algorithm is based upon a generative probabilistic model that fuses time-delay of arrival (TDOA) information with prior information about the speakers and application, to produce an optimal estimate of the underlying speech sources. Simulation results are presented for the case of two, three and four underlying sources and two microphone observations corrupted by noise. The resulting SNR gains (32 dB with two sources, 23 dB with three sources, and 16 dB with four sources) are significantly higher than previous speech separation techniques.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122001154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A trainable retrieval system for cartoon character images","authors":"M. Haseyama, Atsushi Matsumura","doi":"10.1109/ICASSP.2003.1199564","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199564","url":null,"abstract":"This paper proposes a novel method to retrieve cartoon character images in a database or network. In this method, partial features of an image, defined as regions and aspects, are used as keys to identify cartoon character images. The similarities between a query cartoon character image and the images in the database are computed by using these features. Based on the similarities, the cartoon images same or similar to the query image are identified and retrieved from the database. Moreover, our method adopts a training scheme to reflect the user's subjectivity. The training emphasizes the significant regions or aspects by assigning more weight based on the user's preferences and actions, such as selecting a desired image or an area of an image. These processes make the retrieval more effective and accurate. Experimental results verify the effectiveness and retrieval accuracy of the method.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128363989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust cephalometric landmark identification using support vector machines","authors":"S. Chakrabartty, M. Yagi, T. Shibata, G. Cauwenberghs","doi":"10.1109/ICASSP.2003.1202494","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202494","url":null,"abstract":"A robust and accurate image recognizer for cephalometric landmarking is presented. The recognizer uses Gini support vector machine (SVM) to model discrimination boundaries between different landmarks and also between the background frames. Large margin classification with non-linear kernels allows to extract relevant details from the landmarks, approaching human expert levels of recognition. In conjunction with projected principal-edge distribution (PPED) representation as feature vectors, GiniSVM is able to demonstrate more than 95% accuracy for landmark detection on medical cephalograms within a reasonable location tolerance value.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130752142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing MFCC and MPEG-7 audio features for feature extraction, maximum likelihood HMM and entropic prior HMM for sports audio classification","authors":"Ziyou Xiong, R. Radhakrishnan, Ajay Divakaran, Thomas S. Huang","doi":"10.1109/ICASSP.2003.1200048","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1200048","url":null,"abstract":"We present a comparison of 6 methods for classification of sports audio. For feature extraction, we have two choices: MPEG-7 audio features and Mel-scale frequency cepstrum coefficients (MFCC). For classification, we also have two choices: maximum likelihood hidden Markov models (ML-HMM) and entropic prior HMMs (EP-HMM). EP-HMMs, in turn, have two variations: with and without trimming of the model parameters. We thus have 6 possible methods, each of which corresponds to a combination. Our results show that all the combinations achieve classification accuracy of around 90% with the best and the second best being, respectively, MPEG-7 features with EP-HMM and MFCC with ML-HMM.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127617846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time adaptive background segmentation","authors":"D. Butler, S. Sridharan, V. Bove","doi":"10.1109/ICASSP.2003.1199481","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199481","url":null,"abstract":"Automatic analysis of digital video scenes often requires the segmentation of moving objects from the background. Historically, algorithms developed for this purpose have been restricted to small frame sizes, low frame rates or offline processing. The simplest approach involves subtracting the current frame from the known background. However, as the background is unknown, the key is how to learn and model it. The paper proposes a new algorithm that represents each pixel in the frame by a group of clusters. The clusters are ordered according the likelihood that they model the background and are adapted to deal with background and lighting variations. Incoming pixels are matched against the corresponding cluster group and are classified according to whether the matching cluster is considered part of the background. The algorithm has been subjectively evaluated against three other techniques. It demonstrates equal or better segmentation than the other techniques and proves capable of processing 320/spl times/240 video at 28 fps, excluding post-processing.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"70 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116252278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}