2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03) Proceedings: Latest Publications

Environmental sniffing: noise knowledge estimation for robust speech systems
Murat Akbacak, J. Hansen
{"title":"Environmental sniffing: noise knowledge estimation for robust speech systems","authors":"Murat Akbacak, J. Hansen","doi":"10.1109/ICASSP.2003.1202307","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202307","url":null,"abstract":"We propose a framework for extracting knowledge about environmental noise from an input audio sequence and organizing this knowledge for use by other speech systems. To date, most approaches dealing with environmental noise in speech systems are based on assumptions about the noise, or differences in the collection of and training on a specific noise condition, rather than exploring the nature of the noise. We are interested in constructing a new speech framework, entitled environmental sniffing, to detect, classify and track acoustic environmental conditions. The first goal of the framework is to seek out detailed information about the environmental characteristics instead of just detecting environmental changes. The second goal is to organize this knowledge in an effective manner to allow smart decisions to direct other speech systems. Our current framework uses a number of speech processing modules including the Teager energy operator (TEO) and a hybrid algorithm with T/sup 2/-BIC segmentation, noise language modeling and GMM classification in noise knowledge estimation. We define a new information criterion that incorporates the impact of noise on environmental sniffing performance. We use an in-vehicle speech and noise environment as a test platform for our evaluations and investigate the integration of environmental sniffing into an automatic speech recognition (ASR) engine in this environment. Noise classification experiments show that the hybrid algorithm achieves an error rate of 25.51%, outperforming a baseline system by an absolute 7.08%.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128863942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
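The GMM noise-classification module is easy to illustrate in isolation. Below is a minimal sketch assuming hypothetical frame-level feature matrices per environment; the paper's T²-BIC segmentation and noise language model are not reproduced.

```python
# Minimal sketch of the GMM noise-classification stage. Feature matrices
# and environment names are hypothetical placeholders; the T^2-BIC
# segmentation and noise language model from the paper are omitted.
from sklearn.mixture import GaussianMixture

def train_noise_models(features_by_env, n_components=8, seed=0):
    """Fit one GMM per acoustic environment, e.g. 'highway' or 'city'."""
    models = {}
    for env, feats in features_by_env.items():   # feats: (frames, dims)
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=seed)
        models[env] = gmm.fit(feats)
    return models

def classify_segment(models, segment_feats):
    """Return the environment whose GMM scores the segment highest."""
    return max(models, key=lambda env: models[env].score(segment_feats))
```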
Wideband array signal processing using MCMC methods
W. Ng, J. Reilly, T. Kirubarajan, Jean-René Larocque
{"title":"Wideband array signal processing using MCMC methods","authors":"W. Ng, J. Reilly, T. Kirubarajan, Jean-René Larocque","doi":"10.1109/ICASSP.2003.1199900","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199900","url":null,"abstract":"This paper proposes a novel wideband structure for array signal processing. The method lends itself well to a Bayesian approach for jointly estimating the model order (number of sources) and the DOA through a reversible jump Markov chain Monte Carlo (MCMC) procedure. The source amplitudes are estimated through a maximum a posteriori (MAP) procedure. Advantages of the proposed method include joint detection of model order and estimation of the DOA parameters, and the fact that meaningful results can be obtained using fewer observations than previous methods. The DOA estimation performance of the proposed method is compared with the theoretical Cramer-Rao lower bound (CRLB) for this problem. Simulation results demonstrate the effectiveness and robustness of the method.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114602721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
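The reversible-jump moves that sample the number of sources are too involved for a short example, but the core Metropolis-Hastings step for sampling a single DOA under a fixed model order might look like the sketch below; `log_likelihood` is a hypothetical placeholder for the wideband array-data likelihood.

```python
# Fixed-order Metropolis-Hastings sketch for sampling one DOA; the
# paper's reversible-jump moves over the model order are omitted.
# log_likelihood is a hypothetical placeholder for log p(data | theta).
import numpy as np

def mh_doa_sampler(log_likelihood, n_iter=5000, step=2.0, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-90.0, 90.0)          # DOA in degrees, flat prior
    samples = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()
        if -90.0 <= prop <= 90.0:             # prior is zero outside
            log_alpha = log_likelihood(prop) - log_likelihood(theta)
            if np.log(rng.uniform()) < log_alpha:
                theta = prop                  # accept the proposal
        samples[i] = theta
    return samples                            # posterior draws of the DOA
```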
Speech enhancement based on the general transfer function GSC and postfiltering
S. Gannot, I. Cohen
{"title":"Speech enhancement based on the general transfer function GSC and postfiltering","authors":"S. Gannot, I. Cohen","doi":"10.1109/ICASSP.2003.1198929","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198929","url":null,"abstract":"In speech enhancement applications, microphone array postfiltering allows additional reduction of noise components at a beamformer output. Among microphone array structures, the recently proposed general transfer function generalized sidelobe canceller (TF-GSC) has shown impressive noise reduction abilities in a directional noise field, while still maintaining low speech distortion. However, in a diffused noise field, less significant noise reduction is obtainable. The performance is even further degraded when the noise is nonstationary. We present three postfiltering methods for improving the performance of microphone arrays. Two of them are based on single-channel speech enhancers and make use of recently proposed algorithms concatenated to the beamformer output. The third is a multichannel speech enhancer which exploits noise-only components constructed within the TF-GSC structure. An experimental study, which consists of both objective and subjective evaluation in various noise fields, demonstrates the advantage of the multi-channel postfiltering compared to single-channel techniques.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121590161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 122
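As a rough illustration of the single-channel postfilters concatenated to the beamformer output, here is a generic per-bin Wiener-style gain. This is a simplified stand-in, not the paper's multichannel estimator; `noise_psd` is assumed to come from a separate noise-tracking stage.

```python
# Generic per-bin Wiener-style postfilter applied to a beamformer output
# STFT of shape (bins, frames). noise_psd has shape (bins,) and is
# assumed to be supplied by a noise tracker; this is a simplified
# stand-in, not the paper's multichannel estimator.
import numpy as np

def wiener_postfilter(beamformer_stft, noise_psd, gain_floor=0.1):
    power = np.abs(beamformer_stft) ** 2
    snr = np.maximum(power / noise_psd[:, None] - 1.0, 0.0)  # crude SNR
    gain = np.maximum(snr / (snr + 1.0), gain_floor)         # Wiener gain
    return gain * beamformer_stft
```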
Group delay approximation of allpass digital filters by transforming the desired response
T. Matsunaga, M. Ikehara
{"title":"Group delay approximation of allpass digital filters by transforming the desired response","authors":"T. Matsunaga, M. Ikehara","doi":"10.1109/ICASSP.2003.1201701","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1201701","url":null,"abstract":"In this paper, we present a new design method of allpass digital filters with equiripple group delay response. This method is based on solving a least squares solution iteratively. At each iteration, the desired group delay response is transformed so as to have equiripple error. By this method, an equiripple solution is obtained very quickly with less computational complexity.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"166 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115997848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
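The quantity shaped by the iterative least-squares design is the allpass group delay. A minimal sketch of evaluating it with SciPy follows; the design iteration itself is not reproduced.

```python
# Evaluating the group delay of an allpass H(z) = z^-N A(1/z) / A(z),
# the quantity the iterative least-squares design repeatedly matches to
# the transformed desired response (the design loop is not reproduced).
import numpy as np
from scipy.signal import group_delay

def allpass_group_delay(a, n_points=512):
    """Group delay in samples of the allpass with real denominator a."""
    b = a[::-1]                    # numerator = time-reversed denominator
    return group_delay((b, a), w=n_points)

# Example: first-order allpass with a pole at z = 0.5.
w, gd = allpass_group_delay(np.array([1.0, -0.5]))
```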
Mixtures of inverse covariances
Vincent Vanhoucke, Ananth Sankar
{"title":"Mixtures of inverse covariances","authors":"Vincent Vanhoucke, Ananth Sankar","doi":"10.1109/ICASSP.2003.1198915","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198915","url":null,"abstract":"We introduce a model that approximates full and block-diagonal covariances in a Gaussian mixture, while reducing significantly both the number of parameters to estimate and the computations required to evaluate the Gaussian likelihoods. The inverse covariance of each Gaussian is expressed as a mixture of a small set of prototype matrices. Estimation of both the mixture weights and the prototypes is performed using maximum likelihood estimation. Experiments on a variety of speech recognition tasks show that this model significantly outperforms a diagonal covariance model, while using the same number of Gaussian-dependent parameters.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132153607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
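The likelihood computation follows directly from the model definition: each Gaussian's precision matrix is a weighted combination of shared prototypes. A minimal sketch, with weight and prototype estimation omitted:

```python
# Likelihood evaluation when each Gaussian's inverse covariance is a
# weighted sum of shared prototype matrices (estimation is omitted).
# Shapes: x, mean (d,); weights (K,); prototypes (K, d, d).
import numpy as np

def log_gaussian_mic(x, mean, weights, prototypes):
    P = np.tensordot(weights, prototypes, axes=1)  # (d, d) precision
    diff = x - mean
    sign, logdet = np.linalg.slogdet(P)            # P must be pos. def.
    d = x.shape[0]
    return 0.5 * (logdet - d * np.log(2.0 * np.pi) - diff @ P @ diff)
```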
Robust variational speech separation using fewer microphones than speakers
Steven J. Rennie, P. Aarabi, T. Kristjansson, B. Frey, Kannan Achan
{"title":"Robust variational speech separation using fewer microphones than speakers","authors":"Steven J. Rennie, P. Aarabi, T. Kristjansson, B. Frey, Kannan Achan","doi":"10.1109/ICASSP.2003.1198723","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1198723","url":null,"abstract":"A variational inference algorithm for robust speech separation, capable of recovering the underlying speech sources even in the case of more sources than microphone observations, is presented. The algorithm is based upon a generative probabilistic model that fuses time-delay of arrival (TDOA) information with prior information about the speakers and application, to produce an optimal estimate of the underlying speech sources. Simulation results are presented for the case of two, three and four underlying sources and two microphone observations corrupted by noise. The resulting SNR gains (32 dB with two sources, 23 dB with three sources, and 16 dB with four sources) are significantly higher than previous speech separation techniques.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122001154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
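The variational inference itself is not reproduced here, but the TDOA cue the generative model fuses can be illustrated with a standard GCC-PHAT estimator, sketched below under the assumption of two time-aligned microphone signals.

```python
# Standard GCC-PHAT estimator for the TDOA cue fused by the generative
# model; the paper's variational inference is not reproduced here.
import numpy as np

def gcc_phat_tdoa(x1, x2, fs, max_delay_s=1e-3):
    """Relative delay (seconds) between two microphone signals."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)  # PHAT weighting
    max_shift = int(fs * max_delay_s)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```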
A trainable retrieval system for cartoon character images
M. Haseyama, Atsushi Matsumura
{"title":"A trainable retrieval system for cartoon character images","authors":"M. Haseyama, Atsushi Matsumura","doi":"10.1109/ICASSP.2003.1199564","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199564","url":null,"abstract":"This paper proposes a novel method to retrieve cartoon character images in a database or network. In this method, partial features of an image, defined as regions and aspects, are used as keys to identify cartoon character images. The similarities between a query cartoon character image and the images in the database are computed by using these features. Based on the similarities, the cartoon images same or similar to the query image are identified and retrieved from the database. Moreover, our method adopts a training scheme to reflect the user's subjectivity. The training emphasizes the significant regions or aspects by assigning more weight based on the user's preferences and actions, such as selecting a desired image or an area of an image. These processes make the retrieval more effective and accurate. Experimental results verify the effectiveness and retrieval accuracy of the method.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128363989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
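A heavily simplified, hypothetical sketch of the weighted feature similarity and user-feedback reweighting described above; the paper's actual region/aspect features and update rule are not reproduced.

```python
# Hypothetical simplification: similarity as a weighted sum of
# per-feature matches, with weights boosted by user feedback. The
# paper's region/aspect features and exact update rule are not shown.
import numpy as np

def weighted_similarity(query_feats, image_feats, weights):
    """query_feats, image_feats, weights: 1-D arrays of equal length."""
    return np.dot(weights, -np.abs(query_feats - image_feats))

def update_weights(weights, feedback, lr=0.1):
    """feedback in {-1, 0, +1} per feature; renormalize to sum to 1."""
    weights = weights * (1.0 + lr * feedback)
    return weights / weights.sum()
```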
Robust cephalometric landmark identification using support vector machines
S. Chakrabartty, M. Yagi, T. Shibata, G. Cauwenberghs
{"title":"Robust cephalometric landmark identification using support vector machines","authors":"S. Chakrabartty, M. Yagi, T. Shibata, G. Cauwenberghs","doi":"10.1109/ICASSP.2003.1202494","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1202494","url":null,"abstract":"A robust and accurate image recognizer for cephalometric landmarking is presented. The recognizer uses Gini support vector machine (SVM) to model discrimination boundaries between different landmarks and also between the background frames. Large margin classification with non-linear kernels allows to extract relevant details from the landmarks, approaching human expert levels of recognition. In conjunction with projected principal-edge distribution (PPED) representation as feature vectors, GiniSVM is able to demonstrate more than 95% accuracy for landmark detection on medical cephalograms within a reasonable location tolerance value.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130752142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
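GiniSVM and the PPED feature extractor are not available in standard libraries, so the sketch below substitutes scikit-learn's kernel SVM to illustrate the landmark-versus-background patch classification; treat it as an analogy, not the authors' implementation.

```python
# GiniSVM and PPED features are not in standard libraries; this sketch
# substitutes scikit-learn's kernel SVM for the landmark-versus-
# background patch classification, as an analogy only.
from sklearn.svm import SVC

def train_landmark_classifier(patch_features, labels):
    """patch_features: (n_patches, dims); labels: landmark id or 'bg'."""
    clf = SVC(kernel="rbf", C=10.0, gamma="scale", probability=True)
    return clf.fit(patch_features, labels)

# Detection: slide a window over the cephalogram, featurize each patch,
# and keep the highest-probability landmark hypotheses:
#   probs = clf.predict_proba(candidate_patch_features)
```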
Comparing MFCC and MPEG-7 audio features for feature extraction, maximum likelihood HMM and entropic prior HMM for sports audio classification
Ziyou Xiong, R. Radhakrishnan, Ajay Divakaran, Thomas S. Huang
{"title":"Comparing MFCC and MPEG-7 audio features for feature extraction, maximum likelihood HMM and entropic prior HMM for sports audio classification","authors":"Ziyou Xiong, R. Radhakrishnan, Ajay Divakaran, Thomas S. Huang","doi":"10.1109/ICASSP.2003.1200048","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1200048","url":null,"abstract":"We present a comparison of 6 methods for classification of sports audio. For feature extraction, we have two choices: MPEG-7 audio features and Mel-scale frequency cepstrum coefficients (MFCC). For classification, we also have two choices: maximum likelihood hidden Markov models (ML-HMM) and entropic prior HMMs (EP-HMM). EP-HMMs, in turn, have two variations: with and without trimming of the model parameters. We thus have 6 possible methods, each of which corresponds to a combination. Our results show that all the combinations achieve classification accuracy of around 90% with the best and the second best being, respectively, MPEG-7 features with EP-HMM and MFCC with ML-HMM.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127617846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
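A minimal ML-HMM baseline for the MFCC branch of the comparison, assuming a hypothetical dictionary mapping each class name to a list of audio files; the MPEG-7 features and entropic-prior training are not reproduced.

```python
# ML-HMM baseline for the MFCC branch; MPEG-7 features and entropic-
# prior training are not reproduced. clips_by_class is a hypothetical
# dict mapping a class name to a list of wav paths.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def mfcc_features(wav_path, n_mfcc=13):
    y, sr = librosa.load(wav_path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, dims)

def train_class_hmms(clips_by_class, n_states=5):
    """Fit one HMM per class, e.g. 'applause' or 'commentary'."""
    models = {}
    for label, paths in clips_by_class.items():
        feats = [mfcc_features(p) for p in paths]
        models[label] = GaussianHMM(n_components=n_states).fit(
            np.vstack(feats), lengths=[f.shape[0] for f in feats])
    return models

def classify(models, wav_path):
    """Label a clip with the class whose HMM gives the best likelihood."""
    feats = mfcc_features(wav_path)
    return max(models, key=lambda lbl: models[lbl].score(feats))
```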
Real-time adaptive background segmentation
D. Butler, S. Sridharan, V. Bove
{"title":"Real-time adaptive background segmentation","authors":"D. Butler, S. Sridharan, V. Bove","doi":"10.1109/ICASSP.2003.1199481","DOIUrl":"https://doi.org/10.1109/ICASSP.2003.1199481","url":null,"abstract":"Automatic analysis of digital video scenes often requires the segmentation of moving objects from the background. Historically, algorithms developed for this purpose have been restricted to small frame sizes, low frame rates or offline processing. The simplest approach involves subtracting the current frame from the known background. However, as the background is unknown, the key is how to learn and model it. The paper proposes a new algorithm that represents each pixel in the frame by a group of clusters. The clusters are ordered according the likelihood that they model the background and are adapted to deal with background and lighting variations. Incoming pixels are matched against the corresponding cluster group and are classified according to whether the matching cluster is considered part of the background. The algorithm has been subjectively evaluated against three other techniques. It demonstrates equal or better segmentation than the other techniques and proves capable of processing 320/spl times/240 video at 28 fps, excluding post-processing.","PeriodicalId":104473,"journal":{"name":"2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).","volume":"70 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116252278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
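A heavily simplified sketch of the adaptive-background idea, reduced to a single cluster per pixel; the paper's ordered cluster groups per pixel and its post-processing are omitted.

```python
# Simplified to one cluster per pixel; the paper maintains an ordered
# group of clusters per pixel and applies post-processing.
import numpy as np

def update_and_segment(frame, background, match_thresh=30.0, alpha=0.05):
    """frame, background: float grayscale arrays of shape (H, W).
    Returns (foreground_mask, updated_background)."""
    matched = np.abs(frame - background) < match_thresh
    # Adapt matched pixels toward the incoming frame; leave others fixed.
    background = np.where(matched,
                          (1.0 - alpha) * background + alpha * frame,
                          background)
    return ~matched, background

# Usage: bg = first_frame.astype(float); then, per incoming frame:
#   mask, bg = update_and_segment(frame.astype(float), bg)
```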