2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): latest papers

Binaural speech segregation based on pitch and azimuth tracking
John F. Woodruff, Deliang Wang
{"title":"Binaural speech segregation based on pitch and azimuth tracking","authors":"John F. Woodruff, Deliang Wang","doi":"10.1109/ICASSP.2012.6287862","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6287862","url":null,"abstract":"We propose an approach to binaural speech segregation in reverberation based on pitch and azimuth cues. These cues are integrated within a statistical tracking framework to estimate up to two concurrent pitch frequencies and three concurrent azimuth angles. The tracking framework implicitly estimates binary time-frequency masks by solving a data association problem, thereby performing speech segregation. Experimental results show that the proposed approach compares favorably to existing two-microphone systems in spite of less prior information. The benefit of the proposed approach is most pronounced in conditions with substantial reverberation or for closely spaced sources.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85732199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Improved minimum converted trajectory error training for real-time speech-to-lips conversion
Wei Han, Lijuan Wang, F. Soong, Bo Yuan
{"title":"Improved minimum converted trajectory error training for real-time speech-to-lips conversion","authors":"Wei Han, Lijuan Wang, F. Soong, Bo Yuan","doi":"10.1109/ICASSP.2012.6288921","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288921","url":null,"abstract":"Gaussian mixture model (GMM) based speech-to-lips conversion often operates in two alternative ways: batch conversion and sliding window-based conversion for real-time processing. Previously, Minimum Converted Trajectory Error (MCTE) training was proposed to improve the performance of batch conversion. In this paper, we extend previous work and propose a new training criterion, MCTE for Real-time conversion (R-MCTE), to explicitly optimize the quality of sliding window-based conversion. In R-MCTE, we use the probabilistic descent method to refine model parameters by minimizing the error on real-time converted visual trajectories over training data. Objective evaluations on the LIPS 2008 Visual Speech Synthesis Challenge data set show that the proposed method achieves both good lip animation performance and low delay in real-time conversion.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85859314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
A novel eye region based privacy protection scheme
Dohyoung Lee, K. Plataniotis
{"title":"A novel eye region based privacy protection scheme","authors":"Dohyoung Lee, K. Plataniotis","doi":"10.1109/ICASSP.2012.6288261","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288261","url":null,"abstract":"This paper introduces a novel eye region scrambling scheme capable of protecting privacy sensitive eye region information present in video contents. The proposed system consists of an automatic eye detection module followed by a privacy enabling JPEG XR encoder module. An object detection method based on a probabilistic model of image generation is used in conjunction with a skin-tone segmentation to accurately locate eye regions in real time. The utilized JPEG XR encoder effectively deteriorates the visual quality of the privacy sensitive eye region at low computational cost. Performance of the proposed solution is validated using benchmark face recognition algorithms on a face image database. Experimental results indicate that the proposed solution is able to conceal identity by preventing successful identification at low computational costs.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85941566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Analysis of the spherical wave truncation error for spherical harmonic soundfield expansions
S. Brown, Shuai Wang, D. Sen
{"title":"Analysis of the spherical wave truncation error for spherical harmonic soundfield expansions","authors":"S. Brown, Shuai Wang, D. Sen","doi":"10.1109/ICASSP.2012.6287803","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6287803","url":null,"abstract":"Three dimensional soundfield recording and reproduction is an area of ongoing investigation and its implementation is increasingly achieved through use of the infinite Spherical Harmonic soundfield expansion. Perfect recording or reconstruction requires infinite microphones or loudspeakers, respectively. Thus, real-world approximations to both require spatial discretisation, which truncates the soundfield expansion and loses some of the soundfield information. The resulting truncation error is the focus of this paper, specifically for soundfields comprising spherical waves. We define two norms of the truncation error to signal ratio, L2 and L∞, for comparison and use in different situations. Finally we observe how some of these errors converge to the plane wave case under certain circumstances.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85985395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Inventory-style speech enhancement with uncertainty-of-observation techniques
R. M. Nickel, Ramón Fernández Astudillo, D. Kolossa, Steffen Zeiler, Rainer Martin
{"title":"Inventory-style speech enhancement with uncertainty-of-observation techniques","authors":"R. M. Nickel, Ramón Fernández Astudillo, D. Kolossa, Steffen Zeiler, Rainer Martin","doi":"10.1109/ICASSP.2012.6288954","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288954","url":null,"abstract":"We present a new method for inventory-style speech enhancement that significantly improves over earlier approaches [1]. Inventory-style enhancement attempts to resynthesize a clean speech signal from a noisy signal via corpus-based speech synthesis. The advantage of such an approach is that one is not bound to trade noise suppression against signal distortion in the same way that most traditional methods do. A significant improvement in perceptual quality is typically the result. Disadvantages of this new approach, however, include speaker dependency, increased processing delays, and the necessity of substantial system training. Earlier published methods relied on a-priori knowledge of the expected noise type during the training process [1]. In this paper we present a new method that exploits uncertainty-of-observation techniques to circumvent the need for noise specific training. Experimental results show that the new method is not only able to match, but outperform the earlier approaches in perceptual quality.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76721211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Face recognition based on nonsubsampled contourlet transform and block-based kernel Fisher linear discriminant
Biao Wang, Weifeng Li, Q. Liao
{"title":"Face recognition based on nonsubsampled contourlet transform and block-based kernel Fisher linear discriminant","authors":"Biao Wang, Weifeng Li, Q. Liao","doi":"10.1109/ICASSP.2012.6288183","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288183","url":null,"abstract":"Face representation, including both feature extraction and feature selection, is the key issue for a successful face recognition system. In this paper, we propose a novel face representation scheme based on nonsubsampled contourlet transform (NSCT) and block-based kernel Fisher linear discriminant (BKFLD). NSCT is a newly developed multiresolution analysis tool and has the ability to extract both intrinsic geometrical structure and directional information in images, which implies its discriminative potential for effective feature extraction of face images. By encoding the NSCT coefficient images with the local binary pattern (LBP) operator, we could obtain a robust feature set. Furthermore, kernel Fisher linear discriminant is introduced to select the most discriminative feature sets, and the block-based scheme is incorporated to address the small sample size problem. Face recognition experiments on the FERET database demonstrate the effectiveness of our proposed approach.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76886966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Handling incomplete matrix data via continuous-valued infinite relational model
Tomohiko Suzuki, Takuma Nakamura, Yasutoshi Ida, Takashi Matsumoto
{"title":"Handling incomplete matrix data via continuous-valued infinite relational model","authors":"Tomohiko Suzuki, Takuma Nakamura, Yasutoshi Ida, Takashi Matsumoto","doi":"10.1109/ICASSP.2012.6288338","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288338","url":null,"abstract":"A continuous-valued infinite relational model is proposed as a solution to the co-clustering problem which arises in matrix data or tensor data calculations. The model is a probabilistic model utilizing the framework of Bayesian Nonparametrics which can estimate the number of components in posterior distributions. The original Infinite Relational Model cannot handle continuous-valued or multi-dimensional data directly. Our proposed model overcomes the data expression restrictions by utilizing the proposed likelihood, which can handle many types of data. The posterior distribution is estimated via variational inference. Using real-world data, we show that the proposed model outperforms the original model in terms of AUC score and efficiency for a movie recommendation task.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80838909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A study of discriminative feature extraction for i-vector based acoustic sniffing in IVN acoustic model training
Yu Zhang, Jian Xu, Zhijie Yan, Qiang Huo
{"title":"A study of discriminative feature extraction for i-vector based acoustic sniffing in IVN acoustic model training","authors":"Yu Zhang, Jian Xu, Zhijie Yan, Qiang Huo","doi":"10.1109/ICASSP.2012.6288814","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288814","url":null,"abstract":"Recently, we proposed an i-vector approach to acoustic sniffing for irrelevant variability normalization based acoustic model training in large vocabulary continuous speech recognition (LVCSR). Its effectiveness has been confirmed by experimental results on the Switchboard-1 conversational telephone speech transcription task. In this paper, we study several discriminative feature extraction approaches in i-vector space to improve both recognition accuracy and run-time efficiency. New experimental results are reported on a much larger scale LVCSR task with about 2000 hours of training data.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83603373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multichannel speech dereverberation and separation with optimized combination of linear and non-linear filtering
M. Togami, Y. Kawaguchi, Ryu Takeda, Y. Obuchi, N. Nukaga
{"title":"Multichannel speech dereverberation and separation with optimized combination of linear and non-linear filtering","authors":"M. Togami, Y. Kawaguchi, Ryu Takeda, Y. Obuchi, N. Nukaga","doi":"10.1109/ICASSP.2012.6288809","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288809","url":null,"abstract":"In this paper, we propose a multichannel speech dereverberation and separation technique which is effective even when there are multiple speakers and each speaker's transfer function is time-varying due to fluctuation of the corresponding speaker's head. For robustness against fluctuation, the proposed method optimizes linear filtering with non-linear filtering simultaneously from a probabilistic perspective based on a probabilistic reverberant transfer-function model, PRTFM. PRTFM is an extension of the conventional time-invariant transfer-function model under uncertain conditions, and PRTFM can also be regarded as an extension of recently proposed blind local Gaussian modeling. The linear filtering and the non-linear filtering are optimized in the MMSE (Minimum Mean Square Error) sense during parameter optimization. The proposed method is evaluated in a reverberant meeting room, and the proposed method is shown to be effective.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76287121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Trade-off evaluation for speech enhancement algorithms with respect to the a priori SNR estimation
Pei Chee Yong, S. Nordholm, H. H. Dam
{"title":"Trade-off evaluation for speech enhancement algorithms with respect to the a priori SNR estimation","authors":"Pei Chee Yong, S. Nordholm, H. H. Dam","doi":"10.1109/ICASSP.2012.6288957","DOIUrl":"https://doi.org/10.1109/ICASSP.2012.6288957","url":null,"abstract":"In this paper, a modified a priori SNR estimator is proposed for speech enhancement. The well-known decision-directed (DD) approach is modified by matching each gain function with the noisy speech spectrum at the current frame rather than the previous one. The proposed algorithm eliminates the speech transient distortion and reduces the impact of the choice of the gain function on the level of smoothing in the SNR estimate. An objective evaluation metric is employed to measure the trade-off between musical noise, noise reduction and speech distortion. Performance is evaluated and compared between a modified sigmoid gain function, the state-of-the-art log-spectral amplitude estimator and the Wiener filter. Simulation results show that the modified DD approach performs better in terms of the trade-off evaluation.","PeriodicalId":6443,"journal":{"name":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73598202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
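The last abstract above describes changing the decision-directed (DD) a priori SNR estimator so that the previous gain is matched with the current noisy frame instead of the previous one. The following is a minimal single-bin sketch of that idea, not the authors' implementation. The function names, the Wiener gain, the constant noise power, the smoothing factor `alpha=0.98`, and the first-frame initialisation are all assumptions made for illustration, and the `modified=True` branch is only one plausible reading of the abstract.

```python
def wiener_gain(xi):
    """Wiener gain for an a priori SNR xi (linear scale)."""
    return xi / (1.0 + xi)


def dd_a_priori_snr(noisy_power, noise_power, alpha=0.98, modified=False):
    """Track the a priori SNR of one frequency bin across frames.

    noisy_power: per-frame noisy power |Y|^2 for the bin.
    noise_power: noise power for the bin (assumed known and constant).
    modified=False: classic DD rule, previous gain applied to the
                    previous noisy frame.
    modified=True:  previous gain applied to the current noisy frame
                    (assumed reading of the abstract).
    """
    xi_track = []
    prev_gain = 1.0
    prev_gamma = None
    for y2 in noisy_power:
        gamma = y2 / noise_power            # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)     # instantaneous SNR estimate
        if prev_gamma is None:
            xi = ml_term                    # first frame: no history yet
        else:
            ref_gamma = gamma if modified else prev_gamma
            xi = alpha * prev_gain ** 2 * ref_gamma + (1.0 - alpha) * ml_term
        prev_gain = wiener_gain(xi)
        prev_gamma = gamma
        xi_track.append(xi)
    return xi_track


if __name__ == "__main__":
    frames = [2.0, 2.0, 50.0]               # toy speech onset in the third frame
    print(dd_a_priori_snr(frames, 1.0))
    print(dd_a_priori_snr(frames, 1.0, modified=True))
```

On the toy onset, both variants agree while the input is stationary, and the variant tied to the current frame yields a larger estimate in the onset frame. That is in line with the abstract's remark about transient distortion, though this sketch does not reproduce the paper's evaluation.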