2010 IEEE International Conference on Acoustics, Speech and Signal Processing: Latest Papers

Interactive tone mapping for High Dynamic Range video
2010 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2010-09-14 DOI: 10.1109/ICASSP.2010.5495318
Zhe Wang, J. Zhai, Zhang Tao, J. Llach
Abstract: Despite considerable progress in HDR image tone mapping over the past decade, little work has been done for HDR video. For applications such as film post-production, the capability of local tone manipulation is highly regarded by content creators. This paper presents an interactive tone mapping scheme for HDR video sequences. It provides a simple scribble/stroke-based interface for local tone manipulation and is capable of propagating user input information throughout a video sequence by using a Gaussian mixture model (GMM) and edge-preserving filtering. The experimental results demonstrate its effectiveness for HDR video tone mapping as well as its flexibility for users to easily and intuitively manipulate the appearance of the video while maintaining temporal consistency.
Citations: 0
Predicting interruptions in dyadic spoken interactions
2010 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2010-06-28 DOI: 10.1109/ICASSP.2010.5494991
Chi-Chun Lee, Shrikanth S. Narayanan
Abstract: Interruptions occur frequently in spontaneous conversations, and they are often associated with changes in the flow of conversation. Predicting interruptions is essential in the design of natural human-machine spoken dialog interfaces, and modeling them can bring insights into the dynamics of human-human conversation. This work uses a Hidden Conditional Random Field (HCRF) to predict occurrences of interruption in dyadic spoken interactions by modeling both speakers' behaviors before a turn change takes place. Our prediction model, using both the foreground speaker's acoustic cues and the listener's gestural cues, achieves an F-measure of 0.54, an accuracy of 70.68%, and an unweighted accuracy of 66.05% on a multimodal database of dyadic interactions. The experimental results also show that the listener's behaviors provide an indication of his/her intention to interrupt.
Citations: 25
Simple methods for improving speaker-similarity of HMM-based speech synthesis
2010 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2010-06-28 DOI: 10.1109/ICASSP.2010.5495562
J. Yamagishi, Simon King
Abstract: In this paper we revisit some basic configuration choices of HMM-based speech synthesis, such as waveform sampling rate, auditory frequency warping scale and the logarithmic scaling of F0, with the aim of improving speaker similarity, which is an acknowledged weakness of current HMM-based speech synthesisers. All of the techniques investigated are simple but, as we demonstrate using perceptual tests, can make substantial differences to the quality of the synthetic speech. Contrary to common practice in automatic speech recognition, higher waveform sampling rates can offer enhanced feature extraction and improved speaker similarity for speech synthesis. In addition, a generalized logarithmic transform of F0 results in larger intra-utterance variance of F0 trajectories and hence more dynamic and natural-sounding prosody.
Citations: 18
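The abstract mentions a generalized logarithmic transform of F0 without giving its formula. As a rough illustration, the sketch below uses one widely used definition of a generalized logarithm (the Box-Cox-style form, which reduces to the natural log when the exponent is zero). Whether the paper uses exactly this form, and with which exponent, is an assumption; the names `generalized_log`, `generalized_exp`, and `alpha` are chosen here for illustration.

```python
import math


def generalized_log(x, alpha):
    """Generalized logarithm: (x**alpha - 1) / alpha, or ln(x) when alpha == 0.

    ASSUMPTION: this Box-Cox-style form stands in for the transform named in
    the abstract, which does not state its definition.
    """
    if x <= 0:
        raise ValueError("F0 must be positive (voiced frames only)")
    if alpha == 0:
        return math.log(x)
    return (x ** alpha - 1.0) / alpha


def generalized_exp(y, alpha):
    """Inverse of generalized_log, mapping a transformed value back to Hz."""
    if alpha == 0:
        return math.exp(y)
    return (1.0 + alpha * y) ** (1.0 / alpha)


# Round trip for a hypothetical F0 value of 120 Hz and a hypothetical exponent.
f0_hz = 120.0
restored = generalized_exp(generalized_log(f0_hz, 0.5), 0.5)
```

With `alpha = 1` the transform is linear (up to an offset) and with `alpha = 0` it is the usual log F0, so intermediate values interpolate between the two scalings.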
Model-based dereverberation in the logmelspec domain for robust distant-talking speech recognition
2010 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2010-06-28 DOI: 10.1109/ICASSP.2010.5495671
A. Sehr, R. Maas, Walter Kellermann
Abstract: The REMOS (REverberation MOdeling for Speech recognition) concept for reverberation-robust distant-talking speech recognition, introduced in [1] for melspectral features, is extended in this contribution to logarithmic melspectral (logmelspec) features. Based on a combined acoustic model consisting of a hidden Markov model network and a reverberation model, REMOS determines clean-speech and reverberation estimates during recognition by an inner optimization operation. A reformulation of this inner optimization problem for logmelspec features, allowing an efficient solution by nonlinear optimization algorithms, is derived in this paper so that an efficient implementation of REMOS for logmelspec features becomes possible. Connected digit recognition experiments show that the proposed REMOS implementation significantly outperforms reverberantly-trained HMMs in highly reverberant environments.
Citations: 18
A hybrid physical and statistical dynamic articulatory framework incorporating analysis-by-synthesis for improved phone classification
2010 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2010-06-28 DOI: 10.1109/ICASSP.2010.5495696
Ziad Al Bawab, B. Raj, R. Stern
Abstract: In this paper, we present a dynamic articulatory model for phone classification. The model integrates real articulatory information derived from ElectroMagnetic Articulograph (EMA) data into its inner states. It maps from the articulatory space to the acoustic one using an adapted vocal tract model for each speaker and a physiologically-motivated articulatory synthesis approach. We apply the analysis-by-synthesis paradigm in a statistical fashion. We first present a fast approach for deriving analysis-by-synthesis distortion features. Next, the distortion between the speech synthesized from the articulatory states and the incoming speech signal is used to compute the output observation probabilities of the Hidden Markov Model (HMM) used for classification. Experiments with the novel framework show improvements over the baseline in phone classification accuracy.
Citations: 1
Search error risk minimization in Viterbi beam search for speech recognition
2010 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2010-06-28 DOI: 10.21437/Interspeech.2010-101
Takaaki Hori, Shinji Watanabe, Atsushi Nakamura
Abstract: This paper proposes a method to optimize Viterbi beam search based on search error risk minimization in large vocabulary continuous speech recognition (LVCSR). Most speech recognizers employ beam search to speed up the decoding process, in which unpromising partial hypotheses are pruned during decoding. However, the pruning step involves the risk of missing the best complete hypothesis by discarding a partial hypothesis that might grow into the best. Missing the best hypothesis is called a search error. Our purpose is to reduce search errors by optimizing the pruning step. While conventional methods use heuristic criteria to prune each hypothesis based on its score, rank, and so on, our proposed method introduces a pruning function that makes a more precise decision using the rich features extracted from each hypothesis. The parameters of the function can be estimated efficiently to minimize the search error risk using recognition lattices at the training step. We implemented the new method in a WFST-based decoder and achieved a significant reduction of search errors in a 200K-word LVCSR task.
Citations: 6
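For orientation, the conventional score- and rank-based pruning that this abstract contrasts against can be sketched as follows. This is a minimal illustration of the heuristic baseline only, not the learned pruning function the paper proposes; the function name, parameters, and the toy hypotheses are all hypothetical.

```python
def prune_hypotheses(hyps, beam_width, max_active):
    """Heuristic beam pruning over partial hypotheses at one time frame.

    hyps maps a hypothesis id to its log score (higher is better).
    A hypothesis survives if its score is within beam_width of the best
    score (score criterion) AND it is among the max_active best (rank
    criterion). Both thresholds are illustrative tuning knobs.
    """
    if not hyps:
        return {}
    best = max(hyps.values())
    within_beam = {h: s for h, s in hyps.items() if s >= best - beam_width}
    ranked = sorted(within_beam.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:max_active])


# Toy frame with four partial hypotheses and made-up log scores.
frame = {"a": -1.0, "b": -3.0, "c": -12.0, "d": -2.0}
active = prune_hypotheses(frame, beam_width=5.0, max_active=2)
```

A search error in the abstract's sense occurs when a hypothesis such as "c" is discarded here but would have become the best complete hypothesis later; tighter thresholds decode faster but raise that risk.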
Convergence analysis of consensus-based distributed clustering
2010 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2010-06-28 DOI: 10.1109/ICASSP.2010.5495344
P. Forero, A. Cano, G. Giannakis
Abstract: This paper deals with clustering of spatially distributed data using wireless sensor networks. A distributed low-complexity clustering algorithm is developed that requires one-hop communications among neighboring nodes only, without local data exchanges. The algorithm alternates iterations over the variables of a consensus-based version of the global clustering problem. Using stability theory for time-varying and time-invariant systems, the distributed clustering algorithm is shown to be bounded-input bounded-output stable with an output arbitrarily close to a fixed point of the algorithm. For distributed hard K-means clustering, convergence to a local minimum of the centralized problem is guaranteed. Numerical examples confirm the merits of the algorithm and its stability analysis.
Citations: 3
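The idea of reaching agreement through one-hop exchanges only can be illustrated with plain consensus averaging, sketched below. This is a generic building block under stated assumptions (an undirected, connected graph and a step size below the reciprocal of the maximum node degree), not the clustering algorithm analyzed in the paper; the function name and the ring topology are hypothetical.

```python
def consensus_average(values, neighbors, step, iterations):
    """Iterate x_i <- x_i + step * sum_j (x_j - x_i) over one-hop neighbors j.

    values: initial local estimate at each node.
    neighbors: dict mapping node index to a list of neighbor indices
               (ASSUMED symmetric, i.e. an undirected graph).
    Nodes exchange only their current estimates, never their raw data.
    """
    x = list(values)
    for _ in range(iterations):
        # Build the new list from the old one so all nodes update in parallel.
        x = [
            xi + step * sum(x[j] - xi for j in neighbors[i])
            for i, xi in enumerate(x)
        ]
    return x


# Hypothetical ring of four sensor nodes with made-up local estimates.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
estimates = consensus_average([0.0, 4.0, 8.0, 12.0], ring, step=0.25, iterations=50)
```

Because the graph is symmetric, each update preserves the network-wide sum, so every node's estimate approaches the global mean (6.0 for the values above).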
Sparse variable noisy PCA using l0 penalty
2010 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2010-03-14 DOI: 10.1109/ICASSP.2010.5495788
M. Ulfarsson, V. Solo
Abstract: Sparse principal component analysis combines the idea of sparsity with principal component analysis (PCA). There are two kinds of sparse PCA: sparse loading PCA (slPCA), which keeps all the variables but zeroes out some of their loadings, and sparse variable PCA (svPCA), which removes whole variables by simultaneously zeroing out all the loadings on some variables. In this paper we propose a model-based svPCA method based on the l0 penalty. We compare the detection performance of the proposed method with other subset selection methods using a simulated data set. Additionally, we apply the method to a real high-dimensional functional magnetic resonance imaging (fMRI) data set.
Citations: 6
A bounded trust region optimization for discriminative training of HMMs in speech recognition
2010 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2010-03-14 DOI: 10.1109/ICASSP.2010.5495111
Cong Liu, Yu Hu, Hui Jiang, Lirong Dai
Abstract: In this paper, we propose a new method to construct an auxiliary function for the discriminative training of HMMs in speech recognition. The new auxiliary function serves as a first-order approximation of the original objective function but, more importantly, it also remains a lower bound of the original objective function. Furthermore, the trust region (TR) method in [1] is applied to find the globally optimal point of the new auxiliary function. Due to its lower-bound property, the optimal point found is theoretically guaranteed to increase the original discriminative objective function. The proposed bounded trust region method has been investigated on two LVCSR tasks, namely the WSJ-5k and Switchboard 60-hour subset tasks. Experimental results show that the bounded TR method yields much better convergence behavior than both the conventional EBW method and the original TR method.
Citations: 1
Music dereverberation using harmonic structure source model and Wiener filter
2010 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2010-03-14 DOI: 10.1109/ICASSP.2010.5496223
Naoki Yasuraoka, Takuya Yoshioka, T. Nakatani, Atsushi Nakamura, Hiroshi G. Okuno
Abstract: This paper proposes a dereverberation method for musical audio signals. Existing dereverberation methods are designed for speech signals and are not necessarily effective for suppressing long and dense reverberation in musical audio signals because: 1) an all-pole model and a non-parametric model, which are used to represent source spectra, do not match musical tones, and 2) the conventional inverse-filter-based dereverberation is not effective for suppressing long and dense reverberation. To overcome these two problems, an appropriate dereverberation approach for musical audio signals is established. The first problem is resolved by using a harmonic Gaussian mixture model (GMM) to accurately model the harmonic structure of a source spectrum. The second problem is resolved by performing dereverberation with a Wiener filter based on both an estimated inverse filter and an estimated source spectrum model. Experimental results reveal the effectiveness of the proposed dereverberation method using these two solutions.
Citations: 10
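As background for the Wiener filtering step mentioned in this abstract, the textbook per-frequency-bin Wiener gain can be sketched as below. This is only the generic form: how the paper derives its power estimates from the inverse filter and the harmonic GMM is not reproduced, and the names `source_power` and `reverb_power`, along with the toy spectra, are assumptions made for illustration.

```python
def wiener_gain(source_power, reverb_power, floor=1e-12):
    """Per-bin Wiener gain G = S / (S + R).

    source_power: estimated power of the desired (dry) source in each bin.
    reverb_power: estimated power of the component to suppress in each bin.
    floor guards against division by zero when both estimates are zero.
    """
    return [s / max(s + r, floor) for s, r in zip(source_power, reverb_power)]


def apply_gain(observed_spectrum, gain):
    """Scale each (possibly complex) observed bin by its real-valued gain."""
    return [g * x for g, x in zip(gain, observed_spectrum)]


# Made-up three-bin example: equal powers, reverb only, source only.
gain = wiener_gain([1.0, 0.0, 3.0], [1.0, 2.0, 0.0])
enhanced = apply_gain([2 + 2j, 1 + 0j, 0 + 4j], gain)
```

The gain lies between 0 and 1 in every bin, so bins dominated by the unwanted component are attenuated while bins dominated by the source pass through almost unchanged.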