{"title":"Interactive tone mapping for High Dynamic Range video","authors":"Zhe Wang, J. Zhai, Zhang Tao, J. Llach","doi":"10.1109/ICASSP.2010.5495318","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495318","url":null,"abstract":"Despite considerable progress in HDR image tone mapping for the past decade, little work has been done for HDR video. For applications such as film post-production, the capability of local tone manipulation is highly regarded by the content creators. This paper presents an interactive tone mapping scheme for HDR video sequences. It provides a simple scribble/ stroke based interface for local tone manipulation and is capable of propagating user input information throughout a video sequence by using Gaussian mixture model (GMM) and edge preserving filtering. The experimental results demonstrated its effectiveness for HDR video tone mapping as well as its flexibility for users to easily and intuitively manipulate the appearance of the video while maintaining temporal consistency.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133984930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting interruptions in dyadic spoken interactions","authors":"Chi-Chun Lee, Shrikanth S. Narayanan","doi":"10.1109/ICASSP.2010.5494991","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5494991","url":null,"abstract":"Interruptions occur frequently in spontaneous conversations, and they are often associated with changes in the flow of conversation. Predicting interruption is essential in the design of natural human-machine spoken dialog interface. The modeling can bring insights into the dynamics of human-human conversation. This work utilizes Hidden Condition Random Field (HCRF) to predict occurrences of interruption in dyadic spoken interactions by modeling both speakers' behaviors before a turn change takes place. Our prediction model, using both the foreground speaker's acoustic cues and the listener's gestural cues, achieves an F-measure of 0.54, accuracy of 70.68%, and unweighted accuracy of 66.05% on a multimodal database of dyadic interactions. The experimental results also show that listener's behaviors provides an indication of his/her intention of interruption.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130144266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simple methods for improving speaker-similarity of HMM-based speech synthesis","authors":"J. Yamagishi, Simon King","doi":"10.1109/ICASSP.2010.5495562","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495562","url":null,"abstract":"In this paper we revisit some basic configuration choices of HMM-based speech synthesis, such as waveform sampling rate, auditory frequency warping scale and the logarithmic scaling of F0, with the aim of improving speaker similarity which is an acknowledged weakness of current HMM-based speech synthesisers. All of the techniques investigated are simple but, as we demonstrate using perceptual tests, can make substantial differences to the quality of the synthetic speech. Contrary to common practice in automatic speech recognition, higher waveform sampling rates can offer enhanced feature extraction and improved speaker similarity for speech synthesis. In addition, a generalized logarithmic transform of F0 results in larger intra-utterance variance of F0 trajectories and hence more dynamic and natural-sounding prosody.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132770663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-based dereverberation in the logmelspec domain for robust distant-talking speech recognition","authors":"A. Sehr, R. Maas, Walter Kellermann","doi":"10.1109/ICASSP.2010.5495671","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495671","url":null,"abstract":"The REMOS (REverberation MOdeling for Speech recognition) concept for reverberation-robust distant-talking speech recognition, introduced in [1] for melspectral features, is extended in this contribution to logarithmic melspectral (logmelspec) features. Based on a combined acoustic model consisting of a hidden Markov model network and a reverberation model, REMOS determines clean-speech and reverberation estimates during recognition by an inner optimization operation. A reformulation of this inner optimization problem for logmelspec features, allowing an efficient solution by nonlinear optimization algorithms, is derived in this paper so that an efficient implementation of REMOS for logmelspec features becomes possible. Connected digit recognition experiments show that the proposed REMOS implementation significantly outperforms reverberantly-trained HMMs in highly reverberant environments.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132862384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hybrid physical and statistical dynamic articulatory framework incorporating analysis-by-synthesis for improved phone classification","authors":"Ziad Al Bawab, B. Raj, R. Stern","doi":"10.1109/ICASSP.2010.5495696","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495696","url":null,"abstract":"In this paper, we present a dynamic articulatory model for phone classification. The model integrates real articulatory information derived from ElectroMagnetic Articulograph (EMA) data into its inner states. It maps from the articulatory space to the acoustic one using an adapted vocal tract model for each speaker and a physiologically-motivated articulatory synthesis approach. We apply the analysis-by-synthesis paradigm in a statistical fashion. We first present a fast approach for deriving analysis-by-synthesis distortion features. Next, the distortion between the speech synthesized from the articulatory states and the incoming speech signal is used to compute the output observation probabilities of the Hidden Markov Model (HMM) used for classification. Experiments with the novel framework show improvements over baseline in phone classification accuracy.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133231744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Search error risk minimization in Viterbi beam search for speech recognition","authors":"Takaaki Hori, Shinji Watanabe, Atsushi Nakamura","doi":"10.21437/Interspeech.2010-101","DOIUrl":"https://doi.org/10.21437/Interspeech.2010-101","url":null,"abstract":"This paper proposes a method to optimize Viterbi beam search based on search error risk minimization in large vocabulary continuous speech recognition (LVCSR). Most speech recognizers employ beam search to speed up the decoding process, in which unpromising partial hypotheses are pruned during decoding. However, the pruning step involves the risk of missing the best complete hypothesis by discarding a partial hypothesis that might grow into the best. Missing the best hypothesis is called search error. Our purpose is to reduce search error by optimizing the pruning step. While conventional methods use heuristic criteria to prune each hypothesis based on its score, rank, and so on, our proposed method introduces a pruning function that makes a more precise decision using the rich features extracted from each hypothesis. The parameters of the function can be estimated efficiently to minimize the search error risk using recognition lattices at the training step. 
We implemented the new method in a WFST-based decoder and achieved a significant reduction of search errors in a 200K-word LVCSR task.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129897991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convergence analysis of consensus-based distributed clustering","authors":"P. Forero, A. Cano, G. Giannakis","doi":"10.1109/ICASSP.2010.5495344","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495344","url":null,"abstract":"This paper deals with clustering of spatially distributed data using wireless sensor networks. A distributed low-complexity clustering algorithm is developed that requires one-hop communications among neighboring nodes only, without local data exchanges. The algorithm alternates iterations over the variables of a consensus-based version of the global clustering problem. Using stability theory for time-varying and time-invariant systems, the distributed clustering algorithm is shown to be bounded-input bounded-output stable with an output arbitrarily close to a fixed point of the algorithm. For distributed hard K-means clustering, convergence to a local minimum of the centralized problem is guaranteed. Numerical examples confirm the merits of the algorithm and its stability analysis.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134514508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse variable noisy PCA using l0 penalty","authors":"M. Ulfarsson, V. Solo","doi":"10.1109/ICASSP.2010.5495788","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495788","url":null,"abstract":"Sparse principal component analysis combines the idea of sparsity with principal component analysis (PCA). There are two kinds of sparse PCA; sparse loading PCA (slPCA) which keeps all the variables but zeroes out some of their loadings; and sparse variable PCA (svPCA) which removes whole variables by simultaneously zeroing out all the loadings on some variables. In this paper we propose a model based svPCA method based on the l0 penalty. We compare the detection performance of the proposed method with other subset selection method using a simulated data set. Additionally, we apply the method on a real high dimensional functional magnetic resonance imaging (fMRI) data set.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114980397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A bounded trust region optimization for discriminative training of HMMS in speech recognition","authors":"Cong Liu, Yu Hu, Hui Jiang, Lirong Dai","doi":"10.1109/ICASSP.2010.5495111","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5495111","url":null,"abstract":"In this paper, we have proposed a new method to construct an auxiliary function for the discriminative training of HMMs in speech recognition. The new auxiliary function serves as a first-order approximation of the original objective function but more importantly it remains as a lower bound of the original objective function as well. Furthermore, the trust region (TR) method in [1] is applied to find the globally optimal point of the new auxiliary function. Due to its lower-bound property, the found optimal point is theoretically guaranteed to increase the original discriminative objective function. The proposed bounded trust region method has been investigated on two LVCSR tasks, namely WSJ-5k and Switchboard 60-hour subset tasks. Experimental results show that the bounded TR method yields much better convergence behavior than both the conventional EBW method and the original TR method.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115169155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Music dereverberation using harmonic structure source model and Wiener filter","authors":"Naoki Yasuraoka, Takuya Yoshioka, T. Nakatani, Atsushi Nakamura, HIroshi G. Okuno","doi":"10.1109/ICASSP.2010.5496223","DOIUrl":"https://doi.org/10.1109/ICASSP.2010.5496223","url":null,"abstract":"This paper proposes a dereverberation method for musical audio signals. Existing dereverberation methods are designed for speech signals and are not necessarily effective for suppressing long and dense reverberation in musical audio signals because: 1) an all-pole model and a non-parametric model, which are used to represent source spectra, do not match musical tones, and 2) the conventional inverse-filter-based dereverberation is not effective for suppressing long and dense reverberation. To overcome the two problems, an appropriate dereverberation approach for musical audio signals is established. The first problem is resolved by using a harmonic Gaussian mixture model (GMM) to accurately model the harmonic structure of a source spectrum. The second problem is resolved by performing dereverberation with a Wiener filter based on both an estimated inverse filter and an estimated source spectrum model. Experimental results reveal the effectiveness of the proposed dereverberation method using these two solutions.","PeriodicalId":293333,"journal":{"name":"2010 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115423584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}