2012 8th International Symposium on Chinese Spoken Language Processing最新文献_第8页

A study on cepstral sub-band normalization for robust ASR 鲁棒ASR的倒谱子带归一化研究

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423484

Syu-Siang Wang, J. Hung, Yu Tsao

{"title":"A study on cepstral sub-band normalization for robust ASR","authors":"Syu-Siang Wang, J. Hung, Yu Tsao","doi":"10.1109/ISCSLP.2012.6423484","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423484","url":null,"abstract":"In this paper, we propose a cepstral subband normalization (CSN) approach for robust speech recognition. The CSN approach first applies the discrete wavelet transform (DWT) to decompose the original cepstral feature sequence into low and high frequency band (LFB and HFB) parts. Then, CSN normalizes the LFB components and zeros out the HFB components. Finally, an inverse DWT is applied on LFB and HFB components to form the normalized cepstral features. When using the Haar functions as the DWT bases, the calculation of CSN can be processed efficiently with a 50% reduction on the amount of feature components. In addition, our experimental results on the Aurora-2 task show that CSN outperforms the conventional cepstral mean subtraction (CMS), cepstral mean and variance normalization (CMVN), and histogram equalization (HEQ). We also integrate CSN with advanced frontend (AFE) for feature extraction. Experimental results indicate that the integrated AFE+CSN achieves notable improvements over the original AFE. The simple calculation, compact in form, and effective noise robustness properties enable CSN to perform suitably for mobile applications.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125167434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

Statistical modification based post-filtering technique for HMM-based speech synthesis 基于统计修正的hmm语音合成后滤波技术

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423456

Zhengqi Wen, J. Tao, Hao Che

引用次数: 0

Speaker-ensemble hidden Markov modeling for automatic speech recognition 自动语音识别的扬声器集成隐马尔可夫建模

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423532

Guoli Ye, B. Mak

引用次数: 0

Controlling the tradeoff property in a regularization framework for noise reduction 在降噪的正则化框架中控制权衡特性

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423500

Xugang Lu, M. Unoki, Shigeki Matsuda, Chiori Hori, H. Kashioka

引用次数: 0

A cross-dialect comparison of vowel dispersion and vowel variability 跨方言元音分散和元音变异性的比较

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423458

Wai-Sum Lee

引用次数: 4

Effects of excitation spread on the intelligibility of Mandarin speech in cochlear implant simulations 人工耳蜗模拟中，激励扩散对普通话语音可理解性的影响

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423502

Fei Chen, Tian Guan, L. Wong

引用次数: 1

Tones in whispered Mandarin 普通话耳语的语调

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423539

Bin Li, R. Rong

引用次数: 2

A unified trajectory tiling approach to high quality TTS and cross-lingual voice transformation 统一轨迹平铺方法实现高质量TTS和跨语言语音转换

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423506

Yao Qian, F. Soong

引用次数: 0

Improved unit selection speech synthesis method utilizing subjective evaluation results on synthetic speech 基于合成语音主观评价结果的改进单元选择语音合成方法

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423524

Xian-Jun Xia, Zhenhua Ling, Chen-Yu Yang, Lirong Dai

{"title":"Improved unit selection speech synthesis method utilizing subjective evaluation results on synthetic speech","authors":"Xian-Jun Xia, Zhenhua Ling, Chen-Yu Yang, Lirong Dai","doi":"10.1109/ISCSLP.2012.6423524","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423524","url":null,"abstract":"This paper presents an improved unit selection and waveform concatenation speech synthesis method by gathering and utilizing human feedbacks on synthetic speech. Firstly, a set of texts are synthesized by the baseline unit selection synthesis system. Each prosodic word within the synthetic speech is then evaluated as a natural one or an unnatural one by listeners. In our proposed method, these natural synthetic segments are treated as virtual candidate units to extend the original speech corpus for unit selection. A new speech synthesis system is constructed using this extended speech corpus. A synthetic error detector based on SVM classifier is also built using the natural and unnatural synthetic speech. At synthesis time, the input text is synthesized using the baseline system and the extended system simultaneously. The two unit selection results are evaluated by the trained synthetic error detector to determine the optimal one. Experimental results prove the effectiveness of our proposed method in improving the naturalness of synthetic speech on a task of synthesizing place names.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130753248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Alternative hypothesis generation using a weighted kernel feature matrix for ASR substitution error correction 基于加权核特征矩阵的备选假设生成用于ASR替换误差校正

2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423475

Chao-Hong Liu, Chung-Hsien Wu, David Sarwono

{"title":"Alternative hypothesis generation using a weighted kernel feature matrix for ASR substitution error correction","authors":"Chao-Hong Liu, Chung-Hsien Wu, David Sarwono","doi":"10.1109/ISCSLP.2012.6423475","DOIUrl":"https://doi.org/10.1109/ISCSLP.2012.6423475","url":null,"abstract":"Although automatic speech recognition (ASR) has been successfully used in several applications, it is still non-robust and imprecise especially in a harsh environment wherein the input speech is of low quality. Robust error correction for ASR outputs thus becomes important in addition to improving recognition performance. In recent approaches to error correction, linguistic or domain information is used to generate the alternative hypotheses for the ASR outputs followed by the selection of the most likely alternative. In this study, the distances between ASR outputs and the potentially correct alternatives are estimated based on a weighted context-dependent syllable cluster-based kernel feature matrix followed by multidimensional scaling (MDS)-based distance rescaling. These distances are then used to construct an alternative syllable lattice and the dynamic programming is used to obtain the most likely correct output with respect to the original ASR results. Experiments show that the proposed method achieved about 1.95% improvement on the word error rate compared to the correction pair approach using the MATBN Mandarin Chinese broadcast news corpus.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122580341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0