2012 8th International Symposium on Chinese Spoken Language Processing: Latest Papers

A study on cepstral sub-band normalization for robust ASR
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423484
Syu-Siang Wang, J. Hung, Yu Tsao
Abstract: In this paper, we propose a cepstral sub-band normalization (CSN) approach for robust speech recognition. CSN first applies the discrete wavelet transform (DWT) to decompose the original cepstral feature sequence into low- and high-frequency band (LFB and HFB) parts. It then normalizes the LFB components and zeros out the HFB components. Finally, an inverse DWT is applied to the LFB and HFB components to form the normalized cepstral features. When Haar functions are used as the DWT bases, CSN can be computed efficiently and reduces the number of feature components by 50%. Experimental results on the Aurora-2 task show that CSN outperforms conventional cepstral mean subtraction (CMS), cepstral mean and variance normalization (CMVN), and histogram equalization (HEQ). We also integrate CSN with the advanced front-end (AFE) for feature extraction, and the integrated AFE+CSN achieves notable improvements over the original AFE. Its simple computation, compact form, and effective noise robustness make CSN well suited to mobile applications.
Citations: 7
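The following is a minimal sketch of the CSN steps described in the abstract above. It works on the time trajectory of a single cepstral coefficient, so you would apply it to each feature dimension separately. The abstract does not say which normalization is applied to the LFB. Mean and variance normalization is assumed here. The one-level decomposition and all function names are also illustrative assumptions, not the authors' code.

```python
import math
from typing import List, Tuple


def haar_dwt(x: List[float]) -> Tuple[List[float], List[float]]:
    """One-level orthonormal Haar DWT: returns (low band, high band)."""
    if len(x) % 2:
        raise ValueError("trajectory length must be even for a one-level Haar DWT")
    r2 = math.sqrt(2.0)
    low = [(x[2 * k] + x[2 * k + 1]) / r2 for k in range(len(x) // 2)]
    high = [(x[2 * k] - x[2 * k + 1]) / r2 for k in range(len(x) // 2)]
    return low, high


def haar_idwt(low: List[float], high: List[float]) -> List[float]:
    """Inverse of haar_dwt: rebuilds the full-length trajectory."""
    r2 = math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(low, high):
        out.extend([(a + d) / r2, (a - d) / r2])
    return out


def mean_var_normalize(seq: List[float]) -> List[float]:
    """Zero-mean, unit-variance normalization (assumed LFB normalization)."""
    mu = sum(seq) / len(seq)
    sd = math.sqrt(sum((v - mu) ** 2 for v in seq) / len(seq))
    return [(v - mu) / sd if sd > 0 else 0.0 for v in seq]


def csn(track: List[float]) -> List[float]:
    """Sketch of CSN for one cepstral coefficient's trajectory over time."""
    low, high = haar_dwt(track)
    low = mean_var_normalize(low)  # normalize the low-frequency band
    high = [0.0] * len(high)       # zero out the high-frequency band
    return haar_idwt(low, high)
```

With the HFB zeroed, only the half-length `low` list carries information, which matches the 50% reduction the abstract mentions. An odd-length utterance would need padding, for example by repeating the last frame. The sketch leaves that to the caller.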
Statistical modification based post-filtering technique for HMM-based speech synthesis
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423456
Zhengqi Wen, J. Tao, Hao Che
Abstract: Speech generated by hidden Markov model (HMM)-based speech synthesis systems (HTS) suffers from over-smoothing caused by statistical modeling. This paper focuses on a post-filtering technique based on statistical modification of the generated speech parameters. The marginal statistics of the parameter trajectories, such as mean, variance, skewness, and kurtosis, are adjusted according to the values generated from the HTS system. The technique is compared with the global variance (GV)-based speech generation algorithm. Listening tests showed that post-filtering with respect to mean and variance produces results almost equal to those of the GV model, and that further modifying skewness and kurtosis improves the quality of the generated speech.
Citations: 0
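The mean-and-variance part of the post-filter described above can be sketched as a linear remapping of a generated trajectory. After the remapping, the trajectory's marginal mean and standard deviation match target values. In this sketch the target statistics are plain function arguments, and the function names are hypothetical. The skewness and kurtosis modification from the abstract is not implemented. A helper to measure those statistics is included instead.

```python
import math
from typing import List, Tuple


def mean_std(x: List[float]) -> Tuple[float, float]:
    """Population mean and standard deviation of a trajectory."""
    mu = sum(x) / len(x)
    return mu, math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))


def match_mean_variance(traj: List[float], target_mean: float,
                        target_std: float) -> List[float]:
    """Affinely remap a generated trajectory to the target mean and std."""
    mu, sd = mean_std(traj)
    if sd == 0.0:
        return [target_mean] * len(traj)
    scale = target_std / sd
    return [(v - mu) * scale + target_mean for v in traj]


def skew_kurt(x: List[float]) -> Tuple[float, float]:
    """Sample skewness and (non-excess) kurtosis of a trajectory."""
    mu, sd = mean_std(x)
    z = [(v - mu) / sd for v in x]
    return sum(t ** 3 for t in z) / len(z), sum(t ** 4 for t in z) / len(z)
```

The remapping is affine with a positive scale, so it leaves skewness and kurtosis unchanged. Matching those higher-order statistics, as the paper also does, therefore requires a separate nonlinear step.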
Speaker-ensemble hidden Markov modeling for automatic speech recognition
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423532
Guoli Ye, B. Mak
Abstract: This paper proposes a new hidden Markov model (HMM), which we call the speaker-ensemble HMM (SE-HMM). An SE-HMM is a multi-path HMM in which each path is an HMM constructed from the training data of a different speaker. SE-HMM may be considered a form of template-based acoustic model in which speaker-specific acoustic templates are statistically compressed into speaker-specific HMMs. However, SE-HMMs can be built at various levels of compression: for a triphone state, a triphone, a whole utterance, or other convenient phonetic units. As a result, an SE-HMM contains more detail than a conventional HMM but is much smaller than common template-based acoustic models. Furthermore, an SE-HMM is simple to construct, and since it is still an HMM, its construction and computation are well supported by common HMM toolkits such as HTK. The proposed SE-HMM was evaluated on the Resource Management and Wall Street Journal tasks, and it consistently gave better word recognition results than the conventional HMM.
Citations: 0
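To illustrate the multi-path idea in the SE-HMM abstract above, this sketch scores an observation sequence against several parallel speaker-specific paths. The total likelihood is the weighted sum of the per-path forward likelihoods, computed in the log domain. Several choices here are assumptions made only to keep the example small:

- 1-D Gaussian states.
- A left-to-right topology with one shared self-loop probability per path.
- Path weights treated as entry probabilities.

The sketch is independent of HTK or any other toolkit.

```python
import math
from typing import List, Sequence, Tuple

# (state means, state variances, self-loop probability) for one speaker path
Path = Tuple[List[float], List[float], float]


def log_gauss(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(vals: Sequence[float]) -> float:
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))


def path_loglik(obs: Sequence[float], path: Path) -> float:
    """Forward algorithm for a left-to-right HMM with 1-D Gaussian states.

    The path starts in its first state and must end in its last state;
    0 < self_loop < 1 is assumed.
    """
    means, variances, self_loop = path
    stay, move = math.log(self_loop), math.log(1.0 - self_loop)
    n = len(means)
    alpha = [-math.inf] * n
    alpha[0] = log_gauss(obs[0], means[0], variances[0])
    for x in obs[1:]:
        new = []
        for j in range(n):
            cands = [alpha[j] + stay]               # remain in state j
            if j > 0:
                cands.append(alpha[j - 1] + move)   # advance from state j-1
            new.append(logsumexp(cands) + log_gauss(x, means[j], variances[j]))
        alpha = new
    return alpha[-1]


def se_hmm_loglik(obs: Sequence[float], paths: List[Path],
                  weights: List[float]) -> float:
    """log sum_p w_p * P(obs | speaker path p) over parallel speaker paths."""
    return logsumexp([math.log(w) + path_loglik(obs, p)
                      for w, p in zip(weights, paths)])
```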
Controlling the tradeoff property in a regularization framework for noise reduction
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423500
Xugang Lu, M. Unoki, Shigeki Matsuda, Chiori Hori, H. Kashioka
Abstract: The tradeoff between noise reduction and speech distortion is a key concern in designing noise reduction algorithms. We previously proposed a regularization framework for noise reduction that takes this tradeoff into account, treating speech estimation as a function approximation problem in a reproducing kernel Hilbert space (RKHS). The objective function is formulated to find an approximation function that balances approximation accuracy against the complexity of the function, and with a regularization method this function can be estimated from noisy observations. In this paper, we further provide a theoretical analysis of the tradeoff property of the framework in noise reduction, and we apply the framework to speech enhancement experiments in real applications. Compared with several classical noise reduction methods, the proposed framework showed promising advantages.
Citations: 0
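The abstract above frames estimation as RKHS function approximation with a regularization term. A textbook instance of that idea is kernel ridge regression, which this sketch uses. It minimizes the sum of (y_i - f(t_i))^2 plus lam * ||f||^2 over an RKHS with a Gaussian kernel, giving fitted values K (K + lam I)^(-1) y. This is a generic stand-in, not the authors' formulation. The kernel choice, `width`, and the function names are assumptions. A small `lam` follows the noisy data closely (less distortion, less noise removal), while a large `lam` smooths aggressively. This loosely mirrors the tradeoff the paper analyzes.

```python
import math
from typing import List


def gaussian_kernel(a: float, b: float, width: float) -> float:
    return math.exp(-((a - b) ** 2) / (2.0 * width * width))


def solve(mat: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for an n x n system."""
    n = len(mat)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def kernel_ridge_denoise(times: List[float], noisy: List[float],
                         lam: float, width: float = 1.0) -> List[float]:
    """Fitted values K (K + lam*I)^-1 y at the sample times."""
    n = len(times)
    k = [[gaussian_kernel(times[i], times[j], width) for j in range(n)] for i in range(n)]
    reg = [[k[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(reg, noisy)  # expansion coefficients of f in the kernel basis
    return [sum(k[i][j] * alpha[j] for j in range(n)) for i in range(n)]
```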
A cross-dialect comparison of vowel dispersion and vowel variability
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423458
Wai-Sum Lee
Abstract: This study compares the vowel systems of five Chinese dialects with different vowel inventories in terms of vowel dispersion and vowel variability. The dialects are Meixian Kejia (Hakka) with 5 vowels, Hong Kong Cantonese with 7, Fuzhou with 8, Ningbo with 10, and Wenling with 11. Formant frequencies were obtained through spectral analysis of speech data from 10 male and 10 female speakers of each dialect. The findings do not support vowel dispersion theory, which predicts that (i) the larger the vowel inventory, the more expanded the vowel space in the F1-F2 plane, and (ii) variability in vowel formants is inversely related to vowel inventory size.
Citations: 4
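One common way to quantify how expanded a vowel space is in the F1-F2 plane is the area of the convex hull of the per-vowel formant means. The abstract above does not state whether this study used that exact measure, so this is only an illustrative sketch. The function names are hypothetical, and the sketch expects each vowel's mean as an `(F2, F1)` tuple in Hz. The axis order does not affect the area.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; degenerate polygons give 0."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


def vowel_space_area(formant_means: List[Point]) -> float:
    """Convex-hull area (Hz^2) of per-vowel (F2, F1) means."""
    return polygon_area(convex_hull(formant_means))
```

Interior vowels do not enlarge the hull. This is one reason area alone only partly captures dispersion in larger inventories.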
Effects of excitation spread on the intelligibility of Mandarin speech in cochlear implant simulations
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423502
Fei Chen, Tian Guan, L. Wong
Abstract: Noisy listening conditions remain challenging for most cochlear implant patients. This study simulated the effects of the decay rates of excitation spread in cochlear implants on the intelligibility of Mandarin speech in noise. Mandarin sentence and tone stimuli were processed by a noise vocoder and presented to normal-hearing listeners for identification. The decay rates of excitation spread were simulated by varying the slopes of the synthesis filters in the noise vocoder. Experimental results showed a significant benefit for Mandarin sentence recognition in noise with narrower excitation, whereas Mandarin tone identification was relatively robust to excitation spread. These results suggest that reducing the decay rates of excitation spread may improve speech perception in noise for cochlear implants in the future.
Citations: 1
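The study above simulated excitation spread by varying the slopes of the vocoder's synthesis filters. This sketch shows one simplified way to model such a slope: a filter skirt that attenuates by a fixed number of dB per octave away from the channel's center frequency. The abstract does not give the slopes used, so any numbers passed in are illustrative. The function name is hypothetical.

```python
import math


def skirt_gain(freq_hz: float, center_hz: float, slope_db_per_oct: float) -> float:
    """Linear amplitude gain of a synthesis band at freq_hz.

    Simplified filter skirt: attenuation grows by slope_db_per_oct for every
    octave between freq_hz and the band's center frequency.
    """
    octaves = abs(math.log2(freq_hz / center_hz))
    return 10.0 ** (-slope_db_per_oct * octaves / 20.0)
```

For example, one octave from the center, a 6 dB/octave skirt passes about half the amplitude, while a 48 dB/octave skirt passes under 1%. A steeper slope therefore confines each channel's energy more narrowly.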
Tones in whispered Mandarin
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423539
Bin Li, R. Rong
Abstract: This paper examines and compares the characteristics of tones on Mandarin CV syllables in phonated and whispered speech. Vowel formants in various contexts are also compared across tone environments in the two phonation types, to assess whether and how tone environment and vowel production interact, and whether the lack of fundamental frequency in whisper is compensated for by other phonetic means in a tonal language. Results suggest that temporal correlates are maintained to a certain extent and that the vowel space is shifted significantly toward a higher frequency range.
Citations: 2
A unified trajectory tiling approach to high quality TTS and cross-lingual voice transformation
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423506
Yao Qian, F. Soong
Abstract: In human-machine speech communication, it is technically challenging to make a machine talk as naturally as a human, so as to enable “frictionless” interaction in which the user feels the communication is as natural as human-to-human conversation. We propose a trajectory tiling approach to high-quality speech synthesis, in which speech parameter trajectories extracted from natural, processed, or synthesized speech guide the search for the best sequence of waveform segment “tiles” stored in a pre-recorded speech database. We test the approach in both TTS and cross-lingual voice transformation applications. Experimental results show that the proposed trajectory tiling approach renders speech that is both natural and highly intelligible, and its high quality is confirmed by both objective and subjective tests.
Citations: 0
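Searches like the one described in the trajectory tiling abstract above are commonly implemented as a Viterbi pass over a target cost plus a concatenation cost. This sketch follows that pattern. Each candidate tile is represented by a short parameter trajectory, such as F0 or one cepstral coefficient, and compared with the guiding trajectory segment. The squared-distance costs, the `w_concat` weight, and the function names are assumptions, not the authors' cost functions.

```python
from typing import List, Sequence, Tuple

Tile = Sequence[float]


def target_cost(tile: Tile, target: Tile) -> float:
    """Squared distance between a tile's trajectory and the guide segment."""
    return sum((a - b) ** 2 for a, b in zip(tile, target))


def concat_cost(prev: Tile, nxt: Tile) -> float:
    """Squared jump in the parameter value at the tile boundary."""
    return (prev[-1] - nxt[0]) ** 2


def select_tiles(targets: List[Tile], candidates: List[List[Tile]],
                 w_concat: float = 1.0) -> Tuple[List[int], float]:
    """Viterbi search for the tile sequence with the lowest total cost."""
    cost = [target_cost(c, targets[0]) for c in candidates[0]]
    backptrs: List[List[int]] = []
    for i in range(1, len(targets)):
        new_cost, ptrs = [], []
        for c in candidates[i]:
            # best predecessor for candidate c, including the join cost
            trans = [cost[k] + w_concat * concat_cost(p, c)
                     for k, p in enumerate(candidates[i - 1])]
            k_best = min(range(len(trans)), key=trans.__getitem__)
            new_cost.append(trans[k_best] + target_cost(c, targets[i]))
            ptrs.append(k_best)
        cost = new_cost
        backptrs.append(ptrs)
    j = min(range(len(cost)), key=cost.__getitem__)
    best = cost[j]
    path = [j]
    for ptrs in reversed(backptrs):  # trace back the chosen tile indices
        j = ptrs[j]
        path.append(j)
    return path[::-1], best
```

Raising `w_concat` favors smoother joins over closer matches to the guiding trajectory.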
Improved unit selection speech synthesis method utilizing subjective evaluation results on synthetic speech
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423524
Xian-Jun Xia, Zhenhua Ling, Chen-Yu Yang, Lirong Dai
Abstract: This paper presents an improved unit selection and waveform concatenation speech synthesis method that gathers and uses human feedback on synthetic speech. First, a set of texts is synthesized by the baseline unit selection system. Listeners then label each prosodic word in the synthetic speech as natural or unnatural. In the proposed method, the natural synthetic segments are treated as virtual candidate units that extend the original speech corpus for unit selection, and a new speech synthesis system is constructed using this extended corpus. A synthetic error detector based on an SVM classifier is also trained on the natural and unnatural synthetic speech. At synthesis time, the input text is synthesized by the baseline system and the extended system simultaneously, and the trained error detector evaluates the two unit selection results to determine the better one. Experimental results on a place-name synthesis task demonstrate the effectiveness of the proposed method in improving the naturalness of synthetic speech.
Citations: 5
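The abstract above says only that the trained SVM detector evaluates the two unit selection results to pick the better one. This sketch shows one plausible form of that final decision. A linear SVM decision function, w·x + b, flags unnatural prosodic words, and the output with fewer flagged words wins. Several details are hypothetical assumptions:

- The per-word feature vectors, weights, and bias.
- The sign convention, where a positive decision means "unnatural".
- The tie rule, where ties keep the baseline.

```python
from typing import List, Sequence


def svm_decision(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear SVM decision value w.x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def count_unnatural(prosodic_words: List[Sequence[float]],
                    weights: Sequence[float], bias: float) -> int:
    """Number of prosodic words the detector flags (decision > 0 assumed = unnatural)."""
    return sum(1 for f in prosodic_words if svm_decision(f, weights, bias) > 0)


def pick_output(baseline_words: List[Sequence[float]],
                extended_words: List[Sequence[float]],
                weights: Sequence[float], bias: float) -> str:
    """Choose the system output with fewer detected errors; ties keep the baseline."""
    b = count_unnatural(baseline_words, weights, bias)
    e = count_unnatural(extended_words, weights, bias)
    return "extended" if e < b else "baseline"
```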
Alternative hypothesis generation using a weighted kernel feature matrix for ASR substitution error correction
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423475
Chao-Hong Liu, Chung-Hsien Wu, David Sarwono
Abstract: Although automatic speech recognition (ASR) has been used successfully in several applications, it remains non-robust and imprecise, especially in harsh environments where the input speech is of low quality. Robust error correction of ASR outputs is therefore important in addition to improving recognition performance. Recent error correction approaches use linguistic or domain information to generate alternative hypotheses for the ASR outputs and then select the most likely alternative. In this study, the distances between ASR outputs and potentially correct alternatives are estimated from a weighted, context-dependent, syllable-cluster-based kernel feature matrix, followed by distance rescaling based on multidimensional scaling (MDS). These distances are then used to construct an alternative syllable lattice, and dynamic programming is used to obtain the most likely correct output with respect to the original ASR results. Experiments on the MATBN Mandarin Chinese broadcast news corpus show that the proposed method improves the word error rate by about 1.95% compared with the correction pair approach.
Citations: 0
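The abstract above mentions MDS-based distance rescaling. This sketch uses classical (Torgerson) MDS, which embeds items in a low-dimensional space from a distance matrix. Distances recomputed in that space can then serve as rescaled distances. The abstract does not detail how the authors apply MDS to their kernel-based distances, so the embedding dimension `k` and the function names are assumptions. The power-iteration eigensolver is used only to keep the sketch dependency-free.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def double_center(d: Matrix) -> Matrix:
    """B = -1/2 * J D2 J, where D2 holds squared distances and J centers rows/cols."""
    n = len(d)
    sq = [[v * v for v in row] for row in d]
    row_mean = [sum(r) / n for r in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand)
             for j in range(n)] for i in range(n)]


def top_eigenpair(b: Matrix, iters: int = 500) -> Tuple[float, List[float]]:
    """Power iteration: dominant eigenvalue (Rayleigh quotient) and unit eigenvector."""
    n = len(b)
    # non-constant start vector: the constant vector lies in B's null space
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def classical_mds(d: Matrix, k: int = 2) -> Matrix:
    """Embed n items in k dimensions from an n x n distance matrix."""
    b = double_center(d)
    n = len(b)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        lam, v = top_eigenpair(b)
        if lam <= 0.0:  # stop at non-positive (non-Euclidean) directions
            break
        for i in range(n):
            coords[i][c] = v[i] * math.sqrt(lam)
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]  # deflate
    return coords


def rescaled_distances(d: Matrix, k: int = 2) -> Matrix:
    """Euclidean distances between the MDS coordinates."""
    x = classical_mds(d, k)
    return [[math.dist(p, q) for q in x] for p in x]
```

Power iteration returns the largest-magnitude eigenvalue. A strongly negative eigenvalue, which can occur with non-Euclidean distances, therefore ends the loop early. A full symmetric eigensolver avoids this limitation.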