2012 8th International Symposium on Chinese Spoken Language Processing: Latest Papers

Synthesized stereo-based stochastic mapping with data selection for robust speech recognition
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423542
Jun Du, Qiang Huo
Abstract: In this paper, we present a synthesized stereo-based stochastic mapping approach for robust speech recognition. We extend the traditional stereo-based stochastic mapping (SSM) in two main respects. First, the requirement for stereo data, which is not practical in real applications, is relaxed by using HMM-based speech synthesis. Second, we make the feature mapping focus more on incorrectly recognized samples via a data selection strategy. Experimental results on the Aurora3 databases show that our approach achieves consistently significant improvements in recognition performance in the well-matched (WM) condition across four different European languages.
Citations: 6
The lossless adaptive arithmetic coding based on context for ITU-T G.719 at variable rate
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423462
Xuan Ji, Jing Wang, Hailong He, Jingming Kuang
Abstract: This paper presents a novel technique for context-based adaptive arithmetic coding of the quantized MDCT coefficients and frequency band gains in audio compression. A key feature of the new technique is that it combines context models in the time domain and the frequency domain, which are used to model the probabilities of the quantized norms and MDCT coefficients. With this new technique, we achieve a high degree of adaptation and redundancy reduction in the adaptive arithmetic coding. In addition, we employ an efficient variable rate algorithm for G.719. The variable rate algorithm is designed based on the baseline entropy coding method of G.719 and on the proposed adaptive arithmetic coding technique, respectively. For a set of audio samples used in the application, we achieve an average bit-rate saving of 7.2% while producing audio quality equal to that of the original G.719.
Citations: 0
Structured modeling based on generalized variable parameter HMMs and speaker adaptation
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423526
Yang Li, Xunying Liu, Lan Wang
Abstract: Handling ambient, variable acoustic factors is a challenging task in automatic speech recognition (ASR) systems. Variable ambient noise and the distinct acoustic factors among speakers are two key issues for the recognition task. To solve these problems, we present a new framework for robust speech recognition based on structured modeling, using generalized variable parameter HMMs (GVP-HMMs) and unsupervised speaker adaptation (SA) to compensate for the mismatch caused by environment and speaker variability. GVP-HMMs can explicitly approximate the continuous trajectories of Gaussian component means, variances and linear transformation parameters with a polynomial function of the varying noise level. In the recognition stage, an MLLR transform captures the general relationship between the original model set and the current speaker, which can help remove the effects of unwanted speaker factors. The effectiveness of the proposed approach is confirmed by evaluation experiments on a medium-vocabulary Mandarin recognition task.
Citations: 7
An improved tone labeling and prediction method with non-uniform segmentation of F0 contour
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423467
Xingyu Na, Xiang Xie, Jingming Kuang, Yaling He
Abstract: This paper proposes a tone labeling technique for tonal language speech synthesis. Non-uniform segmentation using Viterbi alignment is introduced to determine the boundaries used to obtain F0 symbols, which serve as tonal labels to eliminate the mismatch between tone patterns and the F0 contours of the training data. During context clustering, the tendency of adjacent F0 state distributions is captured by the state-based phonetic trees. The means of the tone model states are directly quantized to obtain the full tonal label in the synthesis stage. Both objective and subjective experimental results show that the proposed technique can improve the perceptual prosody of synthetic speech of non-professional speakers.
Citations: 0
Context dependant phone mapping for cross-lingual acoustic modeling
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423496
Van Hai Do, Xiong Xiao, Chng Eng Siong, Haizhou Li
Abstract: This paper presents a novel method for acoustic modeling with limited training data. The idea is to leverage a well-trained acoustic model of a source language. In this paper, a conventional HMM/GMM triphone acoustic model of the source language is used to derive likelihood scores for each feature vector of the target language. These scores are then mapped to triphones of the target language using neural networks. We conduct a case study where Malay is the source language and English (Aurora-4 task) is the target language. Experimental results on Aurora-4 (clean test set) show that by using only 7, 16, and 55 minutes of English training data, we achieve word error rates of 21.58%, 17.97%, and 12.93%, respectively. These results significantly outperform the conventional HMM/GMM and hybrid systems.
Citations: 10
Text-Dependent Speaker Recognition with long-term features based on functional data analysis
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423461
Chenhao Zhang, T. Zheng, Ruxin Chen
Abstract: Text-Dependent Speaker Recognition (TDSR) is widely used nowadays. Short-term features such as Mel-Frequency Cepstral Coefficients (MFCC) have been the dominant features in traditional Dynamic Time Warping (DTW) based TDSR systems. Short-term features capture local portions of the significant temporal dynamics well but are weaker at capturing the overall statistical characteristics of a sentence. Functional Data Analysis (FDA) has been shown to have a significant advantage in exploring the statistical information of data, so in this paper a long-term feature extraction method based on MFCC and FDA theory is proposed. The extraction procedure consists of the following steps: first, FDA theory is applied after MFCC feature extraction; second, to compress redundant information, a new feature based on Functional Principal Component Analysis (FPCA) is generated; third, the distance between training features and test features is calculated for use in the recognition procedure. Compared with the existing MFCC plus DTW method, experimental results show that the new features extracted with the proposed method, combined with the cosine similarity measure, demonstrate better performance.
Citations: 0
An improved steady segment based decoding algorithm by using response probability for LVCSR
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423525
Zhanlei Yang, Wenju Liu, Hao Chao
Abstract: This paper proposes a novel decoding algorithm that integrates both steady speech segments and the location information of observations into the conventional path extension framework. First, speech segments that possess a stable spectrum are extracted. Second, a preliminary improved algorithm is obtained by modifying the traditional inter-HMM extension framework using the detected steady segments. Then, at the probability calculation stage, response probability (RP), which represents the location information of observations within the acoustic feature space, is further incorporated into decoding. Thus, RP directs the decoder to enhance or weaken the path candidates that pass through the front-end steady-segment-based decoding. Experiments conducted on Mandarin speech recognition show that the proposed algorithm achieves a 4.6% relative reduction in character error rate compared with a system in which only steady segments are used, and a 10.0% relative reduction in run time factor compared with a system in which only RP is used.
Citations: 1
A simple and effective pitch re-estimation method for rich prosody and speaking styles in HMM-based speech synthesis
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423473
Chengyuan Lin, Chien-Hung Huang, C. Kuo
Abstract: This paper proposes a novel method of controllable pitch re-estimation that can produce better pitch contours or provide diverse speaking styles for text-to-speech (TTS) systems. The method is composed of a pitch re-estimation model and a set of control parameters. The pitch re-estimation model is employed to reduce the over-smoothing effects that are usually introduced by TTS training. The control parameters are designed to generate not only rich intonations but also speaking styles, e.g. a foreign accent or an excited tone. To verify the feasibility of the proposed method, we conducted experiments with both objective measures and subjective tests. Although the re-estimated pitch yields only slightly lower prediction error in the objective measure, it produces clearly better intonation in the listening test. Moreover, expressive speech can be generated successfully under the framework of controllable pitch re-estimation.
Citations: 0
mENUNCIATE: Development of a computer-aided pronunciation training system on a cross-platform framework for mobile, speech-enabled application development
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423507
Pengfei Liu, K. Yuen, Wai-Kim Leung, H. Meng
Abstract: This paper presents our ongoing research in the field of speech-enabled, multimodal mobile application development. We have developed a multimodal framework that enables cross-platform development using open standards-based HTML, CSS and JavaScript. This framework offers high extensibility through a plugin-based architecture and provides scalable REST-based speech services in the cloud to support large numbers of requests from mobile devices. This paper describes the architecture and implementation of the framework, and the development of a mobile computer-aided pronunciation training application for Chinese learners of English, named mENUNCIATE, based on this framework. We also report a preliminary performance evaluation of mENUNCIATE.
Citations: 8
Enhanced lengthening cancellation using bidirectional pitch similarity alignment for spontaneous speech
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423517
Po-Yi Shih, Bo-Wei Chen, Jhing-Fa Wang, Jhing-Wei Wu
Abstract: In this work, an enhanced lengthening cancellation method is proposed to detect and cancel the lengthened part of vowels. The proposed method consists of an autocorrelation function, cosine similarity-based lengthening detection, and bidirectional pitch contour alignment. The autocorrelation function is used to obtain the reference pitch contour. The cosine similarity-based method is applied to measure the similarity between the reference and the next adjacent pitch contours. Because periodic segments vary in length, fixed-size frames may cause accumulated errors. Therefore, bidirectional pitch contour alignment is adopted in this study. Experiments indicate that the proposed method achieves accuracy rates of 91.4% and 88.7% on a 60-keyword and a 50-sentence database, respectively. Moreover, the proposed approach runs about three times as fast as the baseline. These results demonstrate the effectiveness of the proposed method.
Citations: 1