2008 6th International Symposium on Chinese Spoken Language Processing: Latest Articles

Decision Fusion for Improving Mispronunciation Detection Using Language Transfer Knowledge and Phoneme-Dependent Pronunciation Scoring
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.18
W. Lo, Alissa M. Harrison, H. Meng, Lan Wang
{"title":"Decision Fusion for Improving Mispronunciation Detection Using Language Transfer Knowledge and Phoneme-Dependent Pronunciation Scoring","authors":"W. Lo, Alissa M. Harrison, H. Meng, Lan Wang","doi":"10.1109/CHINSL.2008.ECP.18","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.18","url":null,"abstract":"Application of linguistic knowledge of language transfer to automatic speech recognition (ASR) technology can enhance mispronunciation detection performance in computer-aided pronunciation training (CAPT). This is achieved by pinpointing salient pronunciation errors made by second language learners. In this work, we propose to apply decision fusion for further improvement in mispronunciation detection performance. The detection decision from the linguistically-motivated detection, which applies language transfer knowledge, is used as the basis. Back-off to posterior probability based pronunciation scoring with phoneme-dependent thresholds is employed when the basis is \"less-reliable\". Fusion can help combat problems such as incomplete coverage of linguistic knowledge as well as the imperfection of acoustic models in ASR. Our fusion strategy can maintain the diagnosis capability of the linguistically-motivated approach while achieving a major boost in detection performance. Experimental results show that decision fusion can achieve a relative improvement in mispronunciation detection of up to a 30% reduction in the total number of decision errors.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131694708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
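The fusion rule described in the Lo et al. abstract (keep the linguistically motivated decision when it is reliable, otherwise back off to posterior-based scoring with a per-phoneme threshold) might be sketched as below. This is only an illustration: the `ling_reliable` flag, the phoneme labels, and the threshold values are hypothetical, and the abstract does not say how reliability is actually judged.

```python
from dataclasses import dataclass


@dataclass
class PhoneObservation:
    phoneme: str         # canonical phoneme the learner was expected to produce
    ling_decision: bool  # True = linguistically motivated detector flags an error
    ling_reliable: bool  # hypothetical flag: is that decision considered reliable?
    posterior: float     # posterior probability of the canonical phoneme, in [0, 1]


def fuse_decision(obs, thresholds, default_threshold=0.5):
    """Return True if the phone is judged mispronounced."""
    if obs.ling_reliable:
        # Basis: keep the linguistically motivated decision.
        return obs.ling_decision
    # Back-off: posterior-based scoring with a phoneme-dependent threshold.
    threshold = thresholds.get(obs.phoneme, default_threshold)
    return obs.posterior < threshold


# Illustrative values only; real thresholds would be tuned per phoneme.
thresholds = {"th": 0.6, "r": 0.4}
observations = [
    PhoneObservation("th", True, True, 0.9),    # reliable basis, flagged
    PhoneObservation("r", False, False, 0.3),   # back-off, below threshold 0.4
    PhoneObservation("ae", False, False, 0.7),  # back-off, above default 0.5
]
decisions = [fuse_decision(o, thresholds) for o in observations]
print(decisions)
```

Keeping the linguistic decision as the basis is what would let such a system still report which error was made, while the back-off path only answers whether the phone was acceptable.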
Improvements on Mel-Frequency Cepstrum Minimum-Mean-Square-Error Noise Suppressor for Robust Speech Recognition
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.29
Dong Yu, L. Deng, Jian Wu, Y. Gong, A. Acero
{"title":"Improvements on Mel-Frequency Cepstrum Minimum-Mean-Square-Error Noise Suppressor for Robust Speech Recognition","authors":"Dong Yu, L. Deng, Jian Wu, Y. Gong, A. Acero","doi":"10.1109/CHINSL.2008.ECP.29","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.29","url":null,"abstract":"Recently we have developed a non-linear feature-domain noise reduction algorithm based on the minimum mean square error (MMSE) criterion on Mel-frequency cepstra (MFCC) for environment-robust speech recognition. Our novel algorithm operates on the power spectral magnitude of the filter-bank's outputs and outperforms the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah in both recognition accuracy and efficiency as demonstrated on the Aurora-3 corpora. This paper serves two purposes. First, we show that the algorithm is effective on large vocabulary tasks with tri-phone acoustic models. Second, we report improvements on the suppression rule of the original MFCC-MMSE noise suppressor by smoothing the gain over the previous frames to prevent the abrupt change of the gain over frames and adjusting the gain function based on the noise power so that the suppression is aggressive when the noise level is high and conservative when the noise level is low. We also propose an efficient and effective parameter tuning algorithm named step-adaptive discriminative learning algorithm (SADLA) to adjust the parameters used by the noise tracker and the suppressor. We observed a 46% relative word error rate (WER) reduction on an in-house large-vocabulary noisy speech database with a clean trained model, which translates into a 16% relative WER reduction over the original MFCC-MMSE noise suppressor, and 6% relative WER reduction on the Aurora-3 corpora over our original MFCC-MMSE algorithm or 30% relative WER reduction over the CMN baseline.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"36 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123217728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
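The Yu et al. abstract mentions smoothing the suppression gain over previous frames to avoid abrupt changes. One common way to do this is first-order recursive smoothing, sketched below. The abstract does not give the paper's actual smoothing rule, so the function name `smooth_gains` and the factor `alpha` are assumptions made purely for illustration.

```python
def smooth_gains(gains, alpha=0.5):
    """First-order recursive smoothing of a per-frame gain sequence.

    alpha is a hypothetical smoothing factor: 0 keeps the raw gains,
    values close to 1 follow the previous frames more strongly.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    smoothed = []
    previous = None
    for g in gains:
        # The first frame has no history, so it is passed through unchanged.
        current = g if previous is None else alpha * previous + (1.0 - alpha) * g
        smoothed.append(current)
        previous = current
    return smoothed


raw = [1.0, 0.0, 1.0, 1.0]  # an abrupt one-frame drop in the raw gain
smoothed = smooth_gains(raw, alpha=0.5)
print(smoothed)
```

With `alpha=0.5` the one-frame drop to 0.0 is softened to 0.5, and the gain then recovers gradually rather than jumping back.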
The Pitch Analysis of Imperative Sentences in Standard Chinese
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.78
Jia Sun, J. Lu, Ai-jun Li, Yuan Jia
{"title":"The Pitch Analysis of Imperative Sentences in Standard Chinese","authors":"Jia Sun, J. Lu, Ai-jun Li, Yuan Jia","doi":"10.1109/CHINSL.2008.ECP.78","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.78","url":null,"abstract":"The present study investigates the intonational patterns of imperative sentences, especially those with an intensive mood, such as ordering and forbidding, in Standard Chinese. Grouping the sentences by length and focusing on the fundamental frequency, this paper tries to provide a description of the pitch patterns of Chinese strong imperatives. Compared with the declarative sentence, the pitch contour of the imperative sentence with strong mood is raised as a whole, where the sentence stress rises more markedly, and the pitch range is compressed. The raising phenomenon has nothing to do with tonal differences or the length of the sentence. The strong mood even changes the third tone to a rising tone when it is in sentence-final position or in a one-syllable sentence.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123656376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Using Reference to Tune Language Model for Detection of Reading Miscues
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.87
Changliang Liu, Fuping Pan, Fengpei Ge, Bin Dong, Yonghong Yan
{"title":"Using Reference to Tune Language Model for Detection of Reading Miscues","authors":"Changliang Liu, Fuping Pan, Fengpei Ge, Bin Dong, Yonghong Yan","doi":"10.1109/CHINSL.2008.ECP.87","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.87","url":null,"abstract":"For a reading tutor, the reference content which the reader reads is known beforehand. This a priori information is very important in automatic detection of reading miscues. This paper proposes two methods to incorporate the reference information into the LVCSR framework to improve the performance of miscue detection. Both methods tune the n-gram Language Model (LM) probabilities dynamically in the decoding process based on an analysis of the current reference sentence. The first method weights the LM probability directly if the current n-gram exists in the reference, and the second method takes a linear combination of the original LM probability and the reference probability. The experiments on a Chinese Mandarin reading corpus proved the effectiveness of both methods. The detection error rate and false alarm rate are decreased by 33.1% and 35.5% respectively for the best method.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129442124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
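The two language model tuning methods named in the Liu et al. abstract (scaling the LM probability when the n-gram occurs in the reference, and linearly combining the LM probability with a reference probability) could look roughly like this for bigrams. The function names, the boost `weight`, the interpolation factor `lam`, and the clamp to 1.0 are all assumptions for illustration. The sketch also does not renormalize the boosted probabilities, which a real decoder may need to handle.

```python
def bigrams(tokens):
    """Set of adjacent token pairs in a reference sentence."""
    return set(zip(tokens, tokens[1:]))


def boost_if_in_reference(lm_prob, ngram, reference_ngrams, weight=2.0):
    """Method 1 (sketch): scale the LM probability if the n-gram is in the reference."""
    if ngram in reference_ngrams:
        # Clamp so the boosted value remains a valid probability (an assumption).
        return min(1.0, lm_prob * weight)
    return lm_prob


def interpolate(lm_prob, ref_prob, lam=0.5):
    """Method 2 (sketch): linear combination of LM and reference probabilities."""
    return lam * ref_prob + (1.0 - lam) * lm_prob


reference = ["the", "cat", "sat"]
ref_bigrams = bigrams(reference)

p_in_ref = boost_if_in_reference(0.1, ("the", "cat"), ref_bigrams)
p_not_in_ref = boost_if_in_reference(0.1, ("the", "dog"), ref_bigrams)
p_mixed = interpolate(0.1, 1.0, lam=0.5)
print(p_in_ref, p_not_in_ref, p_mixed)
```

Both variants bias decoding toward the text the reader is expected to say while still leaving probability mass for miscues, which is the trade-off the abstract describes.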
Investigation on Adaptation Using Different Discriminative Training Criteria Based Linear Regression and MAP
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.35
Bo Zhu, Zhijie Yan, Yu Hu, Zhiguo Wang, Lirong Dai, Ren-Hua Wang
{"title":"Investigation on Adaptation Using Different Discriminative Training Criteria Based Linear Regression and MAP","authors":"Bo Zhu, Zhijie Yan, Yu Hu, Zhiguo Wang, Lirong Dai, Ren-Hua Wang","doi":"10.1109/CHINSL.2008.ECP.35","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.35","url":null,"abstract":"This paper presents a comparison and evaluation between the conventional maximum likelihood estimation based adaptation and different discriminative adaptation criteria. The performance of different LR and MAP adaptation methods is compared, and the strategies of first applying LR and then MAP based on both MLE and DT criteria are evaluated. The effect of the amount of available data for adaptation is also compared in our experiments. The experimental results on the 863 and Tsinghua Mandarin evaluation tasks suggest that the process of first applying MWCE-LR and then MWCE-MAP can achieve the best performance.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129990259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
A Two-Stage Algorithm for Multi-Speaker Identification System
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.52
Yong Guan, Wenju Liu
{"title":"A Two-Stage Algorithm for Multi-Speaker Identification System","authors":"Yong Guan, Wenju Liu","doi":"10.1109/CHINSL.2008.ECP.52","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.52","url":null,"abstract":"In this paper, a two-stage multi-speaker identification (SID) system is proposed for mixed speech with multiple speakers speaking simultaneously. By investigating the second stage processing, we improved the performance of multi-speaker SID from 94.6% to 99.0% on a standard testing set, and compared with another state-of-the-art system, the proposed system's results were also slightly better. We also examined the configuration parameters of the proposed algorithm, and found that the gain compensation parameter and composition model were crucial for multi-speaker SID. Also, the likelihood constrained parameter was an important improvement compared with conventional SID.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125415363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
An Improvement for Training Efficiency of Semi-Tied Covariance
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.62
Sibao Chen, Yu Hu, B. Luo, Ren-Hua Wang
{"title":"An Improvement for Training Efficiency of Semi-Tied Covariance","authors":"Sibao Chen, Yu Hu, B. Luo, Ren-Hua Wang","doi":"10.1109/CHINSL.2008.ECP.62","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.62","url":null,"abstract":"Semi-tied covariance (STC) is applied widely in speech recognition due to its feature de-correlation ability. Solving the transform matrices of STC is a nonlinear optimization problem. Gales proposed an efficient method by iteratively updating a row of transform matrices. However, it needs to solve cofactors of elements of a matrix row in two layers of loops. Directly solving them is very time-consuming. Based on the property that only one row is updated in each iteration, it can be found from algebraic procedures, that the inverse and determinant of transform matrix in current iteration can be obtained by simple multiplications and additions of those in the previous iteration, and the cofactor vector of a row is equal to the corresponding column of multiplication between the inverse and determinant. This clearly improves the training efficiency of STC. Experiments on the RM database show that the proposed iteration method achieves a 33.56% relative reduction of training time over original STC method.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"252 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114098789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
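The Chen et al. abstract says that, because only one row of the transform changes per iteration, the inverse, determinant, and cofactors can be obtained cheaply from the previous iteration. The standard identities that make this possible (the matrix determinant lemma, the Sherman-Morrison formula, and the adjugate relation) are written out below. The notation is an assumption for illustration, and the paper's own derivation may be arranged differently: $A$ is the current transform, $a_i^{\top}$ is its $i$-th row, $a_i'^{\top}$ is the updated row, $e_i$ is the $i$-th unit vector, and $c_i$ is the column vector of cofactors of row $i$.

```latex
\begin{align}
A' &= A + e_i d^{\top}, \qquad d = a_i' - a_i \\
\det(A') &= \det(A)\,\bigl(1 + d^{\top} A^{-1} e_i\bigr)
          = \det(A)\,(a_i')^{\top} A^{-1} e_i \\
(A')^{-1} &= A^{-1} - \frac{A^{-1} e_i\, d^{\top} A^{-1}}{1 + d^{\top} A^{-1} e_i} \\
c_i &= \det(A)\, A^{-1} e_i
\end{align}
```

The second equality in the determinant line uses $a_i^{\top} A^{-1} e_i = 1$. The last line states that the cofactor vector of row $i$ is the $i$-th column of $A^{-1}$ scaled by $\det(A)$, which matches the relation the abstract describes and avoids computing each cofactor from scratch.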
A Perceptual Study of Approximated Cantonese Tone Contours
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.24
Yujia Li, Tan Lee
{"title":"A Perceptual Study of Approximated Cantonese Tone Contours","authors":"Yujia Li, Tan Lee","doi":"10.1109/CHINSL.2008.ECP.24","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.24","url":null,"abstract":"This paper describes a perceptual study on approximated Cantonese tone contours. It is found that Cantonese tone contours and tone transitions can be approximated by a limited number of linear movements, without creating any noticeable perceptual difference. The slopes of these linear movements are analyzed. They are found to be related to two thresholds of pitch movement perception. The results of perceptual tests with polysyllabic words over large segmental variation confirm the feasibility of approximating F0 contours of Cantonese speech.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124948201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Improving Automatic Evaluation of Mandarin Pronunciation with Speaker Adaptive Training (SAT) and MLLR Speaker Adaption
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.21
Chao Huang, Feng Zhang, F. Soong
{"title":"Improving Automatic Evaluation of Mandarin Pronunciation with Speaker Adaptive Training (SAT) and MLLR Speaker Adaption","authors":"Chao Huang, Feng Zhang, F. Soong","doi":"10.1109/CHINSL.2008.ECP.21","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.21","url":null,"abstract":"Automatic pronunciation evaluation (APE) can be implemented with a speech recognition model trained by standard, \"golden\" speakers. The pronunciation accuracy is then measured with the Goodness of Pronunciation (GOP) as reported in our earlier work [1]. In this paper, we investigate two main strategies for improving the evaluation: speaker adaptive training (SAT) for reducing the speaker-specific characteristics in model training and MLLR-based speaker adaptation in evaluation for reducing mismatch between the trained model and a testing speaker. Overall, the proposed strategies improve the correlation between evaluations made by APE and human experts from 0.69 to 0.76, approaching the upper bound value of 0.78 among human expert evaluators. Additionally, APE also shows a consistency of 0.93 better than the consistency of 0.83 among human experts.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132674616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
A Synchronous Method for Automatic Scoring of Language Learning
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.86
Bin Dong, Yonghong Yan
{"title":"A Synchronous Method for Automatic Scoring of Language Learning","authors":"Bin Dong, Yonghong Yan","doi":"10.1109/CHINSL.2008.ECP.86","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.86","url":null,"abstract":"In this paper, a synchronous method based on a state graph is proposed to calculate the evaluation feature for automatic scoring in computer-assisted language learning (CALL). The posterior probabilities of states are selected as the main feature. The scores of hypothesized phonemes and words are estimated using the information of the corresponding states. Traditional systems use two passes and two different models for decoding and computing posterior probabilities respectively. In this new algorithm, the posterior probabilities are calculated during the decoding of the state graph constructed from the grammar. In addition, the same acoustic model is used for both decoding and posterior probability computation. The old and new algorithms are compared through experiments, and the results show that the new synchronous algorithm improves scoring accuracy while reducing computational complexity by 16%.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117083687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
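The Dong and Yan abstract uses state posterior probabilities as the main scoring feature and derives phoneme scores from the corresponding states. A minimal sketch of that general idea follows, assuming uniform state priors and per-frame log-likelihoods that are already available. The function names and state labels are hypothetical, and the paper's state-graph decoding itself is not reproduced here.

```python
import math


def state_posteriors(log_likelihoods):
    """Normalize one frame's per-state log-likelihoods into posteriors.

    Uniform priors over the competing states are assumed.
    """
    peak = max(log_likelihoods.values())
    # Subtract the maximum before exponentiating for numerical stability.
    shifted = {s: math.exp(ll - peak) for s, ll in log_likelihoods.items()}
    total = sum(shifted.values())
    return {s: v / total for s, v in shifted.items()}


def phoneme_score(frame_posteriors, state_sequence):
    """Average log posterior of the hypothesized state over a phoneme's frames."""
    logs = [math.log(post[state])
            for post, state in zip(frame_posteriors, state_sequence)]
    return sum(logs) / len(logs)


# Two frames, each with two equally likely competing states (illustrative).
frames = [{"a1": -1.0, "b1": -1.0}, {"a2": -2.0, "b2": -2.0}]
posteriors = [state_posteriors(f) for f in frames]
score = phoneme_score(posteriors, ["a1", "a2"])
print(score)
```

Computing the posteriors from the same likelihoods the decoder already evaluates is what would make a single-pass, single-model scheme possible, in line with the abstract's contrast to two-pass systems.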