{"title":"Decision Fusion for Improving Mispronunciation Detection Using Language Transfer Knowledge and Phoneme-Dependent Pronunciation Scoring","authors":"W. Lo, Alissa M. Harrison, H. Meng, Lan Wang","doi":"10.1109/CHINSL.2008.ECP.18","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.18","url":null,"abstract":"Application of linguistic knowledge of language transfer to automatic speech recognition (ASR) technology can enhance mispronunciation detection performance in computer-aided pronunciation training (CAPT). This is achieved by pinpointing salient pronunciation errors made by second language learners. In this work, we propose to apply decision fusion for further improvement in mispronunciation detection performance. Detection decision from the linguistically-motivated detection, which applies language transfer knowledge, is used as the basis. Back off to posterior probability based pronunciation scoring with phoneme-dependent thresholds is employed when the basis is \"less-reliable\". Fusion can help combat problems such as incomplete coverage of linguistic knowledge as well as the imperfection of acoustic models in ASR. Our fusion strategy can maintain the diagnosis capability of the linguistically-motivated approach while achieving a major boost in detection performance. Experimental results show that decision fusion can achieve relative improvement in mispronunciation detection of up to 30% reduction in total number of decision errors.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131694708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvements on Mel-Frequency Cepstrum Minimum-Mean-Square-Error Noise Suppressor for Robust Speech Recognition","authors":"Dong Yu, L. Deng, Jian Wu, Y. Gong, A. Acero","doi":"10.1109/CHINSL.2008.ECP.29","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.29","url":null,"abstract":"Recently we have developed a non-linear feature-domain noise reduction algorithm based on the minimum mean square error (MMSE) criterion on Mel-frequency cepstra (MFCC) for environment-robust speech recognition. Our novel algorithm operates on the power spectral magnitude of the filter-bank's outputs and outperforms the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah in both recognition accuracy and efficiency as demonstrated on the Aurora-3 corpora. This paper serves two purposes. First, we show that the algorithm is effective on large vocabulary tasks with tri-phone acoustic models. Second, we report improvements on the suppression rule of the original MFCC-MMSE noise suppressor by smoothing the gain over the previous frames to prevent the abrupt change of the gain over frames and adjusting gain function based on the noise power so that the suppression is aggressive when the noise level is high and conservative when the noise level is low. We also propose an efficient and effective parameter tuning algorithm named step-adaptive discriminative learning algorithm (SADLA) to adjust the parameters used by the noise tracker and the suppressor. We observed a 46% relative word error rate (WER) reduction on an in-house large-vocabulary noisy speech database with a clean trained model, which translates into a 16% relative WER reduction over the original MFCC-MMSE noise suppressor, and 6% relative WER reduction on the Aurora-3 corpora over our original MFCC-MMSE algorithm or 30% relative WER reduction over the CMN baseline.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"36 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123217728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Pitch Analysis of Imperative Sentences in Standard Chinese","authors":"Jia Sun, J. Lu, Ai-jun Li, Yuan Jia","doi":"10.1109/CHINSL.2008.ECP.78","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.78","url":null,"abstract":"The present study investigates the intonational patterns of imperative sentences in Standard Chinese, especially those with an intensive mood, such as ordering and forbidding. Grouping the sentences by length and focusing on the fundamental frequency, this paper tries to provide a description of the pitch patterns of Chinese strong imperatives. Compared with the declarative sentence, the pitch contour of the imperative sentence with strong mood is raised as a whole, the sentence stress rises more markedly, and the pitch range is compressed. The raising phenomenon has nothing to do with tonal differences or the length of the sentence. The strong mood even changes the third tone to a rising tone when it is in sentence-final position or in a one-syllable sentence.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123656376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Reference to Tune Language Model for Detection of Reading Miscues","authors":"Changliang Liu, Fuping Pan, Fengpei Ge, Bin Dong, Yonghong Yan","doi":"10.1109/CHINSL.2008.ECP.87","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.87","url":null,"abstract":"For a reading tutor, the reference content which the reader reads is known beforehand. This a priori information is very important in automatic detection of reading miscues. This paper proposes two methods to incorporate the reference information into the LVCSR framework to improve the performance of miscue detection. Both methods tune the n-gram Language Model (LM) probabilities dynamically in the decoding process based on the analysis of the current reference sentence. The first method weights the LM probability directly if the current n-gram exists in the reference, and the second method takes a linear combination of the original LM probability and the reference probability. The experiments on a Chinese Mandarin reading corpus proved the effectiveness of both methods. The detection error rate and false alarm rate are decreased by 33.1% and 35.5% respectively for the best method.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129442124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigation on Adaptation Using Different Discriminative Training Criteria Based Linear Regression and Map","authors":"Bo Zhu, Zhijie Yan, Yu Hu, Zhiguo Wang, Lirong Dai, Ren-Hua Wang","doi":"10.1109/CHINSL.2008.ECP.35","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.35","url":null,"abstract":"This paper presents a comparison and evaluation between the conventional maximum likelihood estimation based adaptation and different discriminative adaptation criteria. The performance of different LR and MAP adaptation methods is compared, and the strategies of first applying LR then MAP based on both MLE and DT criteria are evaluated. The effect of the amount of available data for adaptation is also compared in our experiments. The experimental results on the 863 and Tsinghua Mandarin evaluation tasks suggest that the process of first applying MWCE-LR then MWCE-MAP can achieve the best performance.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129990259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Two-Stage Algorithm for Multi-Speaker Identification System","authors":"Yong Guan, Wenju Liu","doi":"10.1109/CHINSL.2008.ECP.52","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.52","url":null,"abstract":"In this paper, a two-stage multi-speaker identification (SID) system is proposed for mixed speech with multiple speakers speaking simultaneously. By investigating the second stage processing, we improved the performance of multi-speaker SID from 94.6% to 99.0% on a standard testing set, and compared with another state-of-the-art system, the results of the proposed system were also slightly better. We also examined the configuration parameters of the proposed algorithm, and found that the gain compensation parameter and composition model were crucial for multi-speaker SID. Also, the likelihood constrained parameter was an important improvement compared with conventional SID.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125415363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improvement for Training Efficiency of Semi-Tied Covariance","authors":"Sibao Chen, Yu Hu, B. Luo, Ren-Hua Wang","doi":"10.1109/CHINSL.2008.ECP.62","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.62","url":null,"abstract":"Semi-tied covariance (STC) is applied widely in speech recognition due to its feature de-correlation ability. Solving the transform matrices of STC is a nonlinear optimization problem. Gales proposed an efficient method by iteratively updating a row of transform matrices. However, it needs to solve cofactors of elements of a matrix row in two layers of loops. Directly solving them is very time-consuming. Based on the property that only one row is updated in each iteration, it can be found from algebraic procedures, that the inverse and determinant of transform matrix in current iteration can be obtained by simple multiplications and additions of those in the previous iteration, and the cofactor vector of a row is equal to the corresponding column of multiplication between the inverse and determinant. This clearly improves the training efficiency of STC. Experiments on the RM database show that the proposed iteration method achieves a 33.56% relative reduction of training time over original STC method.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"252 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114098789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Perceptual Study of Approximated Cantonese Tone Contours","authors":"Yujia Li, Tan Lee","doi":"10.1109/CHINSL.2008.ECP.24","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.24","url":null,"abstract":"This paper describes a perceptual study on approximated Cantonese tone contours. It is found that Cantonese tone contours and tone transitions can be approximated by a limited number of linear movements, without creating any noticeable perceptual difference. The slopes of these linear movements are analyzed. They are found to be related with two thresholds of pitch movement perception. The results of perceptual tests with polysyllabic words over large segmental variation confirm the feasibility of approximating F0 contours of Cantonese speech.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124948201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Automatic Evaluation of Mandarin Pronunciation with Speaker Adaptive Training (SAT) and MLLR Speaker Adaption","authors":"Chao Huang, Feng Zhang, F. Soong","doi":"10.1109/CHINSL.2008.ECP.21","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.21","url":null,"abstract":"Automatic pronunciation evaluation (APE) can be implemented with a speech recognition model trained by standard, \"golden\" speakers. The pronunciation accuracy is then measured with the Goodness of Pronunciation (GOP) as reported in our earlier work [1]. In this paper, we investigate two main strategies for improving the evaluation: speaker adaptive training (SAT) for reducing the speaker-specific characteristics in model training and MLLR-based speaker adaptation in evaluation for reducing mismatch between the trained model and a testing speaker. Overall, the proposed strategies improve the correlation between evaluations made by APE and human experts from 0.69 to 0.76, approaching the upper bound value of 0.78 among human expert evaluators. Additionally, APE also shows a consistency of 0.93 better than the consistency of 0.83 among human experts.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132674616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Synchronous Method for Automatic Scoring of Language Learning","authors":"Bin Dong, Yonghong Yan","doi":"10.1109/CHINSL.2008.ECP.86","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.86","url":null,"abstract":"In this paper, a synchronous method based on state graph is proposed to calculate the evaluation feature for automatic scoring in computer-assisted language learning (CALL). The posterior probabilities of states are selected as the main feature. The scores of hypothesized phonemes and words are estimated using the information of corresponding states. Traditional systems use two passes and two different models for decoding and computing posterior probabilities respectively. In this new algorithm, the posterior probabilities are calculated during the decoding of the state graph constructed from grammar, and the same acoustic model is used for both decoding and posterior probability computation. The old and new computing algorithms are compared through experiments, and the result shows that performance of the new algorithm is effectively improved. The scoring accuracy of the new synchronous algorithm is improved, while the computational complexity is reduced by 16%.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117083687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}