{"title":"A Two-Stage Multi-Feature Integration Approach to Unsupervised Speaker Change Detection in Real-Time News Broadcasting","authors":"Lei Xie, Guangsen Wang","doi":"10.1109/CHINSL.2008.ECP.99","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.99","url":null,"abstract":"This paper presents a two-stage multi-feature integration approach for unsupervised speaker change detection in real-time news broadcasting. We integrate MFCC and LSP features (i.e. a perceptual feature plus an articulatory feature) in the metric-based potential speaker change detection stage to collect speaker boundary candidates as many as possible. We adopt a weighted Bayesian information criterion (BIC) to integrate boundary decisions from MFCC and LSP features in the speaker boundary confirmation stage. This multi-feature integration strategy makes use of the complementarity between perceptual features and articulatory features to achieve a performance gain. Speaker change detection experiments show that the multi-feature integration approach significantly outperforms the individual features with relative improvements of 26% over the LSP-only approach and 6% over the MFCC-only approach.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132646840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word Order Correction for Language Transfer Using Relative Position Language Modeling","authors":"Chao-Hong Liu, Chung-Hsien Wu, Matthew Harris","doi":"10.1109/CHINSL.2008.ECP.20","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.20","url":null,"abstract":"Sentence correction has been an important and emerging issue in computer-assisted language learning. However, existing techniques based on grammar rules or statistical machine translation are still not robust enough to tackle the common incorrect word order errors in sentences produced by second language learners of Chinese. In this paper, a novel relative position language model is proposed to address this problem, for which a corpus of erroneous English-Chinese language transfer sentences along with their corrected counterparts is created and manually judged by human annotators. Experimental results show that compared to a scoring approach based on an n-gram language model and a phrase-based machine translation system, the performance in terms of BLEU scores of the proposed approach achieved improvements of 20.3% and 26.5% for the correction of word order errors resulting from language transfer, respectively.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130905661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Sample and Feature Selection Scheme for GMM-SVM Based Language Recognition","authors":"Yan Song, Lirong Dai","doi":"10.1109/CHINSL.2008.ECP.93","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.93","url":null,"abstract":"Discriminative training for language recognition has been a key tool for improving system performance. SVM-based algorithms (i.e. GMM-SVM, GLDS-SVM etc.) are important ones for language recognition. The core of these algorithms is to construct the kernel for comparing the similarity of two sequences. It is known that the mismatch between training and test condition will degrade the performance. In this paper, we proposed a novel sample and feature selection scheme under the GMM-SVM framework, which aims at alleviating the duration mismatch problem. The proposed method is evaluated on NIST 03 and 07 language recognition evaluation tasks with improvement over prior techniques.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126975108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Similarity Measure Between HMMS","authors":"Yih-Ru Wang","doi":"10.1109/CHINSL.2008.ECP.67","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.67","url":null,"abstract":"In this paper, a new similarity measure between HMM models which extended the well-known Kullback-Leibler distance was proposed. The Kullback-Leibler distance was defined as the mean of log-likelihood ratio (LLR) in a hypotheses test and the Kullback-Leibler distance was frequently used as a similarity measure for HMM models. Here, the standard deviation of LLR between HMM models was derived first. Besides, the ratio of mean and standard deviation of LLR was used as a new similarity measure between HMM models. Experiments were done in a Mandarin speech database, TCC-300, in order to check the effectiveness of the proposed similarity measure. The accuracy of the standard deviation of LLR estimated from the syllable HMM models was checked by comparison with the standard deviation of LLR of top-10 candidates found from HMM decoder. And, the confusion sets of 411 syllables were also found by using both the KL distance and the proposed similarity measure. Comparing to the top-10 confusion models, 94.9% and 95.3% inclusion rates can be achieved by using KL distance and the proposed similarity measure of HMM models.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115947339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient System Combination for Syllable-Confusion-Network-Based Chinese Spoken Term Detection","authors":"Jie Gao, Qingwei Zhao, Yonghong Yan, J. Shao","doi":"10.1109/CHINSL.2008.ECP.103","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.103","url":null,"abstract":"This paper examines the system combination issue for syllable-confusion-network (SCN)-based Chinese spoken term detection (STD). System combination for STD usually leads to improvements in accuracy but suffers from increased index size or complicated index structure. This paper explores methods for efficient combination of a word-based system and a syllable-based system while keeping the compactness of the indices. First, a composite SCN is generated using two approaches: lattice combination (The SCN is generated from a combined lattice) and confusion network combination (Two SCNs are combined into one). Then a simple compact index is constructed from this composite SCN by merging cross-system redundant information. The experimental result on a 60-hour corpus shows a relative accuracy improvement of 14.7% is achieved over the baseline syllable-based system. Meanwhile, it reduces the index size by 22.3% compared to the commonly adopted score combination method while achieving comparable accuracy.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115109923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Microphone Array Post-Filter Based on Auditory Filtering","authors":"Peng Li, FengChai Liao, Ning Cheng, Bo Xu, Wenju Liu","doi":"10.1109/CHINSL.2008.ECP.105","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.105","url":null,"abstract":"In this paper, an auditory filtering based microphone array post-filter is proposed to enhance the quality of the output signal. By using a gammatone filterbank to band pass each input of the array, the input signals are decomposed into a two-dimensional T-F representation. Then, for each auditory filter channel, the post-filter's coefficients are estimated in each frame using the decomposed multi-channel input signals. Followed by the post-filtering and synthesis processing, the enhanced speech with better quality is acquired. Systematical evaluations on the CMU microphone array database prove that the proposed method could improve not only the noise reduction measure but also the speech quality measures.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114948969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mandarin Tone Perception with Temporal Envelope and Periodicity Cues from Different Frequency Regions","authors":"Meng Yuan, Tan Lee, S. Soli","doi":"10.1109/CHINSL.2008.ECP.96","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.96","url":null,"abstract":"Temporal envelope and periodicity cues (TEPC) are crucial for speech perception of hearing-impaired people who have poor frequency selectivity. This paper investigates the contributions of TEPCs extracted from different frequency regions to lexical tone perception of Mandarin. Tone identification tests were carried out with tone-contrasting monosyllabic and disyllabic words. Normal-hearing subjects were recruited in the psychoacoustic experiments with acoustic stimuli that simulate the output of a cochlear implant. The results show that tone identification accuracy with sub-band TEPCs is consistently higher for male voice than for female voice. TEPCs from sub-bands above 1 kHz are found to contribute more to tone identification than those from sub-bands below 1 kHz, especially for male voice. Tone recognition performance can be improved by simply removing the low-frequency TEPCs. The same findings were obtained in our previous study on Cantonese tone perception. This suggests that emphasizing high-frequency TEPCs may be an effective strategy to improve speech perception of tonal languages.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129894208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mandarin Learning Using Speech and Language Technologies: A Translation Game in the Travel Domain","authors":"Yushi Xu, S. Seneff","doi":"10.1109/CHINSL.2008.ECP.19","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.19","url":null,"abstract":"This paper describes a new Web-based translation game we have designed to help a student learn spoken Chinese. The student talks to the system in Chinese and the system compares the recognized sentence against a set of English prompts to judge whether it is a suitable translation of any one of them. The game can also provide translation assistance upon request. The game was developed using the IWSLT corpus of utterances in the tourist domain, and is oriented towards helping the student communicate effectively during foreign travel. In a preliminary evaluation, the system performed correctly on over 90% of test utterances. The system received positive feedback from the subjects.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127512796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heteronym Verification for Mandarin Speech Synthesis","authors":"Heng Lu, Zhenhua Ling, Si Wei, Yu Hu, Lirong Dai, Ren-Hua Wang","doi":"10.1109/CHINSL.2008.ECP.46","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.46","url":null,"abstract":"Accurate phonetic transcription of speech corpus is critical to high quality speech synthesis. In Mandarin text-to-speech (MTTS) system, one major problem of automatically labeling the database is the heteronym annotation. This is because, in Mandarin, some single-character words and multi-character words have more than one pronunciation. In this paper, a heteronym annotation verification method for MTTS database labeling is proposed. By training contextual dependent HMMs and calculating the log likelihood ratio, each heteronym in the database is assigned a confidence score and those below the threshold are selected for manual inspecting. We divide heteronyms in Mandarin into two categories and different features are used for each category. The result of our experiment on an artificial test set has shown that we can achieve EER (equal error rate) of 7.9% and 11.9% for these two categories. Further test on an actual database which contains a total of 36098 heteronyms has shown that the proposed method can find 89 of all 123 annotation errors by only inspecting 639 polyphones.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122261392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Semi-Parametric Mean Trajectory Model Using Discriminatively Trained Centroids","authors":"Ran Xu, Jielin Pan, Yonghong Yan","doi":"10.1109/CHINSL.2008.ECP.63","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.63","url":null,"abstract":"In order to alleviate the limitation of \"state output probability conditional independence\" assumption held by Hidden Markov models (HMMs) in speech recognition, a discriminative semi-parametric trajectory model was proposed in recent years, in which both means and variances in the acoustic models are modeled as time-varying variables. The time-varying information is modeled as a weighted contribution from all the \"centroids\", which can be viewed as the representation of the acoustic space. In previous literatures, such centroids are often obtained by clustering the Gaussians in the baseline acoustic models to some reasonable number or by training a baseline model with fewer Gaussian components. The centroids obtained in this way are maximum likelihood estimation of the acoustic space, which are relatively weak in discriminability compared to the discriminatively trained acoustic models. In this paper, we proposed an improved semi-parametric mean trajectory model training framework, in which the centroids are first discriminatively trained by minimum phone error criterion to provide a more discriminative representation of the acoustic space. This method was evaluated on the Mandarin digit string recognition task. The experimental result shows that our proposed method improves the recognition performance by a relative string error rate reduction of 7.5% compared to the traditional discriminative semi-parametric trajectory model, and it outperforms the baseline acoustic model trained with maximum likelihood criterion by a relative string error rate reduction of 28.6%.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130443317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}