{"title":"Effects of phonemic vs allophonic density and stress on vowel-to-vowel coarticulation in Cantonese and Beijing Mandarin","authors":"P. Mok, S. Hawkins","doi":"10.1109/CHINSL.2004.1409579","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409579","url":null,"abstract":"Effects of phonemic versus allophonic vowel distribution, stress and direction of coarticulation on vowel-to-vowel (V-to-V) coarticulation were examined in Cantonese and Beijing Mandarin (BM). Cantonese has more vowel phonemes but BM has more allophones. Cantonese should show less V-to-V coarticulation than BM if phonemic contrast determines degree of V-to-V coarticulation. The vowels used were /iau/ in /pVpVpV/ structures. Phonemic vowel space density did not influence V-to-V coarticulation differentially in Cantonese and BM. Effects of stress and direction were not consistent. Generally, there was more carryover coarticulation, and more coarticulation on unstressed vowels, but exceptions were common. No one factor appears to determine patterns of V-to-V coarticulation in different languages. Other potential phonological influences are discussed.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127047244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A two-step keyword spotting method based on context-dependent a posteriori probability","authors":"T. Zheng, Jing Li, Zhanjiang Song, Mingxing Xu","doi":"10.1109/CHINSL.2004.1409641","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409641","url":null,"abstract":"Keyword weighting plays an important role in traditional keyword spotting (KWS) systems: it helps detect keyword candidates in an utterance so that they will not be missed. However, if the keywords are over-weighted, there will be a high number of false alarms, which will slow down the system and might introduce rejection errors; on the other hand, if the keywords are insufficiently weighted, the detection rate is not guaranteed. It is difficult to make a compromise with regard to keyword weighting. A two-step KWS method based on context-dependent a posteriori probability (CDAPP) is proposed in this paper as a way to solve this problem. The first step adopts a continuous speech recognition method, to generate a sequence of acoustic symbols for the second step, which performs a fuzzy keyword search. Preliminary experiments show that the proposed strategy is a promising one that needs additional investigation.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122555822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Language identification through large vocabulary continuous speech recognition","authors":"Boon Pang Lim, Haizhou Li, Yu Chen","doi":"10.1109/CHINSL.2004.1409583","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409583","url":null,"abstract":"In recent years, automatic language identification has become an increasingly important component in practical spoken language systems, and much attention has been devoted to various competing approaches. In this paper, we are concerned with the automatic identification of languages that may be highly similar in nature, such as the various dialects of Chinese. Our approach differs from many recent successful systems by exploiting a fusion of feature scores readily available from a large vocabulary speech recognition system. We show that such features are able to distinguish among the similar sounding dialects of Chinese, and experiments on a nine language corpus show promising performance on a three way identification task.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"17 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129287447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of paraphrased corpus and lexical-based approach to Chinese paraphrasing","authors":"Yan Zhang, H. Kashioka","doi":"10.1109/CHINSL.2004.1409652","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409652","url":null,"abstract":"We firstly analyze the language phenomena and distribution characteristics of Chinese spontaneous utterances already paraphrased by other approaches. Based on the information obtained from a corpus, our lexical-based approach is proposed to paraphrase Chinese spoken language. Our purpose is to transform various expressions into simplified expressions with the same meanings. Chinese verbs are the main constituents in sentences, and with their flexibility they play an important role in expressing structures, especially for transitive verbs. Furthermore, negative verb expressions also appear frequently to express enquiries in question utterances. Therefore, we design four types of paraphrasing templates based on lexical information and the characteristics of the corpus: (1) synonym replacement; (2) Chinese transitive verbs; (3) verbs with two objects; (4) the transformation of negative expressions. Our experiment found that the lexical-based approach is effective for Chinese paraphrasing.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114794047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Task-specific adaptation in Chinese name recognition","authors":"Guo-Hong Ding, Bo Xu, Xia Wang, Yang Cao, Feng Ding, Yuezhong Tang","doi":"10.1109/CHINSL.2004.1409636","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409636","url":null,"abstract":"In this paper, task-specific adaptation is proposed to improve Chinese name recognition performance. Since acoustic models are usually trained using large vocabulary continuous speech corpora, there exists distortion between modeling and decoding in name recognition. To compensate the mismatch, task-specific adaptation, which is performed in the MLLR framework with multi-regression classes, is proposed. Experimental results show that task-specific adaptation is very effective in Chinese name recognition to compensate the mismatch.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126236230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An acoustic-phonetic analysis of large vocabulary continuous Mandarin speech recognition for non-native speakers","authors":"Han Yang, Yuanyuan Pu, H. Wei, Zhengpeng Zhao","doi":"10.1109/CHINSL.2004.1409631","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409631","url":null,"abstract":"This paper addresses non-native accent issues in large vocabulary continuous speech recognition. We propose to analyze the transformation rules of non-native Mandarin speech spoken by native speakers of Naxi and Dai in Yunnan at the level of initials and finals. Firstly, baseline HMM models are trained using the project 863' standard Mandarin corpus to test their performance on non-native speech recognition. Secondly, the non-native speech data is transcribed, based on the baseline HMM models. In more detail, we analyze the error recognition rates of all initials and all finals, and their typical substitute error. The results obtained from our experiments might be useful for adapting a native speaker ASR system to model non-native accented data.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131028319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic assessment of pronunciation quality","authors":"Bin Dong, Qingwei Zhao, Jianping Zhang, Yonghong Yan","doi":"10.1109/CHINSL.2004.1409605","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409605","url":null,"abstract":"Learning to speak a foreign language is not an easy task for many people. This paper describes approaches to automatic objective assessment of pronunciation quality. The approaches described here can be classified into two categories, text-dependent and text-independent, according to whether a teacher's voice is present. In the text-independent one, algorithms based on energy and pitch contour are introduced. Also, the average rate of variation in energy and pitch frequency, mean subtracted energy and pitch frequency are used as main features. Compared to the previously reported approach using average phone segment posterior probabilities, the new approach achieves favorable performance on the same test set.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131223225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An acoustic and articulatory knowledge integrated method for improving synthetic Mandarin speech's fluency","authors":"H. Gu, Kuo-Hsian Wang","doi":"10.1109/CHINSL.2004.1409622","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409622","url":null,"abstract":"In synthetic Mandarin speech, discontinuity of formant traces at syllable boundaries is a key factor that lowers the fluency level. Therefore, we study an acoustic and articulatory knowledge integrated method to solve this discontinuity problem. First, representative trisyllable contexts are selected and their signals are recorded. The signal of the middle syllable of each trisyllable pronunciation is then extracted to make a synthesis unit. To select a synthesis unit among multiple candidates, a distance function is defined to measure the spectral similarity between two synthesis units to be concatenated. In addition, several linking-restriction rules are derived, according to articulatory knowledge, to prevent some synthesis units being linked into a sequence. Then, a globally best synthesis-unit sequence is searched by using a dynamic programming based algorithm. When this method is applied, the formant traces at syllable boundaries become smoother. Also, subjective evaluation shows that the fluency level of synthetic Mandarin speech can indeed be improved a lot.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115423341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature masking in an embedded Mandarin speech recognition system","authors":"Yuezhong Tang, Xia Wang, Yang Cao, Feng Ding","doi":"10.1109/CHINSL.2004.1409632","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409632","url":null,"abstract":"In this paper, we explored a feature component masking scheme for embedded tonal language recognition systems, in order to reduce the computational complexity with least degradation of recognition accuracy. We carried out a lot of experiments on a Mandarin isolated word recognition task with a tone-confusable vocabulary. With consideration of both clean and noisy conditions, we were able to find a masking scheme that filtered out 31 of 54 components and still outperformed the baseline with 54 components in the feature set, with dramatically less computational and memory complexity. The results showed that feature masking was a promising approach for complexity reduction in embedded tonal language recognition systems. The results also verified the effectiveness of higher order cepstral coefficients for tonal language recognition because most of them were preserved during the feature masking experiments.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116954741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rhythm correlation of speech synthesis system","authors":"J. Tao","doi":"10.1109/CHINSL.2004.1409626","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409626","url":null,"abstract":"There has been a rapid progress in speech synthesis; however, it is still hard to make a good objective evaluation of speech intonation while training the speech synthesis system. Unlike the traditional method, standard deviation of intonation, which normally makes the speech synthesis system sound smooth and flat, but with less expressiveness, the paper integrates the rhythm correlation in an evaluation based on tangential intonation. Furthermore, the paper makes a comparison among three typical evaluation methods: listening test; standard deviation of intonation; standard deviation of intonation and tangential intonation. It proves that the introduced method could generate better synthesis results than others with an even smaller training corpus.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129698237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}