{"title":"On noise robustness of dynamic and static features for continuous Cantonese digit recognition","authors":"Chen Yang, F. Soong, Tan Lee","doi":"10.1109/CHINSL.2004.1409640","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409640","url":null,"abstract":"It has been shown previously that augmented spectral features (static and dynamic cepstra) are effective for improving ASR performance in a clean environment. In this paper we investigate the noise robustness of static and dynamic cepstral features in a speaker-independent, continuous recognition task, using a noise-added Cantonese digit database (CUDigit). We found that the dynamic cepstrum is more robust to additive background noise than its static counterpart. The results are consistent across different types of noise and under various SNRs. Exponential weights, which can exploit the unequal robustness of the two features, are optimally trained on a development set. A relative word error rate reduction of 41.9%, mainly from a significant reduction in insertions, is obtained on the test data under various noise and SNR conditions.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127246867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis and synthesis of Cantonese F/sub 0/ contours based on the command-response model","authors":"Wentao Gu, K. Hirose, H. Fujisaki","doi":"10.1109/CHINSL.2004.1409617","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409617","url":null,"abstract":"Cantonese is a well-known Chinese dialect with a quite complex tone system. We have applied the command-response model to represent F/sub 0/ contours of Cantonese speech by defining a set of appropriate tone command patterns. In this paper, the analysis is extended to Cantonese utterances at three different speech rates. By incorporating the effects of tone coarticulation, word accentuation and phrase intonation, the model gives high accuracy of approximations to Cantonese speech F/sub 0/ contours, and hence provides a much better means to quantitatively describe the F/sub 0/ contours than the traditional 5-level tone code system. The distributions of timing and amplitudes of commands are investigated, based on which a set of rules is used for synthesis of Cantonese F/sub 0/ contours. The validity of the current approach is confirmed by perceptual evaluation of Cantonese synthetic speech.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131322954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Use of direct modeling in natural language generation for Chinese and English translation","authors":"Fu-hua Liu, Yuqing Gao","doi":"10.1109/CHINSL.2004.1409650","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409650","url":null,"abstract":"This paper proposes a new direct-modeling-based approach to improve the maximum entropy based natural language generation (NLG) in the IBM MASTOR system, an interlingua-based speech translation system. Due to the intrinsic disparity between Chinese and English sentences, the previous method employed only linguistic constituents from output language sentences to train the NLG model. The new algorithm exploits a direct-modeling scheme to admit linguistic constituent information from both source and target languages into the training process seamlessly when incorporating a concept padding scheme. When concept sequences from the top level of semantic parse trees are considered, the concept error rate (CER) is significantly reduced to 14.3%, compared to 23.9% in the baseline NLG. Similarly, when concept sequences from all levels of semantic parse trees are tested, the direct-modeling scheme yields a CER of 10.8% compared to 17.8% in the baseline. A sensible improvement on the overall translation is made when the direct-modeling scheme improves the BLEU score from 0.252 to 0.294.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115310358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling natural computing","authors":"Xuedong Huang","doi":"10.1109/CHINSL.2004.1409565","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409565","url":null,"abstract":"Summary form only given. We are entering the third generation of the human-computer interface. In contrast to the first and second generations, where human users had to learn arcane command languages or graphical icons to operate computers in the ways the computers were designed, the third-generation interface will allow users to express their intents naturally by shifting the burden of understanding what it takes to interact from the human to the computer. Natural computing will become mainstream in the near future and could dramatically improve the quality of our daily lives. Spoken language technologies play a central role in natural computing. Spoken language is a modality that can offer a consistent means of interaction across a variety of computer form factors and a wide range of hands-free, eyes-free environments. Technology advancements in this area have made impressive progress, so that the prevalence of the spoken language interface is no longer a question of \"whether\" but \"when\". In this paper, the author summarizes the recent progress of industry and academia in bringing natural computing to the mass market.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127796704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Two-Layer Approach for Spoken Language Translation","authors":"Jhing-Fa Wang, Shun-Chieh Lin, Hsueh-Wei Yang","doi":"10.1109/CHINSL.2004.1409651","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409651","url":null,"abstract":"The paper proposes a new two-layer approach for spoken language translation. First, we develop translated examples and transform them into speech signals. Second, to retrieve a translated example properly by analyzing speech signals, we expand the translated example into two layers: an intention layer and an object layer. The intention layer is used to examine the intention similarity between the speech input and the translated example. The object layer is used to identify the objective components of the examined intention. Experiments were conducted with Chinese and English. The results revealed that our proposed approach achieves understandable translation rates of about 86% and 76% for Chinese-to-English and English-to-Chinese translation, respectively.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134403396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tone recognition for Chinese speech: a comparative study of Mandarin and Cantonese","authors":"Gang Peng, Hongying Zheng, William S-Y. Wang","doi":"10.1109/CHINSL.2004.1409629","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409629","url":null,"abstract":"The paper presents a comparative study on automatic continuous tone recognition for Mandarin and Cantonese. Compared with Mandarin, Cantonese has a much more complex tone system. The effects of F/sub 0/ normalization on the tone recognition of Mandarin and Cantonese are studied. Furthermore, the two tone systems are compared from an engineering point of view. Tone recognition accuracies of 71.50% and 83.06% have been obtained for Cantonese and Mandarin respectively. These results compare favorably with results reported for other tone recognition experiments on the same (for Cantonese) and similar (for Mandarin) databases.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115330671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discriminative transform for confidence estimation in Mandarin speech recognition","authors":"Gang Guo, Ren-Hua Wang","doi":"10.1109/CHINSL.2004.1409638","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409638","url":null,"abstract":"In automatic speech recognition (ASR) applications, the log likelihood ratio test (LRT) is one of the most popular techniques for obtaining a confidence measure (CM). Unlike traditional log likelihood ratio (LLR) based methods, we apply nonlinear transformations to the LLR before computing the string-level CM. Different phonemes may have different transformation functions. Through suitable LLR transformations, the verification performance of the string-level CM may increase. The transformation functions are implemented by a multilayer perceptron (MLP). Two algorithms are used to optimize the parameters of the MLP: one is the minimum verification error (MVE) algorithm; the other is the figure-of-merit (FOM) training algorithm. In our Mandarin command recognition system, the two methods remarkably improve the performance of confidence measures for out-of-vocabulary word rejection compared with the standard LRT-based CM, and we obtain a best relative reduction of 45.5% in equal error rate (EER). In addition, in our Mandarin command recognition experiments, the FOM training algorithm outperforms the MVE algorithm even though the two achieve approximately the same best performance; owing to the limited experimental setups, which algorithm is better still needs to be explored.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114509451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimum classification error rate pattern recognition approach for speech and language processing","authors":"W. Chou","doi":"10.1109/CHINSL.2004.1409568","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409568","url":null,"abstract":"Summary form only given. Minimum classification error (MCE) rate pattern recognition approach is a fast moving research area and broadly applied to pattern recognition problems in speech and language processing. We give an overview of the basic MCE classifier design algorithms as well as the more advanced extensions of the MCE approach. We differentiate the classifier design by way of distribution estimation and by way of the discriminant function methods according to the minimum classification error rate paradigm. We study the practical issues in system implementation and highlight the application perspectives of applying MCE classifier design to practical speech and language processing systems.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123891015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantization of SEW and REW magnitude for 2 kb/s waveform interpolation speech coding","authors":"Jing Li, C. Bao","doi":"10.1109/CHINSL.2004.1409606","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409606","url":null,"abstract":"The paper presents quantization schemes for the magnitude spectra of the slowly evolving waveform (SEW) and rapidly evolving waveform (REW) components in a 2 kb/s waveform interpolation (WI) coder. The SEW magnitude spectrum is quantized using a DCT-based predictive vector quantization approach. The REW magnitude spectrum is quantized using a matrix quantizer based on the combined dimension conversion method. Objective measures and subjective results indicate that the proposed quantization schemes are effective in achieving good quantization accuracy.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123891911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for fast segment model by avoidance of redundant computation on segment","authors":"Yun Tang, Wenju Liu, Yiyan Zhang, Bo Xu","doi":"10.1109/CHINSL.2004.1409600","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409600","url":null,"abstract":"The segment model (SM) is a family of methods that use segmental distributions rather than frame-based features (as in the HMM) to represent the underlying characteristics of the observation sequence. It has been shown to be more precise than the HMM. However, its high complexity prevents these models' use in practical systems. In this paper we present a framework to reduce the computational complexity of the segment model by fixing the number of basic units in a segment so that intermediate computation results can be shared. Our work is twofold. First, we compare the complexity of the SM with that of the HMM and propose a fast SM framework based on the comparison. Second, we use two examples to illustrate this framework. The fast SM achieves better performance than the HMM-based system while keeping the computational complexity of the SM at the same level as the HMM.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125880703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}