2004 International Symposium on Chinese Spoken Language Processing: Latest Papers

Large vocabulary continuous Mandarin speech recognition using finite state machine
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409572
Yi-Cheng Pan, Chia-Hsing Yu, Lin-Shan Lee
{"title":"Large vocabulary continuous Mandarin speech recognition using finite state machine","authors":"Yi-Cheng Pan, Chia-Hsing Yu, Lin-Shan Lee","doi":"10.1109/CHINSL.2004.1409572","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409572","url":null,"abstract":"The finite state transducer (FST), popularly used in the natural language processing (NLP) area to represent the grammar rules and the characteristics of a language, has been extensively used as the core in large vocabulary continuous speech recognition (LVCSR) in recent years. By means of FST, we can effectively compose the acoustic model, pronunciation lexicon, and language model to form a compact search space. In this paper, we present our approach to developing a LVCSR decoder using FST as the core. In addition, the traditional one-pass tree-copy search algorithm is also described for comparison in terms of speed, memory requirements and achieved character accuracy.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"331 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134100626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Analysis of Shanghainese F0 contours based on the command-response model
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409591
Wentao Gu, K. Hirose, H. Fujisaki
{"title":"Analysis of Shanghainese F0 contours based on the command-response model","authors":"Wentao Gu, K. Hirose, H. Fujisaki","doi":"10.1109/CHINSL.2004.1409591","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409591","url":null,"abstract":"As one of the major Chinese dialects, Shanghainese is well known for its complex tone sandhi system. This paper applies the command-response model to represent F0 contours of Shanghainese speech. Analysis-by-synthesis is conducted both on carrier sentences with monosyllabic target words and on isolated polysyllabic words, from which a set of appropriate tone command patterns is derived for words of different lengths and different initial citation tones. By incorporating the effects of tone coarticulation, word accentuation and phrase intonation, the model gives high accuracy of approximations to F0 contours of Shanghainese utterances, and hence provides a more efficient means to quantitatively represent F0 contours and to describe the tone sandhi system of Shanghainese than the traditional 5-level tone code system.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"230 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134223899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Hearer model based stress prediction for Chinese TTS system
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409611
Guoping Hu, Qingfeng Liu, Yu Hu, Ren-Hua Wang
{"title":"Hearer model based stress prediction for Chinese TTS system","authors":"Guoping Hu, Qingfeng Liu, Yu Hu, Ren-Hua Wang","doi":"10.1109/CHINSL.2004.1409611","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409611","url":null,"abstract":"People often feel tired if they listen to synthesized speech for a long time. This is mainly because synthesized speech is too flat and never stresses the focus. Unlike traditional TTS research approaches of speaker simulation, the paper investigates stress prediction from the point of view of the hearer. An ideal hearer model is first proposed to predict the stress distribution based on the following hypothesis: people speak with limited stress effort and distribute the limited effort to ensure that the hearer can understand the speaker easily. Then, according to the limited research resource, we modify the ideal hearer model and present a practical model. Experiments show that the stress prediction achieves an acceptable rate of 87.36%.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134469165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Emotion recognition from Mandarin speech signals
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409646
T. Pao, Yu-Te Chen, Jun-Heng Yeh
{"title":"Emotion recognition from Mandarin speech signals","authors":"T. Pao, Yu-Te Chen, Jun-Heng Yeh","doi":"10.1109/CHINSL.2004.1409646","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409646","url":null,"abstract":"In this paper, a Mandarin speech based emotion classification method is presented. Five primary human emotions including anger, boredom, happiness, neutral and sadness are investigated. In emotion classification of speech signals, the conventional features are statistics of fundamental frequency, loudness, duration and voice quality. However, the recognition accuracy of systems employing these features degrades substantially when more than two valence emotion categories are invoked. For speech emotion recognition, we select 16 LPC coefficients, 12 LPCC components, 16 LFPC components, 16 PLP coefficients, 20 MFCC components and jitter as the basic features to form the feature vector. A Mandarin corpus recorded by 12 non-professional speakers is employed. The recognizer presented in this paper is based on three recognition techniques: LDA, K-NN, and HMMs. Experimental results show that the selected features are robust and effective for emotion recognition, not only in the arousal dimension but also in the valence dimension.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116739197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
An investigation into subspace rapid speaker adaptation
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409639
Michael Zhang, Jun Xu
{"title":"An investigation into subspace rapid speaker adaptation","authors":"Michael Zhang, Jun Xu","doi":"10.1109/CHINSL.2004.1409639","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409639","url":null,"abstract":"Speaker adaptation is an essential part of any state-of-the-art automatic speech recognizer (ASR). Recently, more and more application requirements have appeared for embedded ASR. For these cases, a more compact speech model, subspace distribution clustering hidden Markov model (SDCHMM) is used instead of continuous density hidden Markov model (CDHMM). In previous studies on SDCHMM adaptation, the subspace Gaussian pools of SDCHMM are the parameters to be adjusted for speaker variations. Alternatively, we try to employ the link table parameters of SDCHMM, which defines the tying structure in subspaces, to model the inter-speaker mismatch, with the Gaussian parameters maintained. Since the variation range for the parameters is highly limited, this method is potentially faster than conventional Gaussian pools adaptation. A comparative study on a continuous digital dialing (CDD) task shows that when data is seriously insufficient, link table adaptation is more effective than conventional methods, with 17% relative improvement in utterance accuracy rate, compared to 14% improvement by previous Gaussian adaptation. However, further improvement with more data is limited. When data size is doubled, this method gave 21% improvement, compared to 30% improvement by the conventional method.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133254460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
An information gain and grammar complexity based approach to attribute selection in speech enabled information retrieval dialogs
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409657
Haiping Li, Haixin Chai
{"title":"An information gain and grammar complexity based approach to attribute selection in speech enabled information retrieval dialogs","authors":"Haiping Li, Haixin Chai","doi":"10.1109/CHINSL.2004.1409657","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409657","url":null,"abstract":"An effective dialog driven method is required for today's speech enabled information retrieval systems, such as name dialers. Similar to the dynamic sales dialog for electronic commerce scenarios, information gain measure based approaches are widely used for attribute selection and dialog length reduction. However, for speech enabled information retrieval systems, another important factor influencing attribute selection is speech recognition accuracy. Too low accuracy results in a failed dialog. Recognition accuracy varies with many issues, including acoustic model performance and grammar complexity. The acoustic model is fixed for a whole dialog, while grammar is different for each interaction round, thereby grammar complexity influences the attribute selected for the next question. An approach combining both information gain measurement and grammar complexity is presented for a dynamic dialog driven system. Offline evaluations show that this approach can give a trade-off between the target of faster discrimination of the candidates for retrieval and higher recognition accuracy.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125559704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dependence of correct pronunciation of Chinese aspirated sounds on power during voice onset time
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409601
A. Hoshino, Akio Yasuda
{"title":"Dependence of correct pronunciation of Chinese aspirated sounds on power during voice onset time","authors":"A. Hoshino, Akio Yasuda","doi":"10.1109/CHINSL.2004.1409601","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409601","url":null,"abstract":"The length of voice onset time (VOT) in uttering Chinese aspirated sounds, which are difficult for Japanese to pronounce, is an important factor in evaluating the quality of pronunciation. In this paper, both the length of the VOT and the power used during the VOT for 21 single-vowel syllables of six different Chinese aspirates were measured for 40 Japanese students and nine native speakers of Chinese. The quality of the students' pronunciation was evaluated using a hearing test judged by eight native Chinese. The results indicated that the correlation between the quality of the students' pronunciation and the power used in uttering a sound was greater than to the VOT within a certain range of VOT which varied for different syllables. Thus, we conclude that power is also an important factor in evaluating the quality of pronunciation.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129111723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Energy contour enhancement for noisy speech recognition
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409633
Tai-Hwei Hwang, Sen-Chia Chang
{"title":"Energy contour enhancement for noisy speech recognition","authors":"Tai-Hwei Hwang, Sen-Chia Chang","doi":"10.1109/CHINSL.2004.1409633","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409633","url":null,"abstract":"Environmental noise, known as an additive noise, not only corrupts the spectra of a speech signal but also blurs the shape of its energy contour. The corruption of the energy contour can distort the energy derived feature and degrade the pattern classification performance of noisy speech. To reduce the distortion of the energy feature, the energy bias in the energy contour has to be removed before the feature extraction. For this purpose, we propose two methods to estimate the noise energy; one is obtained from the speech inactive period, and one is from the noisy speech itself. The methods are evaluated by the connected digit recognition of TIDigits, in which the test speech is corrupted with white noise, babble, factory noise, and in-car noises. As shown in the experiments, the energy enhancement can provide an additional improvement when it is jointly applied with a spectral subtraction.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129937891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Chinese-English mixed-lingual keyword spotting
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409630
Shan-Ruei You, Shih-Chieh Chien, Chih-Hsing Hsu, Ke-Shiu Chen, Jia-Jang Tu, Jeng-Shien Lin, Sen-Chia Chang
{"title":"Chinese-English mixed-lingual keyword spotting","authors":"Shan-Ruei You, Shih-Chieh Chien, Chih-Hsing Hsu, Ke-Shiu Chen, Jia-Jang Tu, Jeng-Shien Lin, Sen-Chia Chang","doi":"10.1109/CHINSL.2004.1409630","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409630","url":null,"abstract":"Based on our former experience in the \"ITRI 104 Auto Attendant System\" of using keyword spotting for Mandarin speech recognition (W.-C. Shieh et al., CCL Technical Journal, vol. 96), a Chinese-English mixed-lingual keyword spotting system, which caters for the Taiwanese speaking style, is presented. Detailed descriptions and discussions for developing the mixed-lingual auto attendant system are included, especially for solving different scoring scales in the decoding phase and the re-scoring phase for the two languages. In the decoding phase, we propose a bias-compensation method to make up the score-gap in the likelihood calculation of using Chinese and English acoustic models. To select the most probable result from the recognized hypotheses, a method is also presented of normalizing the combination scores when using different scoring mechanisms in the re-scoring phase.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"8 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117047038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
MCE-based training of subspace distribution clustering HMM
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409599
Xiao-Bing Li, Lirong Dai, Ren-Hua Wang
{"title":"MCE-based training of subspace distribution clustering HMM","authors":"Xiao-Bing Li, Lirong Dai, Ren-Hua Wang","doi":"10.1109/CHINSL.2004.1409599","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409599","url":null,"abstract":"For resource-limited platforms, the subspace distribution clustering hidden Markov model (SDCHMM) is better than the continuous density hidden Markov model (CDHMM) for its smaller storage and lower computations while maintaining a decent recognition performance. But the normal SDCHMM obtaining method does not ensure optimality in classifier design. In order to obtain an optimal classifier, a new SDCHMM training algorithm that adjusts the parameters of SDCHMM according to the minimum classification error (MCE) criterion is proposed in this paper. Our experimental results on TiDigits and RM tasks show the MCE-based SDCHMM training algorithm provides 15-80% word error rate reduction (WERR) compared with the normal SDCHMM that is converted from CDHMM.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121053330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0