2008 6th International Symposium on Chinese Spoken Language Processing: Latest Papers

A Three-Stage Text Normalization Strategy for Mandarin Text-to-Speech Systems
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.43
Tao Zhou, Yuan Dong, Dezhi Huang, Wu Liu, Haila Wang
Abstract: Text normalization is an important component of Mandarin text-to-speech (TTS) systems. This paper develops a taxonomy of non-standard words (NSWs) based on a large-scale Chinese corpus and proposes a three-stage text normalization strategy: finite state automata (FSA) for initial classification, a maximum entropy (ME) classifier and rules for further classification, and general rules for standard word conversion. The three-stage approach achieves a precision of 96.02% in experiments, 5.21% higher than that of a simple rule-based approach and 2.21% higher than that of a simple machine learning method. Experimental results show that the three-stage disambiguation strategy for text normalization yields considerable improvement and works well in a real TTS system.
Citations: 9
Automatic Prosody Boundary Labeling of Mandarin Using Both Text and Acoustic Information
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.100
Chongjia Ni, Wenju Liu, Bo Xu
Abstract: Prosody is an important factor in a high-quality text-to-speech (TTS) system and is often described with a hierarchical structure. Generating the hierarchical prosody structure is therefore very important in both corpus building and real-time text analysis, but the prosody labeling procedure is laborious and time-consuming. In this paper, an automatic prosody boundary labeling system is presented that uses the classification and regression tree (CART) framework. In this system, we build a prosody model using acoustic and text information, based on a large speech corpus with prosodic structure labels (ASCCD). Experiments show that this model achieves 90.86% accuracy in prosody boundary detection.
Citations: 8
Discriminative Output Coding Features for Speech Recognition
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.34
O. Dehzangi, B. Ma, Chng Eng Siong, Haizhou Li
Abstract: This paper presents a novel approach to discriminative acoustic feature extraction for speech recognition using an output coding technique. A high-dimensional feature space with higher discriminative capability is constructed by expanding MFCC coefficients with polynomial expansion. To fit the discriminative features into the hidden Markov model structure of speech recognition, the high-dimensional feature vectors are further projected into a low-dimensional feature space using the output scores of a set of SVMs. Each SVM is trained in a one-phone-versus-the-rest manner, so that each resulting feature dimension provides effective information for distinguishing one phone from the others. The discriminative features have been evaluated on the speech recognition task of the TIMIT corpus, achieving 72.18% phone accuracy.
Citations: 1
Evaluation and Analysis of Minimum Phone Error Training and its Modified Versions for Large Vocabulary Mandarin Speech Recognition
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.51
Yung-Jen Cheng, Che-Kuang Lin, Lin-Shan Lee
Abstract: This paper reports a detailed study of minimum phone error (MPE), minimum phone frame error (MPFE), and a physical-state level version of minimum Bayes risk (sMBR) training, as well as several modified versions of them, for transcription of large vocabulary Mandarin broadcast news. We found that the results are quite different from those observed previously for English and Arabic broadcast news tasks [1]; in particular, the trends differ when different performance measures (word and character accuracies) are used. This makes a difference for the Chinese language, for which character accuracy is usually more important, while word accuracy is commonly used for other languages. The modifications to these approaches tested here include considering the variable phone length and applying penalties to erroneous frames. They were shown to significantly improve character accuracy in our experiments.
Citations: 2
Simultaneous Acoustic, Prosodic, and Phrasing Model Training for TTS Conversion Systems
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.12
Keiichiro Oura, Yoshihiko Nankaku, T. Toda, K. Tokuda, R. Maia, S. Sakai, Satoshi Nakamura
Abstract: A new integrated model for simultaneous modeling of linguistic and acoustic models, together with a training algorithm, is proposed. Usually, text-to-speech (TTS) systems based on the hidden Markov model (HMM) consist of text analysis and speech synthesis modules, and linguistic and acoustic model training are performed independently using different training data sets. In the proposed approach, the integrated model parameters are optimized simultaneously: the derived algorithm optimizes the two model parameter sets at the same time. An appropriate model is therefore estimated, because we can directly formulate the TTS problem in which the speech waveform is generated from a word sequence. We conducted objective evaluation experiments using phrasing and prosodic models to evaluate the effectiveness of the proposed technique.
Citations: 9
Mandarin Language Understanding in Dialogue Context
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.40
Yushi Xu, Jingjing Liu, S. Seneff
Abstract: In this paper we introduce Mandarin language understanding methods developed for spoken language applications. We describe a set of strategies to improve parsing performance for Mandarin. We also discuss two context resolution techniques adopted to handle Chinese ellipsis in a practical Mandarin spoken dialogue system. Experimental evaluation verifies the effectiveness and efficiency of our proposed parsing enhancements, in terms of both parse coverage and speed. System evaluation with human subjects also verifies the effectiveness of our proposed approaches to speech understanding and context resolution in practical conversational systems.
Citations: 10
Position Information for Language Modeling in Speech Recognition
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.37
Hsuan-Sheng Chiu, Guan-Yu Chen, Chun-Jen Lee, Berlin Chen
Abstract: This paper considers word position information for language modeling. For organized documents, such as technical papers or news reports, the composition and word usage of articles of the same style are usually similar. Such documents can therefore be separated, according to their literary structure, into partitions with identical rhetorical or topical styles, e.g., introductory remarks, related studies or events, elucidations of methodology or affairs, conclusions, and references or reporters' footnotes. In this paper, we explore word position information and propose two position-dependent language models for speech recognition. The structures and characteristics of these position-dependent language models were extensively investigated, and their performance was analyzed and verified by comparison with existing n-gram, mixture-based, and topic-based language models. Large vocabulary continuous speech recognition (LVCSR) experiments were conducted on the broadcast news transcription task. The preliminary results seem to indicate that the proposed position-dependent models are comparable to the mixture-based and topic-based models.
Citations: 5
Double Gauss Based Unsupervised Score Normalization in Speaker Verification
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.53
Wu Guo, Lirong Dai, Ren-Hua Wang
Abstract: In text-independent speaker verification, an unsupervised mode can improve system performance. In traditional systems, the speaker model is updated when a test utterance has a score higher than a particular threshold; we call this unsupervised model training. In this paper, an unsupervised score normalization is proposed. A target speaker score Gaussian and an impostor score Gaussian are set up as priors, and the parameters of the impostor score model are updated using the test score. The test score is then normalized by the new impostor score model. When unsupervised score normalization, unsupervised model training, and factor analysis are adopted in the NIST 2006 SRE core test, the EER of the system is 4.29%.
Citations: 3
Prosodic Variation in Cantonese-English Code-Mixed Speech
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.97
Wentao Gu, Tan Lee, P. Ching
Abstract: This study investigates the prosodic features of Cantonese-English code-mixed speech. It is found that the prosody of the matrix language is hardly altered, while the prosody of the embedded language is assimilated to that of the matrix language. That is, the rhythmic pattern is shifted towards syllable-timing, whereas the variations in the F0 pattern are mainly in the word-final syllable: for a stressed syllable the F0 contour turns flat, while for a post-tonic unstressed syllable the F0 contour falls more steeply than in monolingual English speech. Such F0 variations can be explained by the phonological interaction of English lexical stress and Cantonese lexical tone. In addition, the F0 of the embedded English word tends to become higher due to the embedding effect.
Citations: 3
Utilization of Huge Written Text Corpora for Conversational Speech Recognition
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.36
Xinhui Hu, H. Yamamoto, Jin-Song Zhang, K. Yasuda, Youzheng Wu, H. Kashioka
Abstract: In this paper, we propose a new sentence selection method that uses large written text corpora to augment the language model for conversational speech recognition, in order to address insufficient coverage of in-domain training data. In the proposed method, the large written text corpora are clustered by an entropy-based method. Clusters similar to the target development set are selected automatically. Next, utterances are selected and mixed with the original conversational training corpus, and language models for conversational speech recognition are built. In our experiments, a test set in a different speech style, not covered by the original conversational training data, is used for evaluation. The perplexity of the test set was reduced from 249.6 to 210.8, and the word recognition accuracy was improved by approximately 5% by using our method. Index Terms: data collection, training data coverage, language model, conversational speech recognition.
Citations: 1