2004 International Symposium on Chinese Spoken Language Processing: Latest Papers

Double Gaussian based feature normalization for robust speech recognition
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409634
Bo Liu, Lirong Dai, Jinyu Li, Ren-Hua Wang
{"title":"Double Gaussian based feature normalization for robust speech recognition","authors":"Bo Liu, Lirong Dai, Jinyu Li, Ren-Hua Wang","doi":"10.1109/CHINSL.2004.1409634","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409634","url":null,"abstract":"In this paper, a new feature normalization approach, based on the cumulative density function (CDF) matching principle, is proposed. Since speech features in noisy environments usually follow bimodal distributions, we fully utilize this characteristic by representing the CDF of the features with a double Gaussian model. A feature normalization process is performed according to the estimated CDF. The experimental results on the Aurora2 database show that the performance of our method is much better than that of the conventional mean and variance normalization (MVN) method, and comparable to that of the method combining spectral subtraction and histogram equalization (HE). Moreover, further improvement has been gained by combining our method with a simple temporal feature smoothing process. This result suggests that our new method has the potential to be integrated with other techniques to provide even better performance.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129413922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
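Note on the entry above: the abstract describes CDF matching with a double Gaussian model but gives no formulas. The following is a minimal sketch of the general idea only, assuming the mixture parameters have already been estimated (for example by EM) and assuming a standard normal target distribution. The function names, the parameter order, and the clipping constant `eps` are illustrative and are not taken from the paper.

```python
from statistics import NormalDist


def double_gaussian_cdf(x, w1, mu1, sigma1, mu2, sigma2):
    """CDF of a two-component Gaussian mixture with weights w1 and 1 - w1."""
    return (w1 * NormalDist(mu1, sigma1).cdf(x)
            + (1.0 - w1) * NormalDist(mu2, sigma2).cdf(x))


def normalize(x, params, eps=1e-6):
    """Map x so that mixture-distributed inputs become standard normal.

    params is the tuple (w1, mu1, sigma1, mu2, sigma2); these values are
    assumed to be known here, not estimated by this sketch.
    """
    p = double_gaussian_cdf(x, *params)
    # Keep p strictly inside (0, 1) so the inverse CDF is defined.
    p = min(max(p, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)


# Hypothetical bimodal feature distribution: two equally weighted modes.
params = (0.5, -2.0, 1.0, 2.0, 1.0)
normalized = [normalize(x, params) for x in (-3.0, 0.0, 3.0)]
```

Because the mapping composes a monotone CDF with a monotone inverse CDF, it preserves the ordering of feature values while reshaping their distribution.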
Bilingual response generation using semi-automatically-induced templates for a mixed-initiative dialog system
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409619
Wing Lin Yip, H. Meng
{"title":"Bilingual response generation using semi-automatically-induced templates for a mixed-initiative dialog system","authors":"Wing Lin Yip, H. Meng","doi":"10.1109/CHINSL.2004.1409619","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409619","url":null,"abstract":"We have previously developed a framework for natural language response generation for mixed-initiative dialogs in the CUHK Restaurants domain. This paper investigates the use of semi-automatic technique for response templates generation. We adopt a semi-automatic approach for grammar induction to capture the language structures of responses from unannotated corpora. We wish to use this approach to induce a set of grammars from our response data. The induced grammars are coupled with a parser to produce response templates in a semi-automatic way. Our response data consists of 2349 waiter responses. It is used as the training corpus for grammar induction. Unsupervised grammar induction is first performed, followed by using the learned grammars as prior knowledge for seeding the clustering process. Results show that the semi-automatically-induced response templates cover more than 50% of the hand-designed templates in template coverage and provide more realization options. Performance evaluation indicates that the task completion rate is at least 90%, and most of the Grice's maxims as well as the overall user satisfaction scored at 3.5 points or above.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"85 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126086014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Detection of language boundary in code-switching utterances by bi-phone probabilities
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409644
Joyce Y. C. Chan, P. Ching, Tan Lee, H. Meng
{"title":"Detection of language boundary in code-switching utterances by bi-phone probabilities","authors":"Joyce Y. C. Chan, P. Ching, Tan Lee, H. Meng","doi":"10.1109/CHINSL.2004.1409644","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409644","url":null,"abstract":"In this paper, we present an effective method to detect the language boundary (LB) in code-switching utterances. The utterances are mainly produced in Cantonese, a commonly used Chinese dialect, whilst occasionally English words are inserted between Cantonese words. Bi-phone probabilities are calculated to measure the confidence that the recognized phones are in Cantonese. Two sets of context-independent mono-phone models are trained by monolingual Cantonese and monolingual English data separately. Both knowledge-based and data-driven model selection approaches are studied in order to retain the language-dependent characteristics and to merge duplicated phone sets between the two languages. The LB detection accuracy is 75.12% for utterances that contain one single code-switching word or phrase.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121492656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 50
應用語料庫和語意相依法則於中文語音文件之摘要 (Spoken Document Summarization Using Topic-Related Corpus and Semantic Dependency Grammar) [In Chinese]
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409654
Chia-Hsin Hsieh, Chien-Lin Huang, Chung-Hsien Wu
{"title":"應用語料庫和語意相依法則於中文語音文件之摘要 (Spoken Document Summarization Using Topic-Related Corpus and Semantic Dependency Grammar) [In Chinese]","authors":"Chia-Hsin Hsieh, Chien-Lin Huang, Chung-Hsien Wu","doi":"10.1109/CHINSL.2004.1409654","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409654","url":null,"abstract":"The paper presents a spoken document summarization scheme using a topic-related corpus and semantic dependency grammar. The summarization score considers speech recognition confidence, word significance, word trigram, semantic dependency grammar (SDG) and probabilistic context free grammar (PCFG). In addition, a topic-related corpus consisting of keywords as well as articles is used to estimate the word significance score using latent semantic indexing (LSI). Semantic relations between words are determined by SDG using HowNet and Sinica Treebank. A dynamic programming algorithm is applied to decide the summarization ratio and look for the best summarization result according to summarization scores. Experimental results indicate that the proposed approach effectively extracts important words with semantic dependency and gives a promising speech summary.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121796845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Statistical language model adaptation for Mandarin broadcast news transcription
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-01 DOI: 10.1109/CHINSL.2004.1409649
Berlin Chen, Wen-Hung Tsai, Jen-wei Kuo
{"title":"Statistical language model adaptation for Mandarin broadcast news transcription","authors":"Berlin Chen, Wen-Hung Tsai, Jen-wei Kuo","doi":"10.1109/CHINSL.2004.1409649","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409649","url":null,"abstract":"This paper investigates statistical language model adaptation for Mandarin broadcast news transcription. A topical mixture model was proposed to explore the long-span latent topical information for dynamic language model adaptation. The underlying characteristics and various kinds of model complexities were extensively investigated, while their performance was verified by comparison with the conventional MAP-based adaptation approaches, which are devoted to extracting the short-span n-gram information. Speech recognition experiments were conducted on the broadcast news collected in Taiwan. Very promising results in both perplexity and word error rate reductions were initially obtained.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127799127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
A study on Mandarin broadcast news speech recognition
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-01 DOI: 10.1109/CHINSL.2004.1409635
C. L. Chen, Yih-Ru Wang, Sin-Horng Chen
{"title":"A study on Mandarin broadcast news speech recognition","authors":"C. L. Chen, Yih-Ru Wang, Sin-Horng Chen","doi":"10.1109/CHINSL.2004.1409635","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409635","url":null,"abstract":"In this paper, a basic Mandarin broadcast news speech recognition system is constructed using the MATBN database. It considers the acoustic modeling for Mandarin base-syllables, particles, and paralinguistic phenomena. It also considers environment-dependent acoustic modeling for three recording environments: studio anchors, outdoor reporters, and outdoor interviewees. Moreover, it incorporates a bigram language model with adaptation, using the data in MATBN. Syllable recognition rates of 89.64, 84.42 and 61.62% were achieved for the three environments of anchors, reporters and interviewees, respectively.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"189 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114435638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A maximum entropy approach for integrating semantic information in statistical language models
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-01 DOI: 10.1109/CHINSL.2004.1409648
C. Chueh, Jen-Tzung Chien, H. Wang
{"title":"A maximum entropy approach for integrating semantic information in statistical language models","authors":"C. Chueh, Jen-Tzung Chien, H. Wang","doi":"10.1109/CHINSL.2004.1409648","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409648","url":null,"abstract":"In this paper, we propose an adaptive statistical language model, which successfully incorporates the semantic information into an n-gram model. Traditional n-gram models exploit only the immediate context of history. We first introduce the semantic topic as a new source to extract the long distance information for language modeling, and then adopt the maximum entropy (ME) approach instead of the conventional linear interpolation method to integrate the semantic information with the n-gram model. Using the ME approach, each information source gives rise to a set of constraints, which should be satisfied to achieve the hybrid model. In the experiments, the ME language models, trained using the China Times newswire corpus, achieved 40% perplexity reduction over the baseline bigram model.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115618462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
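Note on the entry above: the abstract contrasts the maximum entropy approach with conventional linear interpolation and reports a relative perplexity reduction. The sketch below shows only those two standard quantities, not the paper's ME model. The function names, the interpolation weight `lam`, and all probabilities and perplexity values are hypothetical illustration values.

```python
import math


def interpolate(p_ngram, p_topic, lam):
    """Linear interpolation of two probability estimates for the same word."""
    return lam * p_ngram + (1.0 - lam) * p_topic


def perplexity(word_probs):
    """Perplexity of a word sequence from its per-word model probabilities."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


def relative_reduction(pp_base, pp_new):
    """Fractional perplexity reduction of a new model over a baseline."""
    return (pp_base - pp_new) / pp_base


# Hypothetical per-word probabilities from an n-gram model and a topic model.
ngram_probs = [0.10, 0.02, 0.30]
topic_probs = [0.20, 0.08, 0.10]
mixed = [interpolate(a, b, lam=0.7) for a, b in zip(ngram_probs, topic_probs)]

pp_ngram = perplexity(ngram_probs)
pp_mixed = perplexity(mixed)
```

Under this definition, a 40% reduction means the new model's perplexity is 0.6 times the baseline's, for example `relative_reduction(100.0, 60.0)`.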
Investigation and modeling of coarticulation in speech production
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-01 DOI: 10.1109/CHINSL.2004.1409577
J. Dang, Jianguo Wei, T. Suzuki, K. Honda, P. Perrier, M. Honda
{"title":"Investigation and modeling of coarticulation in speech production","authors":"J. Dang, Jianguo Wei, T. Suzuki, K. Honda, P. Perrier, M. Honda","doi":"10.1109/CHINSL.2004.1409577","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409577","url":null,"abstract":"Coarticulation during speech production takes place at both the physiological level concerned with articulators' properties and the planning stage for elaborating motor commands. This study focuses on the investigation and modeling of planning aspects of coarticulation with the ultimate objective to implement human mechanism in controlling a physiological speech production model. A \"carrier model\" was derived from articulatory data to describe mechanisms of the coarticulation, in which the vocalic movement is considered to be the primary component as a \"carrier wave\" and the consonantal movement as a \"modulation wave\". Interactions between the carrier and modulation waves were evaluated using phoneme sequences of V_bCV_cCV_b (V_c: the central vowel; V_b: the bilateral vowel; C: consonants) out of articulatory data that was obtained from the electromagnetic articulographic system. The analysis of the articulatory data showed that the articulatory position of the central vowel tended to be assimilated towards that of the bilateral vowels.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115015500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Contributions of periodicity fluctuation cues in individual frequency channels to Chinese speech recognition
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-01 DOI: 10.1109/CHINSL.2004.1409604
Xin Luo, Q. Fu
{"title":"Contributions of periodicity fluctuation cues in individual frequency channels to Chinese speech recognition","authors":"Xin Luo, Q. Fu","doi":"10.1109/CHINSL.2004.1409604","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409604","url":null,"abstract":"Studies have revealed near perfect speech recognition with primarily temporal envelope cues and severely degraded spectral cues. Among different types of temporal envelope cues, periodicity fluctuation cues have been found to significantly improve Chinese tone and sentence recognition, while the contributions of periodicity fluctuation cues in individual frequency channels to Chinese speech recognition have not been clearly stated. In order to make periodicity fluctuation cues available in different frequency regions, the present study employed different low-pass cutoff frequencies for the temporal envelope detectors in different channels of a four-channel noise-band cochlear implant simulation. Chinese tone and vowel recognition scores were measured for six native Chinese normal hearing listeners under six low-pass cutoff frequency combinations: all 50 Hz in four channels (all-50 Hz), all 500 Hz in four channels (all-500 Hz), and 500 Hz in one of the four channels while 50 Hz in the other three channels (ch1-500 Hz, ch2-500 Hz, etc.). The results showed that the ch4-500 Hz condition produced the highest Chinese tone recognition among the four single-channel-500 Hz conditions, and was the only condition whose tone recognition was similar to that of the all-500 Hz condition and was significantly higher than that of the all-50 Hz condition. Chinese vowel recognition was not significantly affected by different cutoff frequency combinations. These results suggest that delivering periodicity fluctuation cues in higher frequency channels might be more important and efficient in enhancing Chinese tone recognition for cochlear implant patients.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131366669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
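Note on the entry above: the six low-pass cutoff combinations listed in the abstract can be enumerated as a small lookup table, which may make the experimental design easier to read. This is a sketch for illustration only. The function name `cutoff_conditions` and the list layout (index 0 holds channel 1) are assumptions, not part of the study.

```python
def cutoff_conditions(n_channels=4, low_hz=50, high_hz=500):
    """Return envelope-detector cutoff frequencies (Hz) per channel, by condition."""
    conditions = {
        f"all-{low_hz} Hz": [low_hz] * n_channels,
        f"all-{high_hz} Hz": [high_hz] * n_channels,
    }
    # One condition per channel: that channel gets the high cutoff,
    # the remaining channels keep the low cutoff.
    for ch in range(1, n_channels + 1):
        cutoffs = [low_hz] * n_channels
        cutoffs[ch - 1] = high_hz
        conditions[f"ch{ch}-{high_hz} Hz"] = cutoffs
    return conditions


conditions = cutoff_conditions()
for name, cutoffs in conditions.items():
    print(name, cutoffs)
```

With the default arguments this yields the two uniform conditions plus four single-channel conditions, matching the six combinations named in the abstract.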