2004 International Symposium on Chinese Spoken Language Processing: Latest Publications

Text-independent speaker verification based on relation of MFCC components
2004 International Symposium on Chinese Spoken Language Processing | Pub Date: 2004-12-15 | DOI: 10.1109/CHINSL.2004.1409585
G. Ou, Dengfeng Ke
Abstract: GMM is prevalent for speaker verification. It performs very well but needs a background model to provide a reference value, which greatly influences the error rate. To obtain good generalization, a large database covering many people is needed to train the background model. In this paper, a new method without a background model is proposed, called the correlation and kernel function method (CK method). In the CK method, the correlation and uncorrelation of MFCC components are used to identify individuals, and a kernel function is used to compute the likelihood of two models. The method runs more than 30 times faster than GMM, requires less training data and less storage for its models, yet its performance is nearly identical to that of GMM, making it suitable for real-time computation.
Citations: 9
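The CK method, as summarized above, keys on correlations between MFCC components plus a kernel comparison of two models. A minimal sketch of that idea, with hypothetical function names and an RBF kernel standing in for the paper's unspecified kernel:

```python
import math

def correlation_matrix(frames):
    """Pearson correlations between MFCC dimensions, computed over all frames.

    frames: list of equal-length MFCC vectors (one per analysis frame).
    """
    dims, n = len(frames[0]), len(frames)
    means = [sum(f[d] for f in frames) / n for d in range(dims)]

    def cov(a, b):
        return sum((f[a] - means[a]) * (f[b] - means[b]) for f in frames) / n

    corr = [[0.0] * dims for _ in range(dims)]
    for a in range(dims):
        for b in range(dims):
            denom = math.sqrt(cov(a, a) * cov(b, b)) or 1.0  # guard zero variance
            corr[a][b] = cov(a, b) / denom
    return corr

def kernel_score(m1, m2, gamma=1.0):
    """RBF kernel over flattened correlation matrices: similarity in (0, 1]."""
    sq = sum((x - y) ** 2 for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))
    return math.exp(-gamma * sq)
```

Comparing a test utterance's correlation matrix against a stored one needs no background model, which is where the claimed speed and storage advantage over GMM scoring would come from.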
Maximum entropy modeling for speech recognition
2004 International Symposium on Chinese Spoken Language Processing | Pub Date: 2004-12-15 | DOI: 10.1109/CHINSL.2004.1409569
H. Kuo
Abstract: Summary form only given. Maximum entropy (maxent) models have become very popular in natural language processing. We begin with a basic introduction to the maximum entropy principle, cover the popular algorithms for training maxent models, and describe how maxent models have been used in language modeling and, more recently, acoustic modeling for speech recognition. Some comparisons with other discriminative modeling methods are made. A substantial amount of time is devoted to the details of a new framework for acoustic modeling using maximum entropy direct models, including practical issues of implementation and usage. Traditional statistical models for speech recognition have all been based on a Bayesian framework using generative models such as hidden Markov models (HMMs). The new framework is based on maximum entropy direct modeling, where the probability of a state or word sequence given an observation sequence is computed directly from the model. In contrast to HMMs, features can be asynchronous and overlapping, and need not be statistically independent; the model therefore allows the potential combination of many different types of features. Results from a specific kind of direct model, the maximum entropy Markov model (MEMM), are presented. Even with conventional acoustic features, the approach already shows promising results for phone-level decoding. The MEMM significantly outperforms traditional HMMs in word error rate when used as a stand-alone acoustic model, and combining MEMM scores with HMM and language model scores shows modest improvements over the best HMM speech recognizer. We give a sense of some exciting possibilities for future research in using maximum entropy models for acoustic modeling.
Citations: 3
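The direct-modeling idea above, computing p(class | observation) from weighted feature functions, can be illustrated with a tiny conditional maxent classifier trained by gradient ascent. This is a generic sketch of the principle, not the tutorial's MEMM implementation:

```python
import math

def train_maxent(samples, num_feats, classes, epochs=200, lr=0.5):
    """Conditional maxent model p(c|x) proportional to exp(w[c] . f(x)),
    trained by stochastic gradient ascent on the log-likelihood.

    samples: list of (feature_vector, class_label) pairs.
    """
    w = {c: [0.0] * num_feats for c in classes}
    for _ in range(epochs):
        for feats, label in samples:
            scores = {c: math.exp(sum(wi * fi for wi, fi in zip(w[c], feats)))
                      for c in classes}
            z = sum(scores.values())
            for c in classes:
                p = scores[c] / z                    # model posterior
                target = 1.0 if c == label else 0.0  # empirical expectation
                for i, fi in enumerate(feats):
                    w[c][i] += lr * (target - p) * fi
    return w

def classify(w, feats):
    """Pick the class with the highest linear score (softmax argmax)."""
    return max(w, key=lambda c: sum(wi * fi for wi, fi in zip(w[c], feats)))
```

Because each feature simply contributes a weight to the score, features may overlap and be statistically dependent, which is exactly the flexibility the abstract contrasts with HMM assumptions.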
Perception of Mandarin intonation
2004 International Symposium on Chinese Spoken Language Processing | Pub Date: 2004-12-15 | DOI: 10.1109/CHINSL.2004.1409582
Jiahong Yuan
Abstract: This study investigates how tone and intonation, and how focus and intonation, interact in intonation-type (statement versus question) identification. A perception experiment was conducted on a speech corpus of 1040 utterances, with sixteen listeners participating. The results reveal three asymmetries: in statement versus question intonation identification; in the effects of sentence-final Tone2 and Tone4 on question intonation identification; and in the effects of final focus on statement and question intonation identification. These asymmetries suggest that: (1) statement intonation is a default or unmarked intonation type whereas question intonation is marked; (2) question intonation has higher prosodic strength in sentence-final position; and (3) there is a tone-dependent mechanism of question intonation in sentence-final position.
Citations: 25
Predicting prosodic words from lexical words - a first step towards predicting prosody from text
2004 International Symposium on Chinese Spoken Language Processing | Pub Date: 2004-12-15 | DOI: 10.1109/CHINSL.2004.1409614
Hua-Jui Peng, Chi-ching Chen, Chiu-yu Tseng, Keh-Jiann Chen
Abstract: Much remains unsolved in predicting prosody from text for unlimited Mandarin Chinese TTS; the interactions and rules relating syntactic structure to prosodic structure are still open challenges. Using part-of-speech (POS) tagging, which requires lexical information from the text, we aim to find significant word-grouping patterns by analyzing real speech data together with that lexical information. The paper reports discrepancies found between lexical words (LWs) parsed from text and prosodic words (PWs) annotated from speech data, and proposes a statistical model to predict PWs from LWs. In the model, word length and POS tags are the two essential features for predicting PWs, and the results show approximately 90% prediction accuracy for PWs, though room for improvement remains. We believe that evidence from PW prediction is a first step towards building prosody models from text.
Citations: 18
Apply length distribution model to intonational phrase prediction
2004 International Symposium on Chinese Spoken Language Processing | Pub Date: 2004-12-15 | DOI: 10.1109/CHINSL.2004.1409624
Jianfeng Li, Guoping Hu, Ming Fan, Lirong Dai
Abstract: A length distribution model for intonational phrase prediction is proposed. The model gives the probability that a sentence of a given length is divided into intonational phrases of given lengths. We discuss how to estimate the model's probabilities from a training corpus and how to apply it to intonational phrase prediction, and we combine it with a maximum entropy model that exploits local context information. Experimental results show that length distribution is valuable information for intonational phrase prediction and that it makes a significant extra contribution over the maximum entropy model in terms of average score and unacceptable rate.
Citations: 1
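One plausible reading of estimating such a model from a corpus, sketched under the assumptions of add-one smoothing and independent phrase lengths (the abstract does not give the paper's exact estimator):

```python
from collections import Counter

def estimate_length_dist(corpus):
    """Estimate P(phrase length) from a segmented corpus.

    corpus: list of sentences, each given as its intonational-phrase lengths.
    Add-one smoothing; unseen lengths fall back to the floor probability.
    """
    counts = Counter(length for sent in corpus for length in sent)
    total = sum(counts.values())
    max_len = max(counts)
    return lambda l: (counts[l] + 1) / (total + max_len)

def segmentation_score(p, phrase_lengths):
    """Score a candidate segmentation as the product of its phrase-length
    probabilities (phrase lengths assumed independent)."""
    score = 1.0
    for length in phrase_lengths:
        score *= p(length)
    return score
```

Comparing candidate break placements by this score is how a length model can re-rank the hypotheses of a local-context (e.g. maxent) predictor.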
Cantonese verbal information verification system using GMM-based anti-model
2004 International Symposium on Chinese Spoken Language Processing | Pub Date: 2004-12-15 | DOI: 10.1109/CHINSL.2004.1409645
Chao Qin, Tan Lee
Abstract: Verbal information verification (VIV) is one of the approaches to speaker authentication: a process in which the spoken utterance of a claimed speaker is verified against the key information in the speaker's registered profile. VIV in English has been extensively studied, and there has also been some work on Mandarin VIV. In this paper, we study VIV for users who speak Cantonese, the most commonly used dialect in Southern China and Hong Kong. We propose a new technique for anti-modeling that uses context-independent Gaussian mixture models (GMMs) instead of the conventional hidden Markov models (HMMs). Experiments on 50 Cantonese native speakers show that the proposed method separates the verification scores of claimant utterances from those of imposter utterances better than the HMM-based method. An equal error rate of 0.00% is attained with a robust interval of up to 15%, demonstrating excellent performance.
Citations: 2
An embedded English synthesis approach based on speech concatenation and smoothing
2004 International Symposium on Chinese Spoken Language Processing | Pub Date: 2004-12-15 | DOI: 10.1109/CHINSL.2004.1409610
Guilin Chen, Dongjian Yue, Yiqing Zu, Zhenli Yu
Abstract: An embedded English synthesis approach based on speech concatenation and smoothing is described. The approach adopts phonetic sub-words as carriers of variable-length units, and defines five classes of units to cover all English phonetic phenomena. The corresponding cost function and a dynamic-programming search procedure are addressed in the unit-selection stage. In the synthesis stage, vocal tract response, pitch value and phase are interpolated and merged at concatenation points to smooth the speech. Preliminary tests show that the approach achieves a good balance of naturalness, intelligibility and data footprint.
Citations: 6
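Merging units at concatenation points can be illustrated with a simple linear crossfade over an overlap window; the paper interpolates vocal tract response, pitch and phase, which this waveform-level sketch only approximates:

```python
def smooth_join(left, right, overlap):
    """Concatenate two sample sequences with a linear crossfade over
    `overlap` samples at the join."""
    assert 0 < overlap <= min(len(left), len(right))
    head = left[:-overlap]           # untouched part of the left unit
    tail = right[overlap:]           # untouched part of the right unit
    blend = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # fade-in weight for the right unit
        blend.append((1 - w) * left[len(left) - overlap + i] + w * right[i])
    return head + blend + tail
```

The same interpolation scheme applies per parameter (pitch contour, spectral envelope) rather than per sample in a parametric smoother.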
Prosody and style controls in CU VOCAL using SSML and SAPI XML tags
2004 International Symposium on Chinese Spoken Language Processing | Pub Date: 2004-12-15 | DOI: 10.1109/CHINSL.2004.1409623
T. Fung, Yuk-Chi Li, H. Meng, P. Ching
Abstract: CU VOCAL is a Cantonese text-to-speech (TTS) engine that uses a syllable-based concatenative synthesis approach to generate intelligible and natural synthetic speech in Cantonese. The paper reports on recent enhancements to CU VOCAL that support user adjustment of prosody and style via the Speech Synthesis Markup Language (SSML) in the input text. CU VOCAL was previously developed as a SAPI-compliant engine to enable easy integration with other applications; the paper also reports on enhancements to the CU VOCAL SAPI (speech API) engine to support SAPI 5 XML tags.
Citations: 1
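SSML expresses prosody and style controls with standard tags such as `<prosody>` and `<break>`. A sketch of generating such markup for a TTS engine's input (the attribute values are arbitrary examples, not CU VOCAL's documented supported set):

```python
def ssml_utterance(text, rate="slow", pitch="+10%", break_ms=300):
    """Wrap text in standard SSML 1.0 prosody controls, followed by a pause."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis">'
        f'<prosody rate="{rate}" pitch="{pitch}">{text}</prosody>'
        f'<break time="{break_ms}ms"/>'
        '</speak>'
    )
```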
A method of estimating the equal error rate for automatic speaker verification
2004 International Symposium on Chinese Spoken Language Processing | Pub Date: 2004-12-15 | DOI: 10.1109/CHINSL.2004.1409642
Jyh-Min Cheng, Hsiao-Chuan Wang
Abstract: In an automatic speaker verification (ASV) system, the equal error rate (EER) is a measure used to evaluate system performance, and it usually requires a large number of testing samples to calculate. To estimate the EER without running experiments on testing samples, a model-based EER estimation method is proposed that computes likelihood scores directly from client speaker models and imposter models. However, the distribution of the scores so computed is significantly biased relative to the distribution of likelihood scores obtained from testing samples. We therefore propose a novel idea: manipulate the speaker models of the clients and imposters so that the distribution of the computed likelihood scores is closer to that obtained from testing samples, after which a more reliable EER can be calculated from the speaker models. The experimental results show that the proposed method estimates the EER properly.
Citations: 45
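The EER itself is defined at the operating point where the false-accept and false-reject rates coincide. A standard way to compute it from genuine and imposter score lists, which is the sample-based baseline the paper's model-based estimator replaces:

```python
def equal_error_rate(genuine, impostor):
    """Sweep a decision threshold over all observed scores and return
    (EER, threshold) at the point where the false-accept rate (FAR) and
    false-reject rate (FRR) are closest; EER is their average there."""
    best_gap, eer, best_t = 2.0, 1.0, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # accepted imposters
        frr = sum(s < t for s in genuine) / len(genuine)     # rejected clients
        if abs(far - frr) < best_gap:
            best_gap, eer, best_t = abs(far - frr), (far + frr) / 2, t
    return eer, best_t
```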
Adaptive conditional pronunciation modeling using articulatory features for speaker verification
2004 International Symposium on Chinese Spoken Language Processing | Pub Date: 2004-12-15 | DOI: 10.1109/CHINSL.2004.1409586
Ka-Yee Leung, M. Mak, M. Siu, S. Kung
Abstract: This paper proposes an articulatory feature-based conditional pronunciation modeling (AFCPM) technique for speaker verification. The technique models speakers' pronunciation behavior by linking the actual phones produced by a speaker to the state of articulation during speech production. Speaker models, consisting of conditional probabilities of two articulatory classes, are adapted from a set of universal background models (UBMs) using MAP adaptation. This adaptation approach aims to prevent over-fitting the speaker models when the amount of speaker data is insufficient for direct estimation. Experimental results show that the adaptation technique enhances the discriminating power of speaker models by establishing a tighter coupling between the speaker models and the UBM. Results also show that fusing the scores of an AFCPM-based system and a conventional spectral-based system achieves a significantly lower error rate than either system alone, suggesting that AFCPM and spectral features are complementary.
Citations: 0
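MAP adaptation, as used above, interpolates between speaker statistics and the UBM prior with a relevance factor. A scalar sketch with hard component assignments (the paper, like most GMM systems, would use soft posteriors):

```python
def map_adapt_means(ubm_means, data, assign, r=16.0):
    """MAP-adapt (scalar) UBM component means toward speaker data.

    assign gives a hard component index per data point, a simplification of
    the usual soft posterior weighting; r is the relevance factor.
    """
    adapted = []
    for k, m in enumerate(ubm_means):
        pts = [x for x, a in zip(data, assign) if a == k]
        if not pts:
            adapted.append(m)       # unseen components keep the UBM prior
            continue
        n = len(pts)
        xbar = sum(pts) / n         # speaker sufficient statistic
        alpha = n / (n + r)         # data-dependent adaptation weight
        adapted.append(alpha * xbar + (1 - alpha) * m)
    return adapted
```

With little data, alpha stays small and the model stays near the UBM, which is exactly the over-fitting protection the abstract describes.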