Latest papers from the 2004 International Symposium on Chinese Spoken Language Processing

Improving the performance of MGM-based voice conversion by preparing training data method
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409616
Guoyu Zuo, Wenju Liu, Xiaogang Ruan
{"title":"Improving the performance of MGM-based voice conversion by preparing training data method","authors":"Guoyu Zuo, Wenju Liu, Xiaogang Ruan","doi":"10.1109/CHINSL.2004.1409616","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409616","url":null,"abstract":"This paper proposes an approach to improve both the target speaker's individuality and the quality of the converted speech by preparing the training data. In mixture Gaussian spectral mapping (MGM) based voice conversion, spectral feature representations are analyzed to obtain the right feature associations between the source and target characteristics. A voiced and unvoiced (V/U-V) decision scheme for time-alignment is provided to obtain the right data for training the MGM function while removing the misaligned data. Experiments are conducted in terms of the applications of spectral representation methods, and V/UV decision strategies, to the MGM functions. When linear predictive cepstral coefficients (LPCC) are used for time-alignment and the V/UV decisions are adopted for removing bad data, results show that the conversion function can get a better accuracy and the proposed method can effectively improve the overall performance of voice conversion.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134112834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
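The abstract above does not spell out the V/UV filtering rule. As a minimal sketch, assuming that a time-aligned frame pair counts as misaligned when its source and target frames carry different voiced/unvoiced labels, the data-preparation step might look like the following. The function name `filter_aligned_pairs` and its arguments are hypothetical and do not come from the paper.

```python
def filter_aligned_pairs(pairs, src_voiced, tgt_voiced):
    """Keep only aligned frame pairs whose V/UV labels agree.

    pairs      -- list of (source_index, target_index) from time alignment
    src_voiced -- list of bools, True where a source frame is voiced
    tgt_voiced -- list of bools, True where a target frame is voiced
    """
    # Assumption: a label mismatch marks the pair as misaligned.
    return [(i, j) for i, j in pairs if src_voiced[i] == tgt_voiced[j]]


# Toy alignment path: the pair (2, 1) joins an unvoiced source frame
# to a voiced target frame, so it is dropped.
pairs = [(0, 0), (1, 1), (2, 1), (3, 2)]
src_voiced = [True, True, False, False]
tgt_voiced = [True, True, False]
kept = filter_aligned_pairs(pairs, src_voiced, tgt_voiced)
print(kept)
```

Only the surviving pairs would then be passed on to train the conversion function.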
Acoustical study on sub-harmonic of glottal source in Mandarin tones
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409594
Jiangping Kong
{"title":"Acoustical study on sub-harmonic of glottal source in Mandarin tones","authors":"Jiangping Kong","doi":"10.1109/CHINSL.2004.1409594","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409594","url":null,"abstract":"This paper is concerned with the acoustical analysis of the sub-harmonic of the glottal source in Mandarin tones. The methods used in this research are: (1) extracting the glottal source of tones by inverse filtering; (2) analyzing sub-harmonic and spectrum tilt by FFT; (3) simulating the double peak pulse by 4 functions and describing the nature of them in both time and frequency domains. There are 3 conclusions: (1) the double peak pulse produces a sub-harmonic in the glottal source of Mandarin tones; (2) the sub-harmonic influences the spectrum tilt; (3) the double peak pulse can be simulated and modeled mathematically.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117283686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A novel two-step SVM classifier for voiced/unvoiced/silence classification of speech
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409590
Fengyan Qi, C. Bao, Yan Liu
{"title":"A novel two-step SVM classifier for voiced/unvoiced/silence classification of speech","authors":"Fengyan Qi, C. Bao, Yan Liu","doi":"10.1109/CHINSL.2004.1409590","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409590","url":null,"abstract":"In this paper, a novel method for voiced/unvoiced/silence of speech classification using the support vector machine (SVM) is proposed. This classifier can correctly classify speech frames into voiced frame, unvoiced frame and silence frame. The comparison of experimental results show that the proposed method outperforms other traditional methods. The performance of SVM for different kernel functions in the experiment was analyzed and discussed as well.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134575579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 33
A comparative study on various confidence measures in large vocabulary speech recognition
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409573
Gang Guo, Chao Huang, Hui Jiang, Ren-Hua Wang
{"title":"A comparative study on various confidence measures in large vocabulary speech recognition","authors":"Gang Guo, Chao Huang, Hui Jiang, Ren-Hua Wang","doi":"10.1109/CHINSL.2004.1409573","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409573","url":null,"abstract":"In this paper, we have conducted a comparative study on several confidence measures (CM) for large vocabulary speech recognition. Firstly, we propose a novel high-level CM that is based on the inter-word mutual information (MI). Secondly, we experimentally investigate several popular low-level CM, such as word posterior probabilities, N-best counting, likelihood ratio testing (LRT), etc. Finally, we have studied a simple linear interpolation strategy to combine the best low-level CM with the best high-level CM. All of these CM are examined in two large vocabulary ASR tasks, namely the Switchboard task and a Mandarin dictation task, to verify the recognition errors in baseline recognition systems. Experimental results show: (1) the proposed MI-based CM greatly surpass another existing high-level CM which are based on the LSA technique; (2) among all low-level CM, word posteriori probabilities give the best verification performance; (3) when combining the word posteriori probabilities with the MI-based CM, the equal error rate is reduced from 24.4% to 23.9% in the Switchboard task and from 17.5% to 16.2% in the Mandarin dictation task.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114736318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
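The linear interpolation strategy mentioned in the abstract above can be illustrated with a short sketch. It assumes both confidence scores are already scaled to the range [0, 1] and that a single interpolation weight is tuned on held-out data. The function name `combine_confidence`, the `weight` parameter and the 0.5 acceptance threshold are illustrative assumptions, not details taken from the paper.

```python
def combine_confidence(low_level, high_level, weight=0.5):
    """Linearly interpolate a low-level and a high-level confidence score.

    low_level  -- e.g. a word posterior probability in [0, 1]
    high_level -- e.g. a mutual-information-based score scaled to [0, 1]
    weight     -- share given to the low-level score (assumed tunable)
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * low_level + (1.0 - weight) * high_level


# A word is accepted when the combined score clears a threshold.
combined = combine_confidence(0.8, 0.4, weight=0.75)
accepted = combined >= 0.5
print(round(combined, 3), accepted)
```

Setting `weight` to 1.0 or 0.0 falls back to one measure alone, which makes it easy to compare the combination against each baseline.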
Low complexity decomposition for the characteristic waveform of speech signal
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409607
Guiping Wang, C. Bao
{"title":"Low complexity decomposition for the characteristic waveform of speech signal","authors":"Guiping Wang, C. Bao","doi":"10.1109/CHINSL.2004.1409607","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409607","url":null,"abstract":"For efficient coding of speech, it is desirable to separate the slowly and rapidly evolving spectral components to take advantage of their different perceptual qualities. Existing decomposition methods are too inflexible to model transient changes in the speech signals, require high delay or produce a large parameter set that is not scalable to low rates. We present a low complexity decomposition method, based on SVD, applied to waveform interpolation (WI) coding. This scheme reduces the computational complexity of the common SVD method in WI by exploiting the properties of human auditory perception to lower the dimensions of the decomposition matrix. This method requires only a single frame of speech and overcomes the substantial delay problems. The quantization solution involves the use of vector quantization on the separately decomposed singular matrices, U, V, and the diagonal matrix of singular values, S. The quality of the reconstructed speech can be varied according to the scalable decomposition and the bit rate available.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124812219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Spontaneous Mandarin production: results of a corpus-based study
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409578
S. Tseng
{"title":"Spontaneous Mandarin production: results of a corpus-based study","authors":"S. Tseng","doi":"10.1109/CHINSL.2004.1409578","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409578","url":null,"abstract":"This paper presents empirical results of a corpus-based study attempting to characterize linguistic features of spontaneous Mandarin, which has been difficult to obtain before due to the lack of suitable speech material. Starting from linguistic considerations, these results of word frequency as well as syllable frequency should provide important cues to spontaneous speech production. Frequent words or syllables need special investigations into their phonetic forms in real production. Examinations of syllable structures also show that the distribution of onset consonant, nucleus and coda consonant in syllables which are often used in spontaneous Mandarin is similar across different speakers. And results of a segmental analysis also clearly indicate the likelihood of a segment being produced in spoken Mandarin.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124866374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Generalized posterior probability for minimizing verification errors at subword, word and sentence levels
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409574
W. Lo, F. Soong, Satoshi Nakamura
{"title":"Generalized posterior probability for minimizing verification errors at subword, word and sentence levels","authors":"W. Lo, F. Soong, Satoshi Nakamura","doi":"10.1109/CHINSL.2004.1409574","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409574","url":null,"abstract":"Generalized posterior probability, a statistical confidence measure, is tested in this study for verifying optimally the recognized units at the subword, word and sentence levels. We developed the generalized posterior probability by analyzing the exponential weights of the acoustic and language model scores to minimize the total verification errors at different unit levels. Experimental results have demonstrated the effectiveness of this generalized confidence measure for verifying Chinese LVCSR output. The Chinese Basic Travel Expression Corpus (BTEC) is used for evaluation and the relative improvement of confidence error rate (CER) over the baseline performance is 47.76% for sentences, 27.31% for words and 4.64% for subwords.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122011695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
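The role of the exponential weights on the acoustic and language model scores, as described in the abstract above, can be sketched on an N-best list. This is a deliberately simplified approximation: it ignores time registration and word graphs, and it assumes each hypothesis carries one acoustic and one language model log-probability. The names `generalized_posterior`, `alpha` and `beta` are hypothetical.

```python
import math


def generalized_posterior(hypotheses, target, alpha=1.0, beta=1.0):
    """Approximate the posterior of `target` over an N-best list.

    hypotheses -- list of (words, acoustic_logprob, lm_logprob)
    alpha/beta -- exponential weights on the acoustic and LM scores
    """
    scores = [alpha * ac + beta * lm for _, ac, lm in hypotheses]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    # Sum the weight of every hypothesis that contains the target word.
    hit = sum(w for (words, _, _), w in zip(hypotheses, weights)
              if target in words)
    return hit / total


nbest = [
    (["ni", "hao"], -10.0, -2.0),
    (["ni", "hao", "ma"], -10.0, -2.0),
]
p_ma = generalized_posterior(nbest, "ma")
p_ni = generalized_posterior(nbest, "ni")
print(p_ma, p_ni)
```

In this simplified form, `alpha` and `beta` would be tuned so that thresholding the returned value yields the fewest verification errors on development data.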
Modeling glottal effect on the spectral envelop of STRAIGHT using mixture of Gaussians
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409589
Zhenhua Ling, Yu-Ping Wang, Yu Hu, Ren-Hua Wang
{"title":"Modeling glottal effect on the spectral envelop of STRAIGHT using mixture of Gaussians","authors":"Zhenhua Ling, Yu-Ping Wang, Yu Hu, Ren-Hua Wang","doi":"10.1109/CHINSL.2004.1409589","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409589","url":null,"abstract":"This paper presents a method to model the influence of glottal excitation on the STRAIGHT (speech transformation and representation using adaptive interpolation of weighted spectrum) spectrum by fitting the spectral envelop with a mixture of Gaussians (MOG). The first Gaussian component is used as the estimation for the glottal formant in the STRAIGHT spectrum because analysis results show that it has an obviously stronger correlation with fundamental frequency than other spectral components and has similar characteristics to the glottal formant. Then linear regression is carried out to measure the relationship between F0 and the parameters of the first Gaussian component. This model is applied to the STRAIGHT synthesis process and proved to be effective in compensating the voice quality variation caused by pitch modification.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122226197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Spoken language processing: people versus machines
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409567
William S-Y. Wang
{"title":"Spoken language processing: people versus machines","authors":"William S-Y. Wang","doi":"10.1109/CHINSL.2004.1409567","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409567","url":null,"abstract":"Summary form only given. A fundamental challenge we must meet for computers to eventually process spoken language as effectively as humans is to capture the immensely rich fund of information we have in our heads that is not in the speech signal. This information is what gives us the ability to supply acoustic cues when these are degraded or missing, or to zero in on one speaker amid a chorus of other voices. While the powerful statistical methods currently used in speech recognition and synthesis have brought some success and useful applications, future progress will depend crucially on a deeper knowledge and greater use of this information. Some of this information is applicable to all languages, and some of it is specific to individual language types. In this discussion, special attention is given to the processing of spoken Chinese.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123648353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Robust features for speech recognition using minimum variance distortionless response (MVDR) spectrum estimation and feature normalization techniques
2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI: 10.1109/CHINSL.2004.1409596
Yi Chen, Lin-Shan Lee
{"title":"Robust features for speech recognition using minimum variance distortionless response (MVDR) spectrum estimation and feature normalization techniques","authors":"Yi Chen, Lin-Shan Lee","doi":"10.1109/CHINSL.2004.1409596","DOIUrl":"https://doi.org/10.1109/CHINSL.2004.1409596","url":null,"abstract":"In this paper, feature extraction methods based on frequency-warped minimum variance distortionless response (MVDR) spectrum estimation are analyzed and tested. The effectiveness of the conventional FFT-based mel-frequency cepstrum coefficients (MFCC) and the MVDR-based features are carefully compared. Two normalization techniques are further applied to improve the robustness of the features: the widely used cepstral normalization (CN), and newly proposed progressive histogram equalization (PHEQ). Extensive experiments with respect to the AURORA2 database were performed. The results indicated that both the MVDR-based features and the normalization processes are very helpful.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126487542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
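The abstract above names cepstral normalization (CN) without defining it. A minimal sketch follows, assuming CN means per-utterance mean and variance normalization applied to each cepstral coefficient independently. The function name `cepstral_normalize` and the `eps` guard are illustrative choices, not the authors' implementation.

```python
import math


def cepstral_normalize(frames, eps=1e-8):
    """Normalize each cepstral dimension to zero mean and unit variance.

    frames -- list of feature vectors (one list of floats per frame)
    eps    -- small constant that avoids division by zero (assumption)
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]


# Two frames with two coefficients; the second coefficient is constant,
# so it maps to zero after normalization.
frames = [[1.0, 10.0], [3.0, 10.0]]
normalized = cepstral_normalize(frames)
print(normalized)
```

The statistics are computed over one utterance here; a running or segment-level estimate would be a different design choice.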