2012 8th International Symposium on Chinese Spoken Language Processing: Latest Publications

Noise-robust whispered speech recognition using a non-audible-murmur microphone with VTS compensation
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-04 DOI: 10.1109/ISCSLP.2012.6423522
Chen-Yu Yang, Georgina Brown, Liang Lu, J. Yamagishi, Simon King
Abstract: In this paper, we introduce a newly-created corpus of whispered speech simultaneously recorded via a close-talking microphone and a non-audible murmur (NAM) microphone in both clean and noisy conditions. To benchmark the corpus, which has been freely released recently, experiments on automatic recognition of continuous whispered speech were conducted. When training and test conditions are matched, the NAM microphone is found to be more robust against background noise than the close-talking microphone. In mismatched conditions (noisy data, models trained on clean speech), we found that Vector Taylor Series (VTS) compensation is particularly effective for the NAM signal.
Citations: 28
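The Yang et al. abstract credits Vector Taylor Series (VTS) compensation for the NAM signal's robustness when clean-trained models meet noisy data. Below is a minimal sketch of the textbook first-order VTS update for one diagonal Gaussian. It rests on assumptions the abstract does not state: features are log filterbank energies (no DCT to cepstra), noise and channel statistics have already been estimated, and the channel is treated as a constant. The function name `vts_compensate` is hypothetical, and the sketch illustrates the general technique rather than the paper's configuration.

```python
import numpy as np

def vts_compensate(mu_x, var_x, mu_n, var_n, mu_h=None):
    """First-order VTS compensation of a diagonal Gaussian (log-spectral domain).

    Assumed mismatch function: y = x + h + log(1 + exp(n - x - h)),
    where x = clean speech, n = additive noise, h = constant channel.
    """
    mu_x = np.asarray(mu_x, dtype=float)
    mu_n = np.asarray(mu_n, dtype=float)
    mu_h = np.zeros_like(mu_x) if mu_h is None else np.asarray(mu_h, dtype=float)
    # Expand the mismatch function around the clean, noise and channel means.
    diff = mu_n - mu_x - mu_h
    mu_y = mu_x + mu_h + np.log1p(np.exp(diff))
    # Diagonal Jacobians: dy/dx = g and dy/dn = 1 - g.
    g = 1.0 / (1.0 + np.exp(diff))
    var_y = g ** 2 * np.asarray(var_x, dtype=float) + (1.0 - g) ** 2 * np.asarray(var_n, dtype=float)
    return mu_y, var_y
```

When the noise mean lies far below the speech mean, the model is left essentially unchanged. When noise reaches the speech level, the compensated mean rises and the variance is drawn from both sources.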
Efficient feature extraction of speaker identification using phoneme mean F-ratio for Chinese
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423485
Chen Zhao, Hongcui Wang, Songgun Hyon, Jianguo Wei, J. Dang
Abstract: Features used for speaker recognition should retain more speaker-individual information while attenuating linguistic information. To discard the linguistic information effectively, in this paper we employed the phoneme mean F-ratio method to investigate the contributions of different frequency regions from the point of view of Chinese phonemes, and applied it to speaker identification. We found that the phoneme-dependent speaker-individual information is distributed across different frequency regions of the speech signal. Based on the contribution rates, we extracted new features and combined them with a GMM. A speaker identification experiment was conducted on the King-ASR Chinese database. Compared with MFCC features, the proposed features reduced the identification error rate by 32.94%. The results confirm the effectiveness of the phoneme mean F-ratio method for improving speaker recognition performance for Chinese.
Citations: 10
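The phoneme mean F-ratio in the Zhao et al. entry can be illustrated with the standard per-band F-ratio: the variance of the speaker means divided by the average within-speaker variance, computed separately for each phoneme class and then averaged. This is a minimal sketch assuming frame-level band energies with speaker and phoneme labels. The function names are hypothetical, and the abstract does not specify how the paper maps these ratios to its new features.

```python
import numpy as np

def f_ratio(features, speakers):
    """Per-band F-ratio: variance of the speaker means divided by the average
    within-speaker variance (a higher value means a more speaker-discriminative band)."""
    ids = np.unique(speakers)
    means = np.array([features[speakers == s].mean(axis=0) for s in ids])
    within = np.array([features[speakers == s].var(axis=0) for s in ids])
    return means.var(axis=0) / within.mean(axis=0)

def phoneme_mean_f_ratio(features, speakers, phonemes):
    """Compute the F-ratio separately within each phoneme class, then average."""
    features = np.asarray(features, dtype=float)  # (n_frames, n_bands), e.g. log band energies
    speakers, phonemes = np.asarray(speakers), np.asarray(phonemes)
    ratios = [f_ratio(features[phonemes == p], speakers[phonemes == p])
              for p in np.unique(phonemes)]
    return np.mean(ratios, axis=0)
```

Computing the ratio within each phoneme class keeps linguistic variation, which differs between phonemes, from being counted as speaker variation.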
A new confidence measure combining Hidden Markov Models and Artificial Neural Networks of phonemes for effective keyword spotting
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423455
S. Leow, T. S. Lau, Alvina Goh, Han Meng Peh, Teck Khim Ng, S. Siniscalchi, Chin-Hui Lee
Abstract: In this paper, we present an acoustic keyword spotter that operates in two stages, detection and verification. In the detection stage, keywords are detected in the utterances, and in the verification stage, confidence measures are used to verify the detected keywords and reject false alarms. A new confidence measure, based on phoneme models trained on an Artificial Neural Network, is used in the verification stage to reduce false alarms. We have found that this ANN-based confidence, together with existing HMM-based confidence measures, is very effective in rejecting false alarms. Experiments are performed on two Mandarin databases and our results show that the proposed method is able to significantly reduce the number of false alarms.
Citations: 3
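The verification stage described by Leow et al. combines an ANN-based confidence with HMM-based confidences to reject false alarms. The abstract does not give the combination rule. The sketch below assumes a simple weighted sum of two scores already normalised to a comparable range, followed by a threshold. The weight, threshold, score definitions and the names `Detection` and `verify` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    keyword: str
    hmm_confidence: float  # assumed: HMM-based score normalised to [0, 1]
    ann_confidence: float  # assumed: mean ANN phoneme posterior over the keyword

def verify(detections, weight=0.5, threshold=0.5):
    """Second stage: keep detections whose fused confidence clears the threshold;
    the rest are rejected as false alarms."""
    accepted = []
    for d in detections:
        fused = weight * d.hmm_confidence + (1.0 - weight) * d.ann_confidence
        if fused >= threshold:
            accepted.append(d)
    return accepted
```

In practice, the weight and threshold would be tuned on held-out data to trade missed keywords against false alarms.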
Perceptual similarity between audio clips and feature selection for its measurement
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423476
Qinghua Wu, Xiao-Lei Zhang, Ping Lv, Ji Wu
Abstract: In this paper, we explore the retrieval of perceptually similar audio. It focuses on finding sounds according to human perceptions. Thus such retrieval is more "human-centered" [1] than previous audio retrievals which intend to find homologous sounds. We make comprehensive use of various acoustic features to measure the perceptual similarity. Since some acoustic features may be redundant or even adverse to the similarity measurement, we propose to find a complementary and effective combination of acoustic features via the SFFS (Sequential Floating Forward Selection) method. Experimental results show that LSP, MFCC, and PLP are the three most effective acoustic features. Moreover, the optimal combination of features can improve the accuracy of similarity classification by about 2% compared with the best performance of a single acoustic feature.
Citations: 3
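Wu et al. select acoustic features with Sequential Floating Forward Selection (SFFS). A minimal, generic sketch of that algorithm follows. Each step adds the most useful feature and then removes features for as long as a removal beats the best subset previously recorded at the smaller size. The `score` callable is an assumption: in the paper's setting it might be similarity-classification accuracy on held-out data, but any set-scoring function works here.

```python
def sffs(candidates, score, k):
    """Sequential Floating Forward Selection.

    candidates: feature names; score: callable mapping a list of features to a
    number (higher is better); k: target subset size (k <= len(candidates)).
    Returns (best score, best subset of size k).
    """
    selected = []
    best = {}  # subset size -> (score, subset)
    while len(selected) < k:
        # Inclusion: add the remaining candidate that gives the highest score.
        remaining = [c for c in candidates if c not in selected]
        selected = selected + [max(remaining, key=lambda c: score(selected + [c]))]
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, list(selected))
        # Conditional exclusion: drop the least useful feature while doing so
        # strictly beats the best subset already recorded for the smaller size.
        while len(selected) > 2:
            reduced = max(([f for f in selected if f != r] for r in selected), key=score)
            s_red = score(reduced)
            if s_red > best[len(reduced)][0]:
                selected = reduced
                best[len(selected)] = (s_red, list(selected))
            else:
                break
    return best[k]
```

The floating (backtracking) step is what lets SFFS discard a feature that became redundant once better-complementing features were added. Plain forward selection cannot undo such a choice.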
Cross validation and Minimum Generation Error for improved model clustering in HMM-based TTS
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423459
Fenglong Xie, Yi-Jian Wu, F. Soong
Abstract: In HMM-based speech synthesis, context-dependent hidden Markov models (HMMs) are widely used for their capability to synthesize highly intelligible and fairly smooth speech. However, training HMMs for all possible contexts well is difficult, or even impossible, due to the intrinsic problem of insufficient training data coverage. As a result, the trained models may overfit, and their ability to predict unseen contexts at test time is highly restricted. Recently, cross-validation (CV) has been explored and applied to decision tree-based clustering with the Maximum-Likelihood (ML) criterion, showing improved robustness in TTS synthesis. In this paper, we generalize CV to decision tree clustering with a different criterion, Minimum Generation Error (MGE). Experimental results show that the generalization to MGE results in better TTS synthesis performance than that of the baseline systems.
Citations: 1
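The Xie et al. entry applies cross-validation (CV) to decision-tree clustering. The sketch below shows the general idea under heavy simplification. Each candidate yes/no question is scored by its held-out squared error when every leaf predicts with its training-fold mean, and splitting stops when no question beats the unsplit node. Squared error here is only a stand-in for the MGE criterion, which in the paper involves parameter generation. The fold assignment and the function names are hypothetical.

```python
import numpy as np

def _heldout_sse(y, groups, folds):
    """Held-out squared error, predicting each held-out sample by the mean of the
    training-fold samples in the same leaf (group)."""
    sse = 0.0
    for f in np.unique(folds):
        train, test = folds != f, folds == f
        for g in np.unique(groups[test]):
            tr, te = train & (groups == g), test & (groups == g)
            if not tr.any():
                return np.inf  # leaf has no training data in this fold: unusable split
            sse += float(((y[te] - y[tr].mean(axis=0)) ** 2).sum())
    return sse

def best_question(y, answers, n_folds=4):
    """Return the question with the lowest CV error, or None if no question
    beats leaving the node unsplit (the CV stopping rule)."""
    y = np.asarray(y, dtype=float)  # (n_samples,) or (n_samples, dim) targets
    folds = np.arange(len(y)) % n_folds
    best_name, best_err = None, _heldout_sse(y, np.zeros(len(y), dtype=int), folds)
    for name, ans in answers.items():
        err = _heldout_sse(y, np.asarray(ans, dtype=int), folds)
        if err < best_err:
            best_name, best_err = name, err
    return best_name
```

Because the split gain is measured on held-out folds, the stopping decision is driven by the data rather than by a hand-tuned likelihood threshold. This is the robustness argument the abstract makes.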
TDOA information based vad for robust speech recognition in directional and diffuse noise field
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423514
Kuan-Lang Huang, T. Chi
Abstract: A two-microphone algorithm is proposed to improve automatic speech recognition (ASR) rates when target speech is corrupted by directional interferences and diffuse noise simultaneously. The algorithm adopts the time difference of arrival (TDOA) to suppress directional interferences and a TDOA-information based voice activity detector (VAD) to suppress diffuse noise. Simulation results show the proposed algorithm is effective in improving ASR rates in a sound field mixed with a directional interference and diffuse noise. Compared with the phase difference (PD) algorithm, the proposed method gives comparable recognition rates when facing a directional interference and much higher and more robust recognition rates when diffuse noise emerges.
Citations: 1
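Huang and Chi build their VAD on TDOA information from two microphones. A common way to estimate the TDOA is GCC-PHAT. The sketch below uses it and then marks a frame as target speech when its estimated delay falls within a tolerance of the target's delay, on the intuition that diffuse noise yields scattered delay estimates. The abstract does not say the paper uses GCC-PHAT. The frame sizes, tolerance, maximum delay and function names are hypothetical choices.

```python
import numpy as np

def gcc_phat_tdoa(x1, x2, fs, max_tau=None):
    """Delay of x2 relative to x1 in seconds (positive if x2 lags), via GCC-PHAT."""
    n = len(x1) + len(x2)  # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(x2, n=n) * np.conj(np.fft.rfft(x1, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    # Reorder lags as -max_shift..+max_shift before picking the peak.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    return (int(np.argmax(np.abs(cc))) - max_shift) / fs

def tdoa_vad(x1, x2, fs, target_tau, frame_len=512, hop=256, tol=1e-4, max_tau=1e-3):
    """Frame-wise decision: True where the estimated TDOA matches the target's."""
    decisions = []
    for start in range(0, len(x1) - frame_len + 1, hop):
        tau = gcc_phat_tdoa(x1[start:start + frame_len], x2[start:start + frame_len], fs, max_tau)
        decisions.append(abs(tau - target_tau) <= tol)
    return np.array(decisions)
```

Restricting the search to `max_tau` (set by the microphone spacing) keeps physically impossible delays from producing spurious peaks.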
Power-normalized PLP (PNPLP) feature for robust speech recognition
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423529
Lichun Fan, Dengfeng Ke, Xiaoyin Fu, Shixiang Lu, Bo Xu
Abstract: In this paper, we first review several feature extraction algorithms for robust speech recognition, e.g. Mel frequency cepstral coefficients (MFCC) [1], perceptual linear prediction (PLP) [2] and power-normalized cepstral coefficients (PNCC) [3]. A new feature extraction algorithm for noise-robust speech recognition is then proposed, in which medium-time processing serves as the noise suppression module. The algorithm is described in detail to show that it is superior. The experimental results show that the proposed method significantly outperforms state-of-the-art algorithms.
Citations: 2
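The Fan et al. abstract names medium-time processing as its noise suppression module, a component associated with PNCC. The sketch below covers two PNCC-style building blocks under that assumption. The first averages each channel's power over 2M+1 neighbouring frames, truncated at the edges. The second is a power-law compression, defaulting to PNCC's 1/15 exponent where PLP uses a cube root. How PNPLP combines these is not stated in the abstract, and the function names are hypothetical.

```python
import numpy as np

def medium_time_power(P, M=2):
    """Average each channel's power over frames m-M..m+M (truncated at the edges).

    P: (n_frames, n_channels) short-time power, e.g. from a filterbank.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Prefix sums let every window average be computed in O(1).
    csum = np.vstack([np.zeros((1, P.shape[1])), np.cumsum(P, axis=0)])
    lo = np.clip(np.arange(n) - M, 0, n)
    hi = np.clip(np.arange(n) + M + 1, 0, n)
    return (csum[hi] - csum[lo]) / (hi - lo)[:, None]

def power_law(P, exponent=1 / 15):
    """Power-law nonlinearity (PNCC-style default; PLP's cube root is exponent=1/3)."""
    return np.power(np.asarray(P, dtype=float), exponent)
```

Averaging over a window longer than a single analysis frame smooths out fast noise fluctuations. This makes it easier to estimate and subtract slowly varying background noise before the nonlinearity is applied.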
Break index labeling of mandarin text via syntactic-to-prosodic tree mapping
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423468
Xiaotian Zhang, Yao Qian, Hai Zhao, F. Soong
Abstract: In this study, we investigate the break index labeling problem with a syntactic-to-prosodic structure conversion. The statistical relationship between the mapped syntactic tree structure and prosodic tree structure of sentences in the training set is used to generate a Synchronous Tree Substitution Grammar (STSG) which can describe the probabilistic mapping (substitution) rules between them. For a given test sentence and its parsed syntactic tree structure, the STSG thus generated can statistically convert the syntactic tree into a prosodic tree. We compare the labeling results with those of other approaches and show that the probabilistic mapping can indeed improve break index labeling performance.
Citations: 8
Discriminant local information distance preserving projection for text-independent speaker recognition
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423466
Liang He, Jia Li
Abstract: A novel method based on a statistical manifold is presented for text-independent speaker recognition. After feature extraction, speaker recognition becomes a sequence classification problem. By discarding time information, the core task becomes the comparison of multiple sample sets. Each set is assumed to be governed by a probability density function (PDF). We estimate the PDFs and place the estimated statistical models on a statistical manifold. The Fisher information distance is applied to compute the distance between adjacent PDFs. Discriminant local preserving projection is used to push apart adjacent PDFs that belong to different classes, further improving recognition accuracy. Experiments were carried out on the NIST SRE08 tel-tel database, where the presented method gave excellent performance.
Citations: 1
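He and Li compare speakers' PDFs with the Fisher information distance. As a minimal sketch, assume each PDF is a Gaussian with diagonal covariance; the abstract does not specify the model family. Each dimension then has a closed-form Fisher-Rao distance, √2·arccosh(1 + ((μ₁−μ₂)²/2 + (σ₁−σ₂)²)/(2σ₁σ₂)), and independent dimensions combine in quadrature. The function name is hypothetical, and the paper may use a different estimate or approximation.

```python
import numpy as np

def fisher_rao_distance(mu1, sd1, mu2, sd2):
    """Fisher information (Fisher-Rao) distance between two diagonal Gaussians."""
    mu1, sd1, mu2, sd2 = (np.asarray(a, dtype=float) for a in (mu1, sd1, mu2, sd2))
    # Per dimension: the univariate-Gaussian manifold is a scaled hyperbolic half-plane.
    arg = 1.0 + ((mu1 - mu2) ** 2 / 2.0 + (sd1 - sd2) ** 2) / (2.0 * sd1 * sd2)
    per_dim = np.sqrt(2.0) * np.arccosh(arg)
    # Independent dimensions form a product manifold, so distances add in quadrature.
    return float(np.sqrt(np.sum(per_dim ** 2)))
```

A matrix of such distances between neighbouring models is the kind of input a locality-preserving projection works from.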
A comparative study of fMPE and RDLT approaches to LVCSR
2012 8th International Symposium on Chinese Spoken Language Processing Pub Date : 2012-12-01 DOI: 10.1109/ISCSLP.2012.6423511
Jian Xu, Zhijie Yan, Qiang Huo
Abstract: This paper presents a comparative study of two discriminatively trained feature transform approaches, namely feature-space minimum phone error (fMPE) and region-dependent linear transform (RDLT), to large vocabulary continuous speech recognition (LVCSR). Experiments are performed on an LVCSR task of conversational telephone speech transcription using about 2,000 hours of training data. Starting from a maximum likelihood (ML) trained GMM-HMM baseline system, the recognition accuracy and run-time efficiency of different variants of the two methods are evaluated, and a specific RDLT approach is identified and recommended for deployment in LVCSR applications.
Citations: 1
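The Xu et al. entry compares fMPE with region-dependent linear transforms (RDLT). One common RDLT formulation partitions the feature space into regions, for example the components of a Gaussian mixture, and maps a frame x to y = Σ_r P(r | x)(A_r x + b_r). The sketch below shows only this forward transform, assuming diagonal-Gaussian regions. It omits the discriminative training of A_r and b_r and the variants the paper evaluates. The function names are hypothetical.

```python
import numpy as np

def region_posteriors(x, means, variances, weights):
    """P(region | x) under a diagonal-covariance Gaussian mixture (one region per component)."""
    log_p = (np.log(weights)
             - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
             - 0.5 * np.sum((x - means) ** 2 / variances, axis=1))
    log_p -= log_p.max()  # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum()

def rdlt(x, transforms, offsets, means, variances, weights):
    """y = sum_r P(r | x) (A_r x + b_r).

    transforms: (R, d, d); offsets, means, variances: (R, d); weights: (R,).
    """
    gamma = region_posteriors(x, means, variances, weights)
    return np.einsum("r,rij,j->i", gamma, transforms, x) + gamma @ offsets
```

With identity transforms and zero offsets the mapping returns x unchanged, which is a natural starting point before training.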