2008 6th International Symposium on Chinese Spoken Language Processing: Latest Articles

Evaluation of a Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model on Aurora2, Aurora3, and Aurora4 Tasks
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.32
Jun Du, Qiang Huo, Yu Hu
{"title":"Evaluation of a Feature Compensation Approach Using High-Order Vector Taylor Series Approximation of an Explicit Distortion Model on Aurora2, Aurora3, and Aurora4 Tasks","authors":"Jun Du, Qiang Huo, Yu Hu","doi":"10.1109/CHINSL.2008.ECP.32","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.32","url":null,"abstract":"In our previous work, a new feature compensation approach to robust speech recognition was proposed by using high-order vector Taylor series (HOVTS) approximation of an explicit model of distortions caused by additive noises, and evaluation results were reported on Aurora2 database. This paper extends the above approach to deal with both additive noises and convolutional distortions, and reports evaluation results on Aurora2, Aurora3, and Aurora4 tasks.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"272 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124396528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A New Prosodic Strength Calculation Method for Prosody Reduction Modeling
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.25
Honglei Cong, Zhiyong Wu, Lianhong Cai, H. Meng
{"title":"A New Prosodic Strength Calculation Method for Prosody Reduction Modeling","authors":"Honglei Cong, Zhiyong Wu, Lianhong Cai, H. Meng","doi":"10.1109/CHINSL.2008.ECP.25","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.25","url":null,"abstract":"To improve the naturalness of synthetic speech, prosody models in text-to-speech (TTS) system should be able to describe different prosody variations in natural speech. In this paper, prosody variation patterns behind the partial reduction phenomena are analyzed. In order to model the prosody reduction effect and incorporate it into the prosody model for speech synthesis, prosodic strength is introduced and a new prosodic strength calculation method is proposed. The method aims to model the sentence planning of prosody reduction and is based on the concept that the objective of prosodic strength should complete the planned target of the speech unit. The approach on how to integrate prosodic strength into speech synthesis system is also introduced. Experiments show that the estimated prosodic strength values by the proposed method have good correlations with both prosody structure and acoustic features.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117340303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
HMM-Based Mixed-Language (Mandarin-English) Speech Synthesis
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.15
Yao Qian, Houwei Cao, F. Soong
{"title":"HMM-Based Mixed-Language (Mandarin-English) Speech Synthesis","authors":"Yao Qian, Houwei Cao, F. Soong","doi":"10.1109/CHINSL.2008.ECP.15","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.15","url":null,"abstract":"English words or short phrases embedded in Mandarin utterances have become more common among bilingually educated people like college students in China. Similarly, it becomes highly desirable that TTS systems can synthesize mixed-language speech properly. Recently, we proposed an HMM-based bilingual TTS to synthesize a target language when only monolingual source language recording from a speaker is available. In this paper, we extend it to synthesize mixed-language sentences. A cross-language state mapping is first established between decision trees built from the English and Mandarin recordings of a bilingual speaker. Via the mapping, English words or phrases embedded in Mandarin sentences can then be synthesized. The bilingual state-mapping is extended to monolingual speaker to perform mixed-language synthesis. Perceptual test results show: (1) decent intelligibility, confirmed by an English word transcription accuracy of 86%; (2) good speech quality with an average MOS score of 3.2.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116530268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
Entropy-Based Analysis of the Prosodic Features of Chinese Dialects
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.28
Raymond W. M. Ng, Tan Lee
{"title":"Entropy-Based Analysis of the Prosodic Features of Chinese Dialects","authors":"Raymond W. M. Ng, Tan Lee","doi":"10.1109/CHINSL.2008.ECP.28","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.28","url":null,"abstract":"In this paper, a novel approach is proposed to analyze prosodic features of four Chinese dialects: Wu, Cantonese, Min and Mandarin. The ultimate goal is to exploit these features in the task of automatic spoken language identification. Two entropy-based evaluation metrics are formulated to address the problems of data sparseness and lack of speakers. Different prosody-related acoustic features and their combinations are evaluated. F0, F0 gradient and intensity are found to contain the most language-related information. Maximum language-related information is observed in multi-dimensional N-gram features with F0, F0 gradient and syllable position in sentence. There are also some uncertain results that reveal the limitations of the proposed metrics.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122651694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Exploiting Non-Target Region Information for Confidence Measure Based on Bayesian Information Criterion
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.69
Cong Liu, Yu Hu, Xiong-Guo Lei, Zhiguo Wang, Lirong Dai, Ren-Hua Wang
{"title":"Exploiting Non-Target Region Information for Confidence Measure Based on Bayesian Information Criterion","authors":"Cong Liu, Yu Hu, Xiong-Guo Lei, Zhiguo Wang, Lirong Dai, Ren-Hua Wang","doi":"10.1109/CHINSL.2008.ECP.69","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.69","url":null,"abstract":"In this paper appropriate confidence measures (CMs) are investigated for Mandarin command word recognition, both in the so-called target region and non-target region, respectively. Here the target region refers to the recognized speech part of command word while the non-target region refers to the recognized silence part. It shows that exploiting extra information in the non-target region can effectively complement the traditional CM which usually focuses on the target region. Furthermore, when analyzing the non-target region in a more theoretical way, where Bayesian information criterion (BIC) is employed to locate more precise boundary in the non-target region, even more improvement is achieved. In two different Mandarin telephone command word tasks, more than 20% relative reduction of equal error rate (EER) is obtained.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126877412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Maximum Entropy Based Hierarchical Model for Automatic Prosodic Boundary Labeling in Mandarin
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.76
Fangzhou Liu, Huibin Jia, J. Tao
{"title":"A Maximum Entropy Based Hierarchical Model for Automatic Prosodic Boundary Labeling in Mandarin","authors":"Fangzhou Liu, Huibin Jia, J. Tao","doi":"10.1109/CHINSL.2008.ECP.76","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.76","url":null,"abstract":"Modeling prosodic rhythm is of great importance for both speech synthesis and speech understanding, and it requires a large enough corpus with precise prosodic boundary labels. This paper proposes a maximum entropy (ME) based hierarchical model, which utilizes both text and acoustic features, to automatically label Mandarin prosodic boundaries. Results of comparative experiments show that, for the task of prosodic boundary detection, ME model obviously outperforms classification and regression tree (CART), and the bottom-up hierarchical framework is also significantly superior to the flat single-level framework.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120930367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
Multi-Layer F0 Modeling for HMM-Based Speech Synthesis
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.44
Cheng-Cheng Wang, Zhenhua Ling, Bu-Fan Zhang, Lirong Dai
{"title":"Multi-Layer F0 Modeling for HMM-Based Speech Synthesis","authors":"Cheng-Cheng Wang, Zhenhua Ling, Bu-Fan Zhang, Lirong Dai","doi":"10.1109/CHINSL.2008.ECP.44","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.44","url":null,"abstract":"This paper proposes a two-layer fundamental frequency (F0) modeling method for HMM-based parametric speech synthesis. The F0 models are trained for each context-dependent phoneme in the conventional HMM-based speech synthesis system. Considering the super-segmental characteristics of F0 features, an explicit syllable-layer F0 model is introduced in this paper. At synthesis stage, the F0 contour is generated by maximizing the combined likelihood functions of the phone-layer and syllable-layer F0 models. The objective and subjective evaluation results in our experiments show that the proposed multi-layer F0 modeling method can improve the performance of F0 prediction for emotional speech synthesis.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114205274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
Tone Evaluation of Chinese Continuous Speech Based on Prosodic Words
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.77
Yi-Qian Pan, Si Wei, Ren-Hua Wang
{"title":"Tone Evaluation of Chinese Continuous Speech Based on Prosodic Words","authors":"Yi-Qian Pan, Si Wei, Ren-Hua Wang","doi":"10.1109/CHINSL.2008.ECP.77","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.77","url":null,"abstract":"Tonal evaluation of Chinese continuous speech plays an important role in Mandarin Chinese pronunciation test. In this paper, we introduce the Multi-Space Distribution Hidden Markov Model based on prosodic word. The results show that the performance of tonal syllable error rate can be reduced. For the non-standard Chinese Mandarin speech, the correlation between computer score and expert score was improved above 3.0% absolutely, compared with the baseline system without tonal pronunciation test.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116184341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Analysis and Modeling of Affective Audio Visual Speech Based on PAD Emotion Space
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.82
Shen Zhang, Yingjin Xu, Jia Jia, Lianhong Cai
{"title":"Analysis and Modeling of Affective Audio Visual Speech Based on PAD Emotion Space","authors":"Shen Zhang, Yingjin Xu, Jia Jia, Lianhong Cai","doi":"10.1109/CHINSL.2008.ECP.82","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.82","url":null,"abstract":"This paper analyzes acoustic and visual features for affective audio-visual speech based on PAD (Pleasure-Arousal-Dominance) emotion space. The selected acoustic features include F0 maximum, F0 minimum, duration and energy. A set of Partial Expression Parameters (PEP) is proposed as visual features to describe affective facial movement on talking face. This paper explores the connection between PAD emotion space and acoustic/visual features respectively. The variation of acoustic features is predicted by PAD values, and a PAD-PEP mapping function for facial expression synthesis is built. Experimental result shows that PAD could be properly applied in describing emotional state as well as predicting the acoustic/visual features for affective audiovisual speech synthesis.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"340 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115884653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Automatic Assessment of Language Proficiency through Shadowing
2008 6th International Symposium on Chinese Spoken Language Processing Pub Date : 2008-12-30 DOI: 10.1109/CHINSL.2008.ECP.22
Dean Luo, N. Minematsu, Yutaka Yamauchi, K. Hirose
{"title":"Automatic Assessment of Language Proficiency through Shadowing","authors":"Dean Luo, N. Minematsu, Yutaka Yamauchi, K. Hirose","doi":"10.1109/CHINSL.2008.ECP.22","DOIUrl":"https://doi.org/10.1109/CHINSL.2008.ECP.22","url":null,"abstract":"Shadowing is a practice that requires learners to shadow a presented native utterance as closely and quickly as possible. Learners' pronunciation in shadowing, especially in the case of beginners, often becomes inarticulate and corrupt. These features of shadowing make it very difficult to assess shadowing productions. In this paper, we investigate the automatic pronunciation scoring methods for shadowing. Three automatic scores have been proposed and compared with each other. Experiments show that good correlations are found between the automatic scores and human ratings or TOEIC overall proficiency scores.","PeriodicalId":291958,"journal":{"name":"2008 6th International Symposium on Chinese Spoken Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131281591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9